
Key: RHQ-1098
Type: Sub-task
Status: Accepted
Priority: Critical
Assignee: John Mazzitelli
Reporter: John Mazzitelli
Votes: 0
Watchers: 0
RHQ Project
RHQ-1092

make availability report interval longer

Created: 10/Nov/08 11:00 AM   Updated: 12/Nov/08 01:21 AM
Component/s: Agent, Core Server
Affects Version/s: None
Fix Version/s: 1.2

Time Tracking:
Not Specified


Description
Currently, the agent sends its availability reports every 60 seconds and the server expects to hear from the agent within 2 minutes.

I think we want to lengthen these times to something like 90 seconds and 4 minutes. Note that the 90-second interval on the agent side is configurable, and we should be able to configure the 4-minute timeout on the server side as well.

This change will a) cause less traffic to hit the server (going from a 60-second to a 90-second interval reduces the number of avail reports to be processed by about a third) and b) mean we only backfill agents once they have been silent for 4 minutes, giving the agent more time to get an avail report processed on the server side. Backfilling is expensive if the agent is actually UP, so we only want to backfill when we are sure the agent is down.
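
As a quick sanity check on the traffic estimate, the per-agent report rate can be worked out directly from the two intervals. This is only a sketch: it assumes each agent sends exactly one avail report per interval, and the function name is made up for illustration.

```python
def reports_per_hour(interval_seconds: int) -> float:
    """Avail reports one agent sends per hour at the given interval (assumed: one report per interval)."""
    return 3600 / interval_seconds


current = reports_per_hour(60)    # current agent interval from the description
proposed = reports_per_hour(90)   # proposed agent interval from the description

# Fraction of avail reports the server no longer has to process.
reduction = 1 - proposed / current

# Full report intervals that fit inside the server-side timeout before a backfill.
intervals_before_backfill_now = (2 * 60) / 60        # 2-minute timeout, 60s interval
intervals_before_backfill_proposed = (4 * 60) / 90   # 4-minute timeout, 90s interval

print(f"{current:.0f} -> {proposed:.0f} reports/hour, reduction {reduction:.0%}")
```

Under that assumption, the longer timeout also gives each agent somewhat more report intervals of slack before it becomes a backfill candidate.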

Perhaps before we backfill, we should have the server try to ping the agent and if the ping succeeds, we shouldn't backfill. Just another test we could do to avoid backfilling when possible.
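
A minimal sketch of that ping guard, assuming the server has some way to ping an agent. The `ping_agent` callable and the function name are hypothetical stand-ins, not actual RHQ APIs.

```python
from typing import Callable


def should_backfill_after_ping(agent_is_silent: bool, ping_agent: Callable[[], bool]) -> bool:
    """Return True only if the agent is silent AND does not answer a ping.

    ping_agent is a hypothetical callable that returns True when the agent responds.
    """
    if not agent_is_silent:
        return False  # avail reports are arriving, nothing to do
    try:
        reachable = ping_agent()
    except Exception:
        # Treat any ping failure as "agent unreachable".
        reachable = False
    return not reachable
```

The ping is only attempted for agents that are already backfill candidates, so it should add no load in the normal case.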

Comments
John Mazzitelli - 11/Nov/08 11:26 AM
We need to investigate what the proper interval should be.

John Mazzitelli - 12/Nov/08 01:21 AM
An alternative is to perform some additional checking after 2 minutes of quiet time but before we actually backfill.

Perhaps we can look in our DB for ANY activity from the agent right before we backfill. If we see that we have already processed (within the past 2 minutes) an inventory report, a measurement report, an operation result, a configuration change or any other agent-originating message, we can assume the agent is up and just hasn't been able to send us its avail report yet. In this case, we abort the backfill.

So it's:

1) checkSuspectAgents looks for an avail report that occurred within the past 2 minutes. If there is none, then:
2) check to see if the agent has sent us any other message in the previous 2-minute interval (like an inventory report, measurement report, operation result, etc). If we DID get such a message from the agent, abort and do not backfill. Otherwise:
3) continue with the normal backfill processing

So step 2) would be new.
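
The three steps could be sketched roughly as follows. This is an illustration only: apart from the 2-minute window, every name here (the function, its parameters, the returned labels) is an assumption, and the real checkSuspectAgents would read these timestamps from the DB.

```python
from datetime import datetime, timedelta
from typing import Optional

QUIET_PERIOD = timedelta(minutes=2)  # window from the steps above


def decide_backfill(now: datetime,
                    last_avail_report: Optional[datetime],
                    last_other_message: Optional[datetime]) -> str:
    """Return 'up', 'skip-backfill' or 'backfill' for one agent (hypothetical labels)."""
    cutoff = now - QUIET_PERIOD

    # Step 1: a recent avail report means the agent is not suspect at all.
    if last_avail_report is not None and last_avail_report >= cutoff:
        return "up"

    # Step 2 (new): any other recent agent-originating message aborts the backfill.
    if last_other_message is not None and last_other_message >= cutoff:
        return "skip-backfill"

    # Step 3: no sign of life in the window, continue with normal backfill.
    return "backfill"
```

For example, an agent whose last avail report is 3 minutes old but which delivered a measurement report 30 seconds ago would get "skip-backfill" rather than being backfilled.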