Issue Details

Key: RHQ-834
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Blocker
Assignee: John Mazzitelli
Reporter: Joseph Marques
Votes: 0
Watchers: 0

RHQ Project

alert cache not reloaded after agent reconnection

Created: 13/Sep/08 01:07 AM   Updated: 18/Sep/08 12:20 PM
Component/s: FX - Alerts, Core Server, High Availability
Affects Version/s: 1.1pre
Fix Version/s: 1.1

Time Tracking:
Not Specified

Issue Links:
Incorporate
 

Resolution Date: 15/Sep/08 12:22 PM
Date of First Response: 15/Sep/08 09:09 AM
Tester: Jeff Weiss
VCS Revision: 1,462


Description
steps to reproduce:

* start up server
* start up agent with --clean
** --clean will force the agent to follow the flow for a newly registering agent, and the cache will get reloaded
* once you see the agent prompt, stop/exit the agent
* restart the agent WITHOUT --clean

expected result:

* you'll see the alert cache for your agentId get reloaded if you tail the server log

actual result:

* you don't see any mention of the alerts cache in the server log during this agent reconnect

i see the new code for calling out to connectAgent upon failover, but we need that to happen upon agent startup as well. i...think...all we need to do is take the AgentMain logic inside of failoverToNewServer(RemoteCommunicator) - specifically the part that calls out to connectAgent and handles the result - and add it to the if-block starting on line 945 (rev1447). but i'm sure it can't be that simple, and i'm probably missing something.

Comments
Joseph Marques - 13/Sep/08 01:12 AM
rev1449 - this commit moves alert cache loading from server-start time to agent-connection time.
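To illustrate what loading "at agent-connection time" means, here is a minimal Java sketch of a per-agent condition cache that is empty at server start and is unloaded and reloaded only when an agent connects, loosely mirroring the UnloadStats/ReloadStats lines in the server log later in this issue. The class name PerAgentAlertCache, its methods, and the loader function are hypothetical stand-ins, not RHQ's actual AlertConditionCache API; condition types are modeled as plain strings.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

// HYPOTHETICAL sketch: not RHQ's AlertConditionCache; names are illustrative only.
final class PerAgentAlertCache {
    // Stand-in for the DB query that finds alert conditions for one agent's resources.
    private final IntFunction<List<String>> conditionLoader;
    // agentId -> alert conditions currently cached for that agent
    private final Map<Integer, List<String>> cacheByAgent = new HashMap<>();

    PerAgentAlertCache(IntFunction<List<String>> conditionLoader) {
        this.conditionLoader = conditionLoader;
    }

    // Called on agent connect (not at server start): unload stale entries, then reload.
    synchronized void onAgentConnect(int agentId) {
        cacheByAgent.remove(agentId);
        cacheByAgent.put(agentId, List.copyOf(conditionLoader.apply(agentId)));
    }

    synchronized boolean isLoaded(int agentId) {
        return cacheByAgent.containsKey(agentId);
    }

    synchronized List<String> conditionsFor(int agentId) {
        return cacheByAgent.getOrDefault(agentId, Collections.emptyList());
    }

    public static void main(String[] args) {
        PerAgentAlertCache cache = new PerAgentAlertCache(
                id -> List.of("Resource Availability", "Measurement Threshold"));
        System.out.println(cache.isLoaded(500050)); // false: nothing is loaded at server start
        cache.onAgentConnect(500050);
        System.out.println(cache.conditionsFor(500050)); // [Resource Availability, Measurement Threshold]
    }
}
```

The trade-off this sketch makes visible: nothing is cached for an agent until it connects, so any path that skips the connect step leaves that agent's conditions unloaded.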

Joseph Marques - 14/Sep/08 06:43 AM
rev1452 (mazz) - when the agent successfully registers, it should also tell the server it wants to "connect" to it.

Joseph Marques - 14/Sep/08 06:45 AM - edited
i'm still seeing the same issue running on rev1452 code. when i start the agent normally (without --clean), i do not see the connectAgent method being called. this NEEDS to be done BEFORE any queued commands get sent up to the server. if a queued command is sent before the server is ready, the cache might not be loaded yet, which means that one or more alerts that should have fired will be missed.
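The ordering requirement can be sketched as an outbound queue that holds commands until the connect handshake is acknowledged. This is a minimal Java sketch under stated assumptions: GatedCommandQueue, onConnectAcknowledged, onDisconnected, and the command strings are hypothetical, and a Consumer<String> stands in for the agent's real client command sender.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

// HYPOTHETICAL sketch: an outbound queue that does not flush until the connect-agent
// handshake succeeds, so the server has loaded this agent's alert cache first.
final class GatedCommandQueue {
    private final Deque<String> pending = new ArrayDeque<>();
    private final Consumer<String> sender; // stand-in for the real transport
    private boolean connected = false;

    GatedCommandQueue(Consumer<String> sender) {
        this.sender = sender;
    }

    // Send immediately only if the handshake already completed; otherwise hold it.
    synchronized void submit(String command) {
        if (connected) {
            sender.accept(command);
        } else {
            pending.addLast(command);
        }
    }

    // Call after the server acknowledges the connect request; flushes in FIFO order.
    synchronized void onConnectAcknowledged() {
        connected = true;
        while (!pending.isEmpty()) {
            sender.accept(pending.removeFirst());
        }
    }

    // Call on shutdown or failover: new commands are held again until the next connect.
    synchronized void onDisconnected() {
        connected = false;
    }

    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();
        GatedCommandQueue q = new GatedCommandQueue(sent::add);
        q.submit("availabilityReport");
        System.out.println(sent); // []
        q.onConnectAcknowledged();
        q.submit("measurementReport");
        System.out.println(sent); // [availabilityReport, measurementReport]
    }
}
```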

John Mazzitelli - 15/Sep/08 09:09 AM
the fix I plan on implementing for RHQ-835 should address this one too

John Mazzitelli - 15/Sep/08 12:00 PM
svn rev1461

John Mazzitelli - 15/Sep/08 12:06 PM
this is fixed - we now send the connect-agent command to the server whenever the agent restarts. this, in addition to sending the connect-agent command after the agent registers, should get the agent to properly connect to the server under all conditions.
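A minimal Java sketch of the fix as described: a single helper sends the connect request, and startup (with or without --clean), registration, and failover all go through it. AgentLifecycleSketch, ServerStub, connectAgent(String), and registerAgent(String) are hypothetical names for illustration, not the real AgentMain or server API.

```java
import java.util.ArrayList;
import java.util.List;

// HYPOTHETICAL sketch: these names illustrate the idea and are not RHQ's AgentMain API.
final class AgentLifecycleSketch {
    // Stand-in for the remote server interface the agent talks to.
    interface ServerStub {
        boolean connectAgent(String agentName); // true if the server accepted the connect
        void registerAgent(String agentName);
    }

    private ServerStub server;
    private final String agentName;
    private boolean registered;

    AgentLifecycleSketch(ServerStub server, String agentName, boolean alreadyRegistered) {
        this.server = server;
        this.agentName = agentName;
        this.registered = alreadyRegistered;
    }

    // Single place that sends the connect request; every path below goes through it.
    private void connect() {
        if (!server.connectAgent(agentName)) {
            throw new IllegalStateException("server rejected connect for " + agentName);
        }
    }

    // Startup: register if needed (e.g. --clean), then ALWAYS connect.
    void start(boolean clean) {
        if (clean || !registered) {
            server.registerAgent(agentName);
            registered = true;
        }
        connect(); // a plain restart must connect too (the scenario in this issue)
    }

    // Failover: the new server must also be told to load this agent's alert caches.
    void failoverTo(ServerStub newServer) {
        this.server = newServer;
        connect();
    }

    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        ServerStub stub = new ServerStub() {
            public boolean connectAgent(String n) { calls.add("connect:" + n); return true; }
            public void registerAgent(String n) { calls.add("register:" + n); }
        };
        AgentLifecycleSketch agent = new AgentLifecycleSketch(stub, "agent-1", true);
        agent.start(false); // restart WITHOUT --clean
        System.out.println(calls); // [connect:agent-1]
    }
}
```

Routing every path through one connect helper is the design point: a new startup branch cannot silently skip the connect step the way the plain-restart path did in the reported scenario.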

John Mazzitelli - 15/Sep/08 01:04 PM
correction - svn rev1462

Jeff Weiss - 18/Sep/08 12:20 PM
rev1547 - I now see this in the log after the last step of the procedure:

2008-09-18 13:18:12,156 INFO [org.rhq.enterprise.server.core.AgentManagerBean] Agent with name [dev16.qa.atl2.redhat.com] just went down
2008-09-18 13:18:24,843 INFO [org.rhq.enterprise.server.alert.engine.AlertConditionCache] Loading Alert Condition Caches for agent[id=500050]...
2008-09-18 13:18:24,843 INFO [org.rhq.enterprise.server.alert.engine.AlertConditionCache] Loading Alert Condition Composites of type 'Resource Availability'
2008-09-18 13:18:24,843 INFO [org.rhq.enterprise.server.alert.AlertConditionManagerBean] Found 0 elements of type 'Resource Availability', list was size 0
2008-09-18 13:18:24,843 INFO [org.rhq.enterprise.server.alert.engine.AlertConditionCache] Loading Alert Condition Composites of type 'Measurement Threshold'
2008-09-18 13:18:24,859 INFO [org.rhq.enterprise.server.alert.AlertConditionManagerBean] Found 0 elements of type 'Measurement Threshold', list was size 0
2008-09-18 13:18:24,859 INFO [org.rhq.enterprise.server.alert.engine.AlertConditionCache] Loading Alert Condition Composites of type 'Measurement Baseline'
2008-09-18 13:18:24,859 INFO [org.rhq.enterprise.server.alert.AlertConditionManagerBean] Found 0 elements of type 'Measurement Baseline', list was size 0
2008-09-18 13:18:24,859 INFO [org.rhq.enterprise.server.alert.engine.AlertConditionCache] Loading Alert Condition Composites of type 'Measurement Value Change'
2008-09-18 13:18:24,875 INFO [org.rhq.enterprise.server.alert.AlertConditionManagerBean] Found 0 elements of type 'Measurement Value Change', list was size 0
2008-09-18 13:18:24,875 INFO [org.rhq.enterprise.server.alert.engine.AlertConditionCache] Loading Alert Condition Composites of type 'Measurement Trait'
2008-09-18 13:18:24,875 INFO [org.rhq.enterprise.server.alert.AlertConditionManagerBean] Found 0 elements of type 'Measurement Trait', list was size 0
2008-09-18 13:18:24,875 INFO [org.rhq.enterprise.server.alert.engine.AlertConditionCache] Loading Alert Condition Composites of type 'Control Action'
2008-09-18 13:18:24,875 INFO [org.rhq.enterprise.server.alert.AlertConditionManagerBean] Found 0 elements of type 'Control Action', list was size 0
2008-09-18 13:18:24,890 INFO [org.rhq.enterprise.server.alert.engine.AlertConditionCache] Loading Alert Condition Composites of type 'Log Event'
2008-09-18 13:18:24,890 INFO [org.rhq.enterprise.server.alert.AlertConditionManagerBean] Found 0 elements of type 'Log Event', list was size 0
2008-09-18 13:18:24,906 INFO [org.rhq.enterprise.server.alert.engine.AlertConditionCache] Loaded Alert Condition Caches for agent[id=500050]
2008-09-18 13:18:24,906 INFO [org.rhq.enterprise.server.alert.engine.AlertConditionCache] UnloadStats for agent[id=500050]: AlertConditionCacheStats[ created=0, updated=0, deleted=0, matched=0, age=63ms ]
2008-09-18 13:18:24,906 INFO [org.rhq.enterprise.server.alert.engine.AlertConditionCache] ReloadStats for agent[id=500050]: AlertConditionCacheStats[ created=0, updated=0, deleted=0, matched=0, age=63ms ]
2008-09-18 13:18:24,937 INFO [org.rhq.enterprise.server.core.CoreServerServiceImpl] Agent [dev16.qa.atl2.redhat.com] has connected to this server.
2008-09-18 13:18:25,875 INFO [org.rhq.enterprise.server.core.CoreServerServiceImpl] Got agent registration request for existing agent: dev16.qa.atl2.redhat.com[10.18.0.79:16163] - Will not regenerate a new token
2008-09-18 13:18:26,125 INFO [org.rhq.enterprise.server.alert.engine.AlertConditionCache] Loading Alert Condition Caches for agent[id=500050]...
2008-09-18 13:18:26,125 INFO [org.rhq.enterprise.server.alert.engine.AlertConditionCache] Loading Alert Condition Composites of type 'Resource Availability'
2008-09-18 13:18:26,125 INFO [org.rhq.enterprise.server.alert.AlertConditionManagerBean] Found 0 elements of type 'Resource Availability', list was size 0
2008-09-18 13:18:26,125 INFO [org.rhq.enterprise.server.alert.engine.AlertConditionCache] Loading Alert Condition Composites of type 'Measurement Threshold'
2008-09-18 13:18:26,125 INFO [org.rhq.enterprise.server.alert.AlertConditionManagerBean] Found 0 elements of type 'Measurement Threshold', list was size 0
2008-09-18 13:18:26,125 INFO [org.rhq.enterprise.server.alert.engine.AlertConditionCache] Loading Alert Condition Composites of type 'Measurement Baseline'
2008-09-18 13:18:26,140 INFO [org.rhq.enterprise.server.alert.AlertConditionManagerBean] Found 0 elements of type 'Measurement Baseline', list was size 0
2008-09-18 13:18:26,140 INFO [org.rhq.enterprise.server.alert.engine.AlertConditionCache] Loading Alert Condition Composites of type 'Measurement Value Change'
2008-09-18 13:18:26,140 INFO [org.rhq.enterprise.server.alert.AlertConditionManagerBean] Found 0 elements of type 'Measurement Value Change', list was size 0
2008-09-18 13:18:26,140 INFO [org.rhq.enterprise.server.alert.engine.AlertConditionCache] Loading Alert Condition Composites of type 'Measurement Trait'
2008-09-18 13:18:26,156 INFO [org.rhq.enterprise.server.alert.AlertConditionManagerBean] Found 0 elements of type 'Measurement Trait', list was size 0
2008-09-18 13:18:26,156 INFO [org.rhq.enterprise.server.alert.engine.AlertConditionCache] Loading Alert Condition Composites of type 'Control Action'
2008-09-18 13:18:26,156 INFO [org.rhq.enterprise.server.alert.AlertConditionManagerBean] Found 0 elements of type 'Control Action', list was size 0
2008-09-18 13:18:26,156 INFO [org.rhq.enterprise.server.alert.engine.AlertConditionCache] Loading Alert Condition Composites of type 'Log Event'
2008-09-18 13:18:26,172 INFO [org.rhq.enterprise.server.alert.AlertConditionManagerBean] Found 0 elements of type 'Log Event', list was size 0
2008-09-18 13:18:26,172 INFO [org.rhq.enterprise.server.alert.engine.AlertConditionCache] Loaded Alert Condition Caches for agent[id=500050]
2008-09-18 13:18:26,172 INFO [org.rhq.enterprise.server.alert.engine.AlertConditionCache] UnloadStats for agent[id=500050]: AlertConditionCacheStats[ created=0, updated=0, deleted=0, matched=0, age=47ms ]
2008-09-18 13:18:26,172 INFO [org.rhq.enterprise.server.alert.engine.AlertConditionCache] ReloadStats for agent[id=500050]: AlertConditionCacheStats[ created=0, updated=0, deleted=0, matched=0, age=47ms ]
2008-09-18 13:18:26,187 INFO [org.rhq.enterprise.server.core.CoreServerServiceImpl] Agent [dev16.qa.atl2.redhat.com] has connected to this server.