jira.rhq-project.org has been archived and is now in read-only mode.

All issues have been moved to bugzilla.redhat.com. All new issues or updates to existing issues should be made through bugzilla.
Specific old RHQ issues can be found using a query of this form: https://bugzilla.redhat.com/show_bug.cgi?id=RHQ-1999.
New bugs can be raised here. Accounts at bugzilla.redhat.com have been created for existing RHQ users.
Open bugs raised in the last week can be found here.

Key: RHQ-2174
Type: Improvement
Status: Accepted
Priority: Critical
Assignee: Joseph Marques
Reporter: Joseph Marques
Votes: 0
Watchers: 0
RHQ Project

further reduce contention on agent tables

Created: 25/Jun/09 02:17 AM   Updated: 03/Sep/09 03:38 AM
Component/s: Agent, FX - Alerts, Performance, Core Server, High Availability
Fix Version/s: 1.3

Time Tracking:
Not Specified

Date of First Response: 20/Jul/09 12:19 PM
Tester: Jeff Weiss
VCS Revision: 4,180


Description
some work has already been done to remove application hot spots:

rev4160 - [RHQ-2124][RHQ-1656][RHQ-1221] - removed hot spots and various other points of contention by shortening transaction times or using indexes as available for: a) uninventory work, b) cloud manager job, c) check for suspect agent job, d) dynagroup recalculation job, e) alerts cache in-band agent and server status bit setting, f) isAgentBackfilled checking

my concern is that applying changes to many templates will trickle down to hundreds if not thousands of alert definitions, taxing the agent table heavily just to set the status bit. so, StatusManagerBean should be rewritten to set the status only when absolutely necessary, and to use a simple true/false bit semantic instead of the more complicated bit mask. as an optional aid to remote debugging, the classic bitmask strategy should be kept when running in debug mode, so that it's easier to determine whether the backend alert cache consistency protocols are working as intended.
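
A minimal sketch of the proposed semantics, not the actual StatusManagerBean code. The class name, the two reason constants, and the in-memory counter that stands in for writes to the agent row are all hypothetical. In normal mode the status collapses to a single true/false flag, so only the first change issues a write; in debug mode the classic bitmask is kept, so a write is issued only when a new bit would be added.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; names and bit values are assumptions, not RHQ's real ones.
final class AgentStatusSketch {
    static final int ALERT_DEFINITION_CHANGED = 1; // assumed bit, debug mode only
    static final int BASELINES_CALCULATED = 2;     // assumed bit, debug mode only

    private final boolean debugMode;
    private final AtomicInteger status = new AtomicInteger(0); // stands in for the agent row's status column
    private final AtomicInteger writes = new AtomicInteger(0); // counts simulated row updates

    AgentStatusSketch(boolean debugMode) {
        this.debugMode = debugMode;
    }

    /** Records that the agent's caches are stale; returns true only if a write was issued. */
    boolean markChanged(int reason) {
        while (true) {
            int current = status.get();
            // Debug mode keeps the bitmask; normal mode only needs "dirty or not".
            int desired = debugMode ? (current | reason) : 1;
            if (desired == current) {
                return false; // nothing new to record, so skip the write entirely
            }
            if (status.compareAndSet(current, desired)) {
                writes.incrementAndGet();
                return true;
            }
        }
    }

    int status() {
        return status.get();
    }

    int writeCount() {
        return writes.get();
    }

    public static void main(String[] args) {
        AgentStatusSketch normal = new AgentStatusSketch(false);
        normal.markChanged(ALERT_DEFINITION_CHANGED);
        normal.markChanged(BASELINES_CALCULATED); // skipped: flag is already set
        System.out.println("normal: status=" + normal.status() + " writes=" + normal.writeCount());

        AgentStatusSketch debug = new AgentStatusSketch(true);
        debug.markChanged(ALERT_DEFINITION_CHANGED);
        debug.markChanged(BASELINES_CALCULATED); // written: adds a new bit
        System.out.println("debug: status=" + debug.status() + " writes=" + debug.writeCount());
    }
}
```

The trade-off this is meant to show: the boolean form does the least work under a template change that fans out to many alert definitions, while the bitmask form preserves which kind of change marked the agent, at the cost of a few extra writes.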

Comments
Joseph Marques - 25/Jun/09 02:28 AM
rev4180 - remove hot spots when updating the agent status field by requiring only the FIRST thread to set the status field; all update statements thereafter do not need to acquire the update lock on the row.
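
The "first thread only" idea can be illustrated with an in-memory stand-in for the agent row. This is a sketch under assumptions, not the code from rev4180: the class and method names are hypothetical, an AtomicBoolean plays the role of the status field, and a counter plays the role of row updates that would need the lock. The reset method assumes the server clears the status once it has reloaded its caches.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; an in-memory model, not RHQ's persistence code.
final class FirstWriterSketch {
    private final AtomicBoolean statusSet = new AtomicBoolean(false); // stands in for the status field
    private final AtomicInteger rowUpdates = new AtomicInteger(0);    // updates that would take the row lock

    /** Only the caller that flips the flag from false to true performs an update. */
    void requestCacheReload() {
        if (statusSet.compareAndSet(false, true)) {
            rowUpdates.incrementAndGet();
        }
    }

    /** Assumption: the status is cleared after the caches have been reloaded. */
    void clearAfterReload() {
        statusSet.set(false);
    }

    int rowUpdates() {
        return rowUpdates.get();
    }

    /** Runs many concurrent requests against one agent and returns the number of row updates. */
    static int simulate(int threads, int callsPerThread) {
        FirstWriterSketch agent = new FirstWriterSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    agent.requestCacheReload();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return agent.rowUpdates();
    }

    public static void main(String[] args) {
        System.out.println("row updates: " + simulate(8, 1000));
    }
}
```

However many threads race, exactly one of them wins the compare-and-set, so the contended update happens once per reload cycle rather than once per change.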

to test:

* update some alert definition, wait up to 30 seconds, make sure you see "reload caches" info messages and that no exceptions are thrown in the server log
* update some alert template, wait up to 30 seconds, make sure you see "reload caches" info messages and that no exceptions are thrown in the server log
* manually calculate some measurement baseline, wait up to 30 seconds, make sure you see "reload caches" info messages and that no exceptions are thrown in the server log
* assuming templates are set up, commit some new resources, wait up to 30 seconds, make sure you see "reload caches" info messages and that no exceptions are thrown in the server log
* let the hourly baseline job complete. assuming that at least 1 baseline was calculated, make sure you see "reload caches" info messages and that no exceptions are thrown in the server log
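
Each check above boils down to scanning a window of the server log for a reload message and for exceptions. A rough helper for that, assuming the reload message has the shape "took [N]ms to reload ..." as logged by CacheConsistencyManagerBean; the class and method names are hypothetical, and treating any line containing "Exception" as a failure is a simplification.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the message shape is an assumption about the server log format.
final class ReloadLogCheck {
    private static final Pattern RELOAD = Pattern.compile("took \\[(\\d+)\\]ms to reload (.+)$");

    /** Returns the reload time in milliseconds, or -1 if the line is not a reload message. */
    static int reloadMillis(String line) {
        Matcher m = RELOAD.matcher(line);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** Passes when at least one reload message is present and no line mentions an exception. */
    static boolean passes(List<String> lines) {
        boolean sawReload = false;
        for (String line : lines) {
            if (line.contains("Exception")) {
                return false;
            }
            if (reloadMillis(line) >= 0) {
                sawReload = true;
            }
        }
        return sawReload;
    }

    public static void main(String[] args) {
        List<String> window = Arrays.asList(
            "2009-07-20 12:22:27,002 INFO [org.rhq.enterprise.server.cloud.instance.CacheConsistencyManagerBean] "
                + "jweiss-rhel1.usersys.redhat.com took [80]ms to reload global cache");
        System.out.println("check passes: " + passes(window));
    }
}
```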

Jeff Weiss - 20/Jul/09 12:19 PM
2009-07-20 12:22:27,002 INFO [org.rhq.enterprise.server.cloud.instance.CacheConsistencyManagerBean] jweiss-rhel1.usersys.redhat.com took [80]ms to reload global cache
2009-07-20 12:22:27,142 INFO [org.rhq.enterprise.server.cloud.instance.CacheConsistencyManagerBean] jweiss-rhel1.usersys.redhat.com took [140]ms to reload cache for 1 agents

I get the above when updating an alert, but it did not appear when manually calculating a baseline. I also tried setting the high/low range to values I typed in: I got the 2nd line above when I updated the high, but nothing when I updated the low.

Templates have a regression currently so I will retest when the baseline problem is resolved.

rev4423