
Key: RHQ-886
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Critical
Assignee: Joseph Marques
Reporter: Joseph Marques
Votes: 0
Watchers: 0

RHQ Project

group operations executed with "at the same time" option sometimes show "in progress" even when all res op children are complete

Created: 24/Sep/08 10:37 AM   Updated: 22/Dec/08 10:34 PM
Component/s: FX - Operations, Core Server
Affects Version/s: 1.1pre
Fix Version/s: 1.2

Time Tracking:
Not Specified

Resolution Date: 19/Oct/08 09:31 PM
Date of First Response: 22/Dec/08 10:34 PM
Tester: Corey Welton
VCS Revision: 1807


Description
(11:27:50) mazz: hmmm... I just ran a group operation and all the resource ops are successful, but the group op still says "in progress"
...
(11:30:46) joseph: did u execute serially or concurrently?
(11:31:05) joseph: i'm gonna take a shot in the dark and say concurrent
(11:31:28) mazz: concurrent
(11:31:36) joseph: hmm...yes
(11:31:41) joseph: this is more thread visibility issues
(11:31:54) joseph: you must have had the last two res ops completing at the same time
(11:32:06) joseph: they both check the db and find that there is at least one other res op not yet complete
(11:32:17) joseph: so neither of them set the group op to done
(11:32:43) mazz: yeah... well, we have that job running that should fix this eventually
(11:32:47) joseph: we should update the CheckForTimedOutOperationsJob
(11:32:53) joseph: today, i think it only checks for time outs
(11:32:59) joseph: doesn't check for logically complete
(11:33:02) mazz: oh
(11:33:33) joseph: i was 1/2 right
(11:34:10) joseph: it checks for res op timeouts, group op timeouts, and group op failures (due to any contained res op failure)
(11:34:15) joseph: doesn't check for successes

TODO - update CheckForTimedOutOperationsJob to check for all terminating conditions (i.e., all contained res op children are NOT "in progress")
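
The race described in the chat can be replayed deterministically. This is only an illustrative sketch in Python, not RHQ code: the names `Status`, `others_all_done`, `db`, `op_a` and `op_b` are hypothetical, and the dictionary stands in for the committed state each finishing res op reads from the db.

```python
from enum import Enum


class Status(Enum):
    IN_PROGRESS = "in progress"
    SUCCESS = "success"
    FAILURE = "failure"


def others_all_done(snapshot, me):
    """What a finishing res op checks: are all *other* children complete?"""
    return all(
        status is not Status.IN_PROGRESS
        for name, status in snapshot.items()
        if name != me
    )


# Hypothetical committed state: the last two res ops are still running.
db = {"op_a": Status.IN_PROGRESS, "op_b": Status.IN_PROGRESS}
group_status = Status.IN_PROGRESS

# Both res ops finish at the same time, so each reads the db
# before the other's completion is visible.
snapshot_a = dict(db)
snapshot_b = dict(db)

# Each then records its own completion.
db["op_a"] = Status.SUCCESS
db["op_b"] = Status.SUCCESS

# Each saw the other as still running, so neither marks the group op done.
if others_all_done(snapshot_a, "op_a") or others_all_done(snapshot_b, "op_b"):
    group_status = Status.SUCCESS

# A later periodic check over the committed state would see that
# no child is "in progress" any more.
group_is_logically_complete = all(
    status is not Status.IN_PROGRESS for status in db.values()
)
```

At the end of the replay `group_status` is still `IN_PROGRESS` although every child is complete, which is the condition the periodic job would need to detect.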

Comments
Joseph Marques - 24/Sep/08 10:50 AM
actually, the group op timeout logic is not correct. today it says:

* find group ops in progress whose children are all complete
* if group op has timed out, then check children
* if any child has failed, the group op has failed...otherwise the group op has completed

the logic should be 2-part:

* find all group ops in progress that have even one child that isn't complete
* if the group op has timed out, then the group op has failed with TIMEOUT as a reason - should children be canceled?

* find all group ops in progress that have no children in progress
* if any child has failed, the group op has failed...otherwise the group op has completed
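
The 2-part logic above could be sketched roughly as follows. This is a hedged Python illustration, not the code committed to RHQ: `GroupOp`, `sweep`, `started_at`, `timeout_seconds` and the `error` field are assumed names, and the open question of whether children should be canceled on timeout is deliberately left unanswered.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Status(Enum):
    IN_PROGRESS = "in progress"
    SUCCESS = "success"
    FAILURE = "failure"


@dataclass
class GroupOp:
    started_at: float            # seconds, hypothetical clock
    timeout_seconds: float
    children: List[Status]       # status of each contained res op
    status: Status = Status.IN_PROGRESS
    error: Optional[str] = None


def sweep(group_op: GroupOp, now: float) -> None:
    """Apply the two-part termination check to one group op."""
    if group_op.status is not Status.IN_PROGRESS:
        return  # already terminated, nothing to do

    if any(child is Status.IN_PROGRESS for child in group_op.children):
        # Part 1: at least one child is not complete.
        # The group op only terminates here if it has timed out.
        if now - group_op.started_at > group_op.timeout_seconds:
            group_op.status = Status.FAILURE
            group_op.error = "TIMEOUT"
            # Whether to cancel the remaining children is left open.
        return

    # Part 2: no children in progress.
    # Any failed child fails the group op; otherwise it has completed.
    if any(child is Status.FAILURE for child in group_op.children):
        group_op.status = Status.FAILURE
    else:
        group_op.status = Status.SUCCESS
```

For example, `sweep(GroupOp(0, 60, [Status.SUCCESS, Status.SUCCESS]), now=10)` would mark the group op as `SUCCESS` even though it never timed out, which covers the case the chat log describes.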

Joseph Marques - 24/Sep/08 12:33 PM
after coding up the changes, i don't feel comfortable committing this so close to release time. marking for 1.2, and i will keep the code local for now.

Joseph Marques - 19/Oct/08 09:31 PM
rev1807 - correct the workflow around group operation termination, checks for timed-out and abandoned ops separately;
added oodles of comments to describe new logic in excruciating detail;

Corey Welton - 22/Dec/08 10:34 PM
QA Verified - if all ops complete, the group ops execution result appears as Success. If one fails, the result indicated is "Failure", with a drill-down into the results indicating the op that failed.