Issue 2140: [gerrit gc] gerrit gc failed, cleanup of data.
Status:  Released
Owner: ----
Closed:  Nov 2013
Reported by Ian.Kuml...@gmail.com, Sep 23, 2013

Affected Version: 2.7

What steps will reproduce the problem?
1. run gerrit gc
2. have some gits fail (spectacularly, I assume - they are the ones mentioned in my other reports)
3. rerun gerrit gc and get the message that the gc was already scheduled
(and will be so until you restart gerrit)

What is the expected output? What do you see instead?
Actually try to rerun the gerrit gc

Please provide any additional information below.

I couldn't get any information from the logs, so I assume that this is a resource leak...

Kept returning:
error: garbage collection for project "gerrit/project" was already scheduled

After it had stated:
error: garbage collection for project "gerrit/project" failed
Sep 30, 2013
#1 Ian.Kuml...@gmail.com
Running gerrit gc on 3281 projects resulted in 2562 projects being stuck in "was already scheduled" - this is quite a high percentage...

Nov 18, 2013
#3 casta...@motorola.com
I am seeing this exact same scenario.

The error that is thrown just before I start getting "already scheduled" shows only: "fatal: internal server error" and there is nothing in the error log.

Then every repo I try after this fails.

Note if I go back to the beginning and try again the repos leading up to the failure still GC okay.  But the repos after the failure will always fail.  It's as if even though I did this:

     ssh -p 29418 `hostname` gerrit gc this/is/a/specific/project

that the failure when it occurs seems to have assumed I did this:

     ssh -p 29418 `hostname` gerrit gc --all

So that all other projects get marked as "already scheduled" when in fact they were never even used on the command line.
Nov 18, 2013
#4 casta...@motorola.com
The output from the command on the command line looks like this:

collecting garbage for "...repo...":
Pack refs:              100% (3950/3950)
Counting objects:       252202
Finding sources:        100% (252202/252202)
Getting sizes:          100% (155610/155610)
Compressing objects:     94% (258001/273190)fatal: internal server error

That's it.  Just like that.


I guess there was one error in the error_log file.  But it isn't very useful:

[2013-11-18 06:05:48,327] ERROR com.google.gerrit.sshd.BaseCommand : Internal server error (user gitsync account 1001110) during gerrit gc ...repo...
java.lang.NullPointerException

That's it.  No stack trace.

Nov 18, 2013
#5 casta...@motorola.com
And this is definitely not a "Minor" bug.  Unless someone identifies a workaround for this issue then Gerrit garbage collection is just broken for us and we will never realize the suggested performance improvements from this change that we so desperately need.


Nov 18, 2013
#6 casta...@motorola.com
What would be great is if we could also get more error output somehow.  Where else can we go to get more error data?
Nov 18, 2013
Project Member #7 edwin.ke...@gmail.com
The error handling was fixed by:
  https://gerrit-review.googlesource.com/51870
Status: Submitted
Labels: FixedIn-2.8
Nov 19, 2013
#8 casta...@motorola.com
Thanks for addressing this.  I have two questions though:
(1) Will this change also allow all other projects after the error to be garbage collected?  The indicated patch didn't look like it addressed that part of the bug.
(2) Will this change actually result in any more useful/specific or additional data in the error log?
Nov 19, 2013
Project Member #9 edwin.ke...@gmail.com
> (1) Will this change also allow all other projects after the error to be garbage 
> collected?  The indicated patch didn't look like it addressed that part of the bug.
Yes. The problem was that the GC for one project failed with a RuntimeException which wasn't caught, and hence the loop over the projects was exited. The rest of the projects stayed in the scheduled queue without ever being processed. Since we are now catching RuntimeExceptions, the exception only affects the GC of that one project, and afterwards the GC for the rest of the projects is done.

> (2) Will this change actually result in any more useful/specific or additional
> data in the error log?
Yes, RuntimeExceptions are now caught and a log entry with full stacktrace is written.
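
The behaviour described in this comment can be illustrated with a minimal Java sketch. This is not Gerrit's actual code: the class GcLoopSketch, the scheduled set, and the methods schedule, runUnguarded and runGuarded are hypothetical names chosen for illustration, and the per-project GC is passed in as a plain Consumer. It assumes that all requested projects are marked as scheduled up front and unmarked one by one as they finish, which is how the explanation above reads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical sketch; none of these names come from Gerrit itself.
class GcLoopSketch {
  // Stand-in for the queue of projects currently marked as scheduled.
  public static final Set<String> scheduled = new HashSet<>();

  // Marks projects as scheduled and returns only those not already marked.
  public static List<String> schedule(List<String> projects) {
    List<String> accepted = new ArrayList<>();
    for (String p : projects) {
      if (scheduled.add(p)) {
        accepted.add(p);
      } else {
        System.out.println(
            "error: garbage collection for project \"" + p + "\" was already scheduled");
      }
    }
    return accepted;
  }

  // Shape of the problem: an uncaught RuntimeException leaves the loop, so the
  // failing project and every project after it stay marked as scheduled.
  public static void runUnguarded(List<String> projects, Consumer<String> gc) {
    for (String p : schedule(projects)) {
      gc.accept(p);
      scheduled.remove(p);
    }
  }

  // Shape of the fix: the exception is caught and logged per project, and the
  // project is always unmarked, so the loop continues with the remaining ones.
  public static void runGuarded(List<String> projects, Consumer<String> gc) {
    for (String p : schedule(projects)) {
      try {
        gc.accept(p);
      } catch (RuntimeException e) {
        System.out.println("error: garbage collection for project \"" + p + "\" failed");
        e.printStackTrace(); // full stack trace instead of a bare exception name
      } finally {
        scheduled.remove(p);
      }
    }
  }

  public static void main(String[] args) {
    List<String> projects = Arrays.asList("a", "b", "c");
    // Simulated GC that fails with a NullPointerException for project "b".
    Consumer<String> gc = p -> {
      if (p.equals("b")) {
        throw new NullPointerException();
      }
    };
    try {
      runUnguarded(projects, gc);
    } catch (RuntimeException e) {
      System.out.println("still scheduled after unguarded run: " + scheduled);
    }
    scheduled.clear();
    runGuarded(projects, gc);
    System.out.println("still scheduled after guarded run: " + scheduled);
  }
}
```

In the unguarded variant a rerun over the same projects would report the leftover entries as "was already scheduled" until the set is reset, which corresponds to the restart mentioned in the original report. The sketch does not explain the single-project case raised later in this thread.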
Nov 19, 2013
#10 casta...@motorola.com
I see.  Thanks for the explanation.
Nov 19, 2013
#11 Ian.Kuml...@gmail.com
It also caught me off guard, took a while - I suspect that this will fix/help us to locate the problem in issue #2138 ... =)
Nov 19, 2013
Project Member #12 edwin.ke...@gmail.com
I know this explanation should have been in the commit message, but yesterday I was in a hurry and had to leave quickly. Still I wanted to get the fix out.
Nov 19, 2013
#13 casta...@motorola.com
One more thing however....

In the failure scenario we keep seeing, the garbage collection project list is only a >single project<.  However every project in the system gets marked as "already scheduled".  This seems like a different bug to me that doesn't seem fixed in this particular change.  It feels to me like Gerrit is just assuming everyone always uses "--all".

Can you confirm that this change will also keep the problem from occurring when there is only one project passed to "gerrit gc"?

Thanks.
Nov 19, 2013
Project Member #14 edwin.ke...@gmail.com
No, this must be something else then. If you trigger the GC for a single project it shouldn't mark all projects as scheduled. If it does, this is a bug, but it's not fixed by this change.
Dec 9, 2013
Project Member #15 david.pu...@sonymobile.com
(No comment was entered for this change.)
Status: Released