Issue 477: Replication failures shouldn't cause a crash
Status:  Duplicate
Merged:  issue 478
Owner:  ----
Closed:  Mar 2010


Reported by mike.lifeguard@gmail.com, Mar 2, 2010
Affected Version: 2.1.1.1

What steps will reproduce the problem?
1. Configure replication so that Gerrit can't write to the requested
location (a local URL). I *think* in this case the repo didn't exist at
the time, but the permissions may have been wrong instead.
2. Wait until Gerrit crashes.
3. Find errors such as http://p.defau.lt/?zOJryhIqYmRoiExKYTzsNQ

Replication failures should be logged, but they are recoverable
errors. Warn and continue on; don't crash.
Mar 2, 2010
#1 sop@google.com
I can't reproduce this.  If I put the following in:

  [remote "bad"]
    url = /does.not.exist/${name}.git

The server still starts up normally, but after 30 seconds or so
the log fills up with replication failed messages for each of the
projects aborting because their target isn't found.

But if the URL is really mangled:

  [remote "bad"]
    url = /does.not.exist/${name.git

Then the server aborts on startup, and the log doesn't have
the stack trace at all.  So your original example on the list
didn't match with the failure... but I do see a bug here that
I'll try to fix in 2.1.2.
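
To illustrate the difference between the two URLs above, here is a minimal sketch of a startup check that flags an unterminated `${...}` placeholder. It is not Gerrit's code: `validate_url_template` and the `known` parameter are hypothetical names, and it assumes `name` is the only placeholder of interest.

```python
import re

# Hypothetical helper, not Gerrit's actual code: matches a well-formed ${var}.
PLACEHOLDER = re.compile(r"\$\{([^{}]*)\}")


def validate_url_template(url, known=("name",)):
    """Return a list of problems found in the template (empty if none)."""
    problems = []
    # Strip every well-formed ${...}; any "${" left over was never closed.
    remainder = PLACEHOLDER.sub("", url)
    if "${" in remainder:
        problems.append("unterminated '${' placeholder")
    # Flag placeholders whose variable name is not recognized.
    for var in PLACEHOLDER.findall(url):
        if var not in known:
            problems.append(f"unknown placeholder '${{{var}}}'")
    return problems
```

A check like this could report the problem in the log before the server gives up, instead of aborting silently.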
Status: Accepted
Labels: Milestone-2.1.2
Mar 2, 2010
#2 sop@google.com
Actually, I was confused between issue 477 and issue 478.

This issue I just don't get.  It reads to me like it's a
duplicate of issue 478, which is that the crash didn't
get put into the log file.

We already do what you request at the end of the original
message... if we can't replicate to a particular URL we
log the error, but we keep going.  That includes trying to use
that URL again the next time a change happens, or when the
admin forces us to replicate again with `gerrit replicate`
over the SSH interface.
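
The log-and-continue behavior described above can be sketched roughly as follows. This is only an illustration, not Gerrit's implementation: `replicate_all` and the `push` callable are hypothetical, and `push` is assumed to raise an exception when a destination can't be reached.

```python
import logging

log = logging.getLogger("replication")


def replicate_all(project, urls, push):
    """Try every destination; log failures and keep going.

    `push` is a hypothetical callable (project, url) that raises on failure.
    Returns the URLs that failed so a caller could try them again later.
    """
    failed = []
    for url in urls:
        try:
            push(project, url)
        except Exception as err:
            # Recoverable: record the error and move on to the next URL.
            log.error("Cannot replicate %s to %s: %s", project, url, err)
            failed.append(url)
    return failed
```

One bad destination is logged and skipped, and the remaining URLs are still attempted.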
Status: Duplicate
Labels: -Milestone-2.1.2
Mergedinto: 478
Mar 2, 2010
#3 mike.lifeguard@gmail.com
Well, it shouldn't crash at all, should it?
Mar 2, 2010
#4 sop@google.com
I think we're talking about two different things.

We shouldn't start if the configuration is so bogus we have no
way to continue working with it.  Since we can't re-read the file
on the fly and adjust, it's pointless to continue starting up if
the file is bad.  The admin has to stop and restart us to get
the fixed configuration to be recognized.

If, however, we can at least get the server running, then we need
to handle transient replication errors gracefully.  Sometimes
the destination repository becomes unavailable, say because the
remote system's kernel panicked and it is rebooting itself.

Currently when this happens we log an error, but we'll retry that
URL again on the next replication event.  Fine... so long as we
get a new event that will cause us to retry.

Apparently we didn't have an issue for retrying.  I just opened
issue 482 for that.
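
For illustration only, one way a retry could look, so that a transient failure doesn't have to wait for the next replication event. This is a sketch under assumptions, not what issue 482 specifies: `push_with_retry`, its parameters, and the doubling delay are all hypothetical choices.

```python
import time


def push_with_retry(push, project, url, attempts=3, delay=1.0, sleep=time.sleep):
    """Call the hypothetical push(project, url) up to `attempts` times.

    Waits `delay` seconds after a failure, doubling the wait each time.
    Returns True on success, False if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            push(project, url)
            return True
        except Exception:
            if attempt < attempts:
                sleep(delay)
                delay *= 2  # back off so a rebooting remote isn't hammered
    return False
```

The `sleep` parameter is injectable only so the waiting can be replaced in tests.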
Mar 2, 2010
#5 mike.lifeguard@gmail.com
OK, you're right, I should be commenting on issue 478.