Issue 477: Replication failures shouldn't cause a crash
Affected Version: 2.1.1.1

What steps will reproduce the problem?
1. Set up a configuration where Gerrit can't write to the requested replication location (a local URL). I *think* in this example the repository didn't exist at the time, but maybe the permissions were wrong.
2. Wait until Gerrit crashes.
3. Find errors such as http://p.defau.lt/?zOJryhIqYmRoiExKYTzsNQ

Failures in replication should be logged; however, they are recoverable errors. Warn and continue; don't crash.
Mar 2, 2010
Actually, I was confused between issue 477 and issue 478. This issue I just don't get. It reads to me like it's a duplicate of issue 478, which is that the crash didn't get put into the log file. We already do what you request at the end of the original message: if we can't replicate to a particular URL, we log the error but keep going. That includes trying that URL again the next time a change happens, or when the admin forces us to replicate again with `gerrit replicate` over the SSH interface.
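The "log the error but keep going" behaviour described above can be sketched roughly as follows. This is not Gerrit's actual replication code: `ReplicateAll`, `Pusher` and `replicate` are hypothetical names, and the push itself is hidden behind an interface so the example stands on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch, not Gerrit's code: push one project to every configured
// destination URL, logging each failure instead of propagating it.
class ReplicateAll {
    private static final Logger log = Logger.getLogger(ReplicateAll.class.getName());

    // Hypothetical stand-in for the real push; throws on any failure.
    interface Pusher {
        void push(String project, String url) throws Exception;
    }

    // Tries every URL; returns the ones that failed so a caller could retry them.
    static List<String> replicate(String project, List<String> urls, Pusher pusher) {
        List<String> failed = new ArrayList<>();
        for (String url : urls) {
            try {
                pusher.push(project, url);
            } catch (Exception e) {
                // Recoverable: warn, remember the URL, and move on to the next one.
                log.log(Level.WARNING, "Replication of " + project + " to " + url + " failed", e);
                failed.add(url);
            }
        }
        return failed;
    }
}
```

Returning the failed URLs rather than throwing means one unreachable destination never stops the others from being updated.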
Status: Duplicate
Labels: -Milestone-2.1.2
Mergedinto: 478
Mar 2, 2010
Well, it shouldn't crash at all, should it?
Mar 2, 2010
I think we're talking about two different things. We shouldn't start if the configuration is so bogus that we have no way to continue working with it. Since we can't reread the file on the fly and adjust, it's pointless to keep starting up if the file is bad; the admin has to stop and restart us anyway for the fixed configuration to be recognized. If, however, we can at least get the server running, then we need to handle transient replication errors gracefully. Sometimes the destination repository becomes unavailable because, say, the remote system kernel panicked and is rebooting. Currently when this happens we log an error, and we retry that URL on the next replication event. That's fine, so long as we get a new event that causes us to retry. Apparently we didn't have an issue for retrying, so I just opened issue 482 for that.
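For issue 482, one possible shape for retrying without waiting for a new change event is a queue of failed pushes that a periodic timer drains. This is a hypothetical sketch, not Gerrit's code: `RetryQueue`, `Pusher`, the attempt limit, and the idea of calling `retryPending()` from a scheduler are all assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.logging.Logger;

// Hypothetical sketch for issue 482: keep failed pushes in a queue and retry
// them from a periodic timer, instead of waiting for the next change event.
class RetryQueue {
    private static final Logger log = Logger.getLogger(RetryQueue.class.getName());

    // Hypothetical stand-in for one push attempt; returns true on success.
    interface Pusher {
        boolean push(String project, String url);
    }

    // One failed (project, url) pair plus the number of attempts it has left.
    private static final class Pending {
        final String project;
        final String url;
        final int attemptsLeft;

        Pending(String project, String url, int attemptsLeft) {
            this.project = project;
            this.url = url;
            this.attemptsLeft = attemptsLeft;
        }
    }

    private final Deque<Pending> pending = new ArrayDeque<>();
    private final Pusher pusher;
    private final int maxAttempts;

    RetryQueue(Pusher pusher, int maxAttempts) {
        this.pusher = pusher;
        this.maxAttempts = maxAttempts;
    }

    // Called on a replication event: a failure is logged and queued, never thrown.
    void replicate(String project, String url) {
        attempt(new Pending(project, url, maxAttempts));
    }

    // Meant to be called periodically; retries each entry queued before this pass once.
    void retryPending() {
        int n = pending.size();
        for (int i = 0; i < n; i++) {
            attempt(pending.pollFirst());
        }
    }

    int pendingCount() {
        return pending.size();
    }

    private void attempt(Pending p) {
        if (pusher.push(p.project, p.url)) {
            return;
        }
        if (p.attemptsLeft > 1) {
            log.warning("Push of " + p.project + " to " + p.url + " failed; will retry");
            pending.addLast(new Pending(p.project, p.url, p.attemptsLeft - 1));
        } else {
            log.severe("Giving up on push of " + p.project + " to " + p.url);
        }
    }
}
```

In a server, `retryPending()` could be driven by something like `ScheduledExecutorService.scheduleWithFixedDelay`. The queue itself is not thread-safe, so all calls would need to come from one thread or be synchronized.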
Mar 2, 2010
OK, you're right; I should be commenting on issue 478.
I can't reproduce this. If I put the following in:

    [remote "bad"]
      url = /does.not.exist/${name}.git

the server still starts up normally, but after 30 seconds or so the log fills up with replication-failure messages for each of the projects, aborting because their target isn't found. But if the URL is really mangled:

    [remote "bad"]
      url = /does.not.exist/${name.git

then the server aborts on startup, and the log doesn't have the stack trace at all. So your original example on the list didn't match the failure... but I do see a bug here that I'll try to fix in 2.1.2.
Labels: Milestone-2.1.2
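The difference between the two configurations above is when the bad value is detected. Here is a minimal sketch of checking the URL template once at startup. It assumes `${name}` is the only placeholder, as in the examples in this thread. `UrlTemplate` is a hypothetical class, not Gerrit's parser. A malformed template fails with a descriptive exception that the startup code could log before exiting, rather than aborting with nothing in the log.

```java
// Hypothetical sketch: check a replication URL template once, at startup, so
// a mangled placeholder produces a clear error instead of a silent abort.
// Assumes ${name} (the project name) is the only supported placeholder.
class UrlTemplate {
    private final String template;

    UrlTemplate(String template) {
        int i = 0;
        while ((i = template.indexOf("${", i)) >= 0) {
            int end = template.indexOf('}', i + 2);
            if (end < 0) {
                throw new IllegalArgumentException(
                    "Unterminated ${ in replication URL: " + template);
            }
            String var = template.substring(i + 2, end);
            if (!var.equals("name")) {
                throw new IllegalArgumentException(
                    "Unknown variable ${" + var + "} in replication URL: " + template);
            }
            i = end + 1;
        }
        this.template = template;
    }

    // Substitutes the project name, e.g. "/srv/${name}.git" -> "/srv/foo.git".
    String expand(String projectName) {
        return template.replace("${name}", projectName);
    }
}
```

Validating in the constructor means `expand()` can't fail later on a malformed template. A URL that is well-formed but points at a missing repository still only fails at push time, which matches the first case above.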