| Issue 394: | protocol error: bad pack header | |
| 4 people starred this issue and may be notified of changes. | Back to list |
Affected Version: Gerrit 2.1.1.1
Environment: Ubuntu 8.10 64 bit, 16GB RAM
Date: Around noon, the 18th.
The initial report from the users is this:
Initializing project platform/frameworks/base ...
warning: no common commits
fatal: internal server error
fatal: The remote end hung up unexpectedly
fatal: protocol error: bad pack header
error: Cannot fetch platform/frameworks/base
This is repeatable, several clients get the same output.
On the server, everything looks dandy. To be safe I run a gc:
platform/frameworks/base.git$ git gc
Counting objects: 146059, done.
Compressing objects: 100% (49024/49024), done.
Writing objects: 100% (146059/146059), done.
Total 146059 (delta 83409), reused 145995 (delta 83345)
platform/frameworks/base.git$
Latest commit according to git log --all
Date: Sat Jan 16 16:10:45 2010 +0100
Once git gc has completed, the sync now works as expected!
But this doesn't last long, in the same sync I get the same output for
another internal git means running git gc once more:
$ git gc
Counting objects: 94, done.
Compressing objects: 100% (28/28), done.
Writing objects: 100% (94/94), done.
Total 94 (delta 44), reused 94 (delta 44)
Lasest commit:
Date: Fri Dec 18 18:48:37 2009 +0100
This should mean that a lot of gcs has run weekly and sometimes daily
since then. And still we get a repack!
After very short time a new batch of gits "turn bad". By now patience is
over, so I restart the gerrit service.
But first some stats:
$ ps Tuxa | grep java
gerrit2 1039 76.7 52.1 9066324 8473312 pts/1 Sl Jan15 3194:58
GerritCodeReview -Xmx8g -Djavax.net.ssl.trustStore=/some/files -
Djavax.net.ssl.trustStorePassword=passwd -jar /srv/gerrit2/bin/gerrit.war
daemon -d /srv/gerrit2
-> internal ps shows:
Name Max |Object Count | AvgGet |Hit Ratio |
Age | Disk Mem Cnt| |Disk Mem Agg |
-------------------------+--------------------+----------+--------------+
accounts 90d | 218| | 99%|
accounts_byemail 90d | | | |
diff 90d | | | |
groups 90d | 17| 6.5ms | 0%|
ldap_groups 1d | 216| | 99%|
ldap_usernames 90d | | | |
projects 90d | 321| | 99%|
sshkeys 90d | 166| | 98%|
web_sessions 12h | | | |
JGit Buffer Cache:
open files : 15
loaded : 199.99 mb
mem% : 2%
JVM Heap:
max : 7.37 gb
inuse : 1.88 gb
mem% : 25%
Not sure if this gives anything of value, if I could understand the output
from jstat better I'd show you some figures from that too. As it stands
I'm not sure how to configure it to get any good out of it.
Jan 19, 2010
Project Member
#1
fredrik....@sonyericsson.com
Jan 19, 2010
fredrik.luthander: I have added you as a project contributor, so you can now edit issues, and can set the labels yourself. Since you asked, I'm also tagging this as Priority-Blocker for now.
Labels:
-Priority-Minor Priority-Blocker
Jan 19, 2010
Is there anything in the error_log that matches this time period?
A failure on the server side like this that produces a corrupt
data stream to the client should have also triggered an exception
with a stack trace in the server's error_log. Having that would
help me track down what is wrong here.
I also think your [core] section is perhaps misconfigured for
the Android project set. We're running the following given an
8g heap size for our JVM:
[container]
heapLimit = 8g
[core]
packedGitOpenFiles = 4096
packedGitLimit = 2g
packedGitWindowSize = 16k
Status:
Accepted
Jan 19, 2010
Hi Shawn!
Here's the first of the stacks from my logfile the 18th..
[2010-01-18 10:54:21,179] ERROR com.google.gerrit.sshd.BaseCommand : Internal server
error (user first.last account 1000115) during git-upload-
pack '/platform/vendor/dir/repository.git'
com.google.gerrit.sshd.BaseCommand$Failure: fatal: 'platform/vendor/dir/repository':
not a git archive
at com.google.gerrit.sshd.commands.AbstractGitCommand.service
(AbstractGitCommand.java:58)
at com.google.gerrit.sshd.commands.AbstractGitCommand.access$100
(AbstractGitCommand.java:30)
at com.google.gerrit.sshd.commands.AbstractGitCommand$1.run
(AbstractGitCommand.java:46)
at com.google.gerrit.sshd.BaseCommand$TaskThunk.run(BaseCommand.java:405)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301
(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run
(ScheduledThreadPoolExecutor.java:207)
at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:310)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask
(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run
(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.eclipse.jgit.errors.RepositoryNotFoundException: repository not
found: Cannot open repository platform/vendor/dir/repository
at com.google.gerrit.server.git.LocalDiskRepositoryManager.openRepository
(LocalDiskRepositoryManager.java:97)
at com.google.gerrit.sshd.commands.AbstractGitCommand.service
(AbstractGitCommand.java:56)
... 12 more
Caused by: org.eclipse.jgit.errors.RepositoryNotFoundException: repository not
found: /srv/gerrit2/git/platform/vendor/dir/repository
at org.eclipse.jgit.lib.RepositoryCache$FileKey.open
(RepositoryCache.java:298)
at org.eclipse.jgit.lib.RepositoryCache.openRepository
(RepositoryCache.java:158)
at org.eclipse.jgit.lib.RepositoryCache.open(RepositoryCache.java:99)
at org.eclipse.jgit.lib.RepositoryCache.open(RepositoryCache.java:74)
at com.google.gerrit.server.git.LocalDiskRepositoryManager.openRepository
(LocalDiskRepositoryManager.java:94)
... 13 more
Jan 19, 2010
That is a filesystem problem.
Gerrit is looking for the following:
/srv/gerrit2/git/platform/vendor/dir/repository/objects
/srv/gerrit2/git/platform/vendor/dir/repository/refs
/srv/gerrit2/git/platform/vendor/dir/repository/HEAD
OR
/srv/gerrit2/git/platform/vendor/dir/repository.git/objects
/srv/gerrit2/git/platform/vendor/dir/repository.git/refs
/srv/gerrit2/git/platform/vendor/dir/repository.git/HEAD
But we're missing at least one of these items; stat() is returning
an error code causing Gerrit to believe the one or more of the paths
does not exist.
Is it possible that your daemon process is losing access to the
filesystem due to a key expiring? E.g. I just saw this myself
yesterday on a new Gerrit server I installed on top of an encrypted
filesystem. The encrypted filesystem automatically timed itself out
and locked the key after X hours without an interactive login to the
host system. That meant the daemon lost access to the filesystem and
we started to also get these exceptions. But logging into the host
immediately resolved it as the filesystem unlocked.
However, this stack trace can't be the culprit of the originally
reported client error. The original client error can only happen
if we've gotten past this point. You emailed me your log file, in
that log I found two other errors that are the actual problems:
[2010-01-18 11:06:33,079] ERROR com.google.gerrit.sshd.BaseCommand : Internal server
error (user X account Y) during git-upload-pack '/platform/frameworks/base.git'
java.lang.NullPointerException
at
org.eclipse.jgit.lib.PackWriter.writeWholeObjectDeflate(PackWriter.java:763)
at org.eclipse.jgit.lib.PackWriter.writeObject(PackWriter.java:732)
at org.eclipse.jgit.lib.PackWriter.writeObjects(PackWriter.java:692)
at org.eclipse.jgit.lib.PackWriter.writePack(PackWriter.java:592)
at org.eclipse.jgit.transport.UploadPack.sendPack(UploadPack.java:471)
at org.eclipse.jgit.transport.UploadPack.service(UploadPack.java:246)
at org.eclipse.jgit.transport.UploadPack.upload(UploadPack.java:227)
at com.google.gerrit.sshd.commands.Upload.runImpl(Upload.java:26)
at
com.google.gerrit.sshd.commands.AbstractGitCommand.service(AbstractGitCommand.java:62
)
at
com.google.gerrit.sshd.commands.AbstractGitCommand.access$100(AbstractGitCommand.java
:30)
at
com.google.gerrit.sshd.commands.AbstractGitCommand$1.run(AbstractGitCommand.java:46)
at com.google.gerrit.sshd.BaseCommand$TaskThunk.run(BaseCommand.java:405)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Sched
uledThreadPoolExecutor.java:98)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThr
eadPoolExecutor.java:207)
at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:310)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
[2010-01-18 11:06:42,587] ERROR com.google.gerrit.sshd.BaseCommand : Internal server
error (user A account B) during git-upload-pack '/platform/frameworks/base.git'
org.eclipse.jgit.errors.MissingObjectException: Missing commit c0ffee....
at org.eclipse.jgit.revwalk.RevObject.loadCanonical(RevObject.java:77)
at org.eclipse.jgit.revwalk.RevCommit.parseHeaders(RevCommit.java:83)
at org.eclipse.jgit.revwalk.PendingGenerator.next(PendingGenerator.java:138)
at
org.eclipse.jgit.revwalk.TopoSortGenerator.<init>(TopoSortGenerator.java:72)
at org.eclipse.jgit.revwalk.StartGenerator.next(StartGenerator.java:145)
at org.eclipse.jgit.revwalk.RevWalk.next(RevWalk.java:381)
at org.eclipse.jgit.revwalk.ObjectWalk.next(ObjectWalk.java:208)
at org.eclipse.jgit.lib.PackWriter.findObjectsToPack(PackWriter.java:856)
at org.eclipse.jgit.lib.PackWriter.preparePack(PackWriter.java:494)
at org.eclipse.jgit.transport.UploadPack.sendPack(UploadPack.java:455)
at org.eclipse.jgit.transport.UploadPack.service(UploadPack.java:246)
at org.eclipse.jgit.transport.UploadPack.upload(UploadPack.java:227)
at com.google.gerrit.sshd.commands.Upload.runImpl(Upload.java:26)
at
com.google.gerrit.sshd.commands.AbstractGitCommand.service(AbstractGitCommand.java:62
)
at
com.google.gerrit.sshd.commands.AbstractGitCommand.access$100(AbstractGitCommand.java
:30)
at
com.google.gerrit.sshd.commands.AbstractGitCommand$1.run(AbstractGitCommand.java:46)
at com.google.gerrit.sshd.BaseCommand$TaskThunk.run(BaseCommand.java:405)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(Sched
uledThreadPoolExecutor.java:98)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThr
eadPoolExecutor.java:207)
at com.google.gerrit.server.git.WorkQueue$Task.run(WorkQueue.java:310)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Either of these exceptions would result in the originally reported
client error. I'll have to look into these more, but I think they
happen because the repository was repacked and the underlying pack
we were trying to use is gone. JGit should have detected this and
recovered itself, but it seems that isn't happening like it should.
Feb 21, 2010
I think this is the same as issue 390 ... they both seem to have the same underlying cause of packs disappearing, leaving the server with objects being temporarily inaccessible.
Status:
Duplicate
Mergedinto: 390 |
|
| ► Sign in to add a comment |