Issue 390: Branches disappear and don't fetch/clone
Affected Version: 2.1.1.1

Sometimes a branch disappears, and it cannot be fetched or cloned anymore. repo sync shows this as:

$ repo sync
Fetching projects: 100% (224/224), done.
error: master in platform/bionic not found

I suspect what's happening is that a background `git gc` job runs and moves the branch into the $GIT_DIR/packed-refs file, but JGit doesn't seem to reload the packed-refs data after the git gc pass. Since the branch is no longer loose, JGit does not report it to a client.
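To illustrate the suspicion above, here is a minimal sketch of a ref lookup that checks the loose ref first and then falls back to $GIT_DIR/packed-refs, re-reading that file on every lookup. This is not JGit code. The function names `read_packed_refs` and `resolve_ref` are hypothetical, and the sketch ignores symbolic refs and caching entirely.

```python
import os

def read_packed_refs(git_dir):
    """Parse $GIT_DIR/packed-refs into a {refname: sha} dict."""
    refs = {}
    path = os.path.join(git_dir, "packed-refs")
    if not os.path.exists(path):
        return refs
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, the "# pack-refs with:" header,
            # and "^" lines that carry peeled tag targets.
            if not line or line[0] in "#^":
                continue
            sha, name = line.split(" ", 1)
            refs[name] = sha
    return refs

def resolve_ref(git_dir, name):
    """Return the object id for `name`, or None if the ref does not exist."""
    loose = os.path.join(git_dir, *name.split("/"))
    if os.path.isfile(loose):
        with open(loose) as f:
            return f.read().strip()
    # The ref is not loose (for example after `git gc` packed it),
    # so consult a freshly read packed-refs file instead of a stale copy.
    return read_packed_refs(git_dir).get(name)
```

A reader that only looked at the loose file, or that kept an old in-memory copy of packed-refs, would return nothing for a branch that `git gc` had just packed, which matches the symptom reported here.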
Jan 15, 2010
Project Member
#1
fredrik....@sonyericsson.com
Jan 15, 2010
Addition: By "large commits" I mean to say that it is an external delivery, which means that the commit contains a lot of files, a lot of those files are updated, so the upload of the commit means a lot of new blob data and a new ref on the server.
Jan 19, 2010
Given what is happening in issue 394, we might actually be looking at a different variant of issue 394. If the object that a branch points to cannot be read from disk, the branch just silently disappears, and no error is logged to the server log file. So issue 394 can cause the branch to vanish like we are seeing here.
Blockedon:
394
Jan 30, 2010
(No comment was entered for this change.)
Labels:
-Milestone-Next Milestone-2.1.2
Feb 21, 2010
Issue 394 has been merged into this issue.
Mar 1, 2010
My organization started seeing this today too, with symptoms similar to those described in issue 394:

fatal: protocol error: bad pack header

Has anyone been able to temporarily work around this problem?
Mar 1, 2010
Another update. I just noticed this post: http://groups.google.com/group/repo-discuss/browse_thread/thread/d137c9e55e55542

I dropped down to a shell and ran "git gc" on the problematic git repo as the gerrit2 user, and it fixed the problem.
Mar 2, 2010
Slipped to 2.1.3. I want to get 2.1.2 out.
Labels:
Milestone-2.1.3
Mar 2, 2010
(No comment was entered for this change.)
Labels:
-Milestone-2.1.2
Mar 11, 2010
I have finally been able to recreate this problem!

1) Push a commit onto a git (the error is more likely to occur if the commit is big; mine was 800 megs from /dev/urandom).
2) Let the replication to the replication-server finish.
3) Clone the project from the replication server (make sure you are the FIRST person to clone after the replication is done).
4) Ctrl-C the clone.
5) You are now the proud owner of a broken git. (We heal it with 'git gc'.)

Cloning the git again will give you something like this:

Initialized empty Git repository in /mnt/src/helloworld/helloworld/.git/
remote: Counting objects: 2765, done
remote: Compressing objects: 100% (2765/2765)
fatal: internal server error6/2765), 165.82 MiB | 11346 KiB/s
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed
Mar 12, 2010
Btw, forgot to add that the push in step 1) is to refs/heads/master.
Apr 9, 2010
Is this still reproducible?
Apr 26, 2010
Re comment #10, when the replication is running, is that going over the system SSH, writing the objects directly into the repository behind Gerrit's back?

I think it's a red herring that ctrl-c'ing that first clone causes things to break for all subsequent users. And I doubt 800 MiB is actually needed to trigger this. What's probably happening is that your 800 MiB push contained enough *objects* that it was over the 100 object limit and was retained as a pack file, rather than being exploded to loose objects. And the Gerrit server failed to figure out that a new pack file was available on disk.
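The point about objects versus bytes can be sketched as follows. This assumes the limit referred to is native git's `receive.unpackLimit` setting, whose documented default is 100 and which keeps a received pack when the object count equals or exceeds the limit. The function name `kept_as_pack` is hypothetical and nothing here is taken from Gerrit or JGit.

```python
DEFAULT_UNPACK_LIMIT = 100  # documented default of git's receive.unpackLimit

def kept_as_pack(object_count, unpack_limit=DEFAULT_UNPACK_LIMIT):
    """Return True if a received push would be stored as a pack file.

    The decision depends only on how many objects were received,
    not on how many bytes they occupy.
    """
    return object_count >= unpack_limit
```

Under this rule a push of a single huge blob plus its tree and commit (3 objects) would be exploded to loose objects, while a push of 2765 small objects would stay as a pack.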
Apr 26, 2010
Hi Shawn / Comment #11

Yes, we're replicating over OpenSSH. The 800MiB example was mentioned as the safest way to reproduce the bug. But this is certainly not the only possible scenario; we see it quite often when we push more than one object as well.

The test we did, IIRC, was to check in one large 800MiB binary. I'm not sure what that means to git internally. I thought it meant only one huge blob object rather than many? (And then tree and commit objects, obviously, but still not hundreds?)

I'd love to tell you more about the differences between when the sync completes versus when you ctrl-c it, but I was not around Ulrik and Ernst when they set about to reproduce it, and hence my answer is less useful than it could've been. They might add their own comments tomorrow morning, EU hours.

Hope it helps!
May 3, 2010
Had any luck with this Shawn? Can you reproduce it if you follow the #10 steps?
May 3, 2010
Nope. I spent about a day on it last week. I wasn't able to reproduce by following comment #10. So I spent some time looking through this section of code in JGit. There is a possibly bad condition relating to a push into Gerrit Code Review confusing a concurrent read. I've posted patches for it to JGit, and I see they got merged over the weekend. I doubt they fix the case described here though, because the push must occur over the Gerrit port to trigger the condition. My week this week is all messed up scheduling wise due to personal stuff that I have going on right now. But I plan to devote most of what I can this week at work to looking at this problem more, maybe I'll have some flash of insight if I stare at the code long enough.
May 4, 2010
Some notes from an IM session with an admin suffering from
this bug on their Gerrit server, against a Linux kernel repo:
Them> got again the false missing object exception, I do notice
> one thing tho, almost all the time it's complaining about
> the object that is the vanilla 2.6.33 commit (we initialize
> all our branches to start from that)
Me > ugh
Them> hi again, wtf, I just found out that we have disabled the
> repack script sometime in March and are only running the
> resync-all script every night so it would mean those problems
> are not because of the external repacking
Me > yikes
> so the vanilla 2.6.33 commit went poof solely due to gerrit
> adding new pack files during pushes.
Them> not sure why it did, but yeah, it seems Gerrit doesn't know
> about it even tho it exists and works after a restart (neither
> of the touch or "git gc" solve the issue, only restarting Gerrit
> does so far)
Makes me start to suspect that the PackFile object which contains the
commit got marked as corrupt in memory, or it was simply omitted from
the PackList object somehow during a copy of the array.
Labels:
Component-JGit
May 13, 2010
Slightly new theory:

JGit has an open bug [1] where pack files are accessed after their file descriptor was closed. These usually result in an IOException being thrown back at the caller.

In many places within ObjectDirectory, JGit consumes an IOException when accessing the pack file and removes the pack file from its list of known packs. Since the exception is not logged, we don't know if this condition is triggering or not.

When the pack gets removed from the list of known packs, it is never put back into the list because the objects/pack mtime doesn't change. So if this read-after-close bug occurs at the right place, we won't log it, but we'll close the pack and forget it ever exists. Later on when we can't access the object we log the missing object error, or simply hide the branch from the client entirely.

[1] https://bugs.eclipse.org/bugs/show_bug.cgi?id=308945
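The theory above can be modeled with a small toy class. This is not JGit code and does not describe the actual fix. `PackListModel`, its constructor arguments, and the `rescan_on_miss` mitigation are all hypothetical, chosen only to show how a swallowed I/O error combined with an mtime-only refresh makes a pack vanish until restart.

```python
import logging

log = logging.getLogger("packlist")

class PackListModel:
    """Toy model of a cached list of known packs (hypothetical, not JGit)."""

    def __init__(self, scan_dir, dir_mtime, log_errors=False, rescan_on_miss=False):
        self._scan_dir = scan_dir      # callable returning {pack_name: reader}
        self._dir_mtime = dir_mtime    # callable returning the objects/pack mtime
        self._log_errors = log_errors
        self._rescan_on_miss = rescan_on_miss
        self._seen_mtime = dir_mtime()
        self._packs = dict(scan_dir())

    def _refresh_if_changed(self):
        # The cached list is rebuilt only when the directory mtime moves.
        mtime = self._dir_mtime()
        if mtime != self._seen_mtime:
            self._seen_mtime = mtime
            self._packs = dict(self._scan_dir())

    def _search(self, object_id):
        for name, reader in list(self._packs.items()):
            try:
                data = reader(object_id)
            except OSError as err:
                # The failure mode from the theory: the error is consumed
                # and the pack is dropped from the list of known packs.
                if self._log_errors:
                    log.warning("dropping pack %s: %s", name, err)
                del self._packs[name]
                continue
            if data is not None:
                return data
        return None

    def find(self, object_id):
        self._refresh_if_changed()
        data = self._search(object_id)
        if data is None and self._rescan_on_miss:
            # Assumed mitigation: rescan even though the mtime is unchanged.
            self._packs = dict(self._scan_dir())
            data = self._search(object_id)
        return data
```

With the defaults, one transient read error is enough for every later `find` to miss the object while the directory mtime stays the same, which is consistent with the report that only a restart brought the commit back. Turning on `log_errors` would at least make the condition visible in the server log.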
May 27, 2010
Fixed in Gerrit by change I50a1cd941fe9f0a7dd2a6a15d6bd56a36fc773a0
Status:
Fixed
Labels:
-Milestone-2.1.3 FixedIn-2.1.3
Jun 1, 2010
(No comment was entered for this change.)
Labels:
-FixedIn-2.1.3 FixedIn-2.1.2.5
Jun 8, 2010
We're hitting this daily now, even on 2.1.2.5. We're running the workaround script that touches the pack object files. I'll try to disable that to see if it helps.
Jun 8, 2010
My problem could well be issue 585 too. I'll provide the details in 858.
Mar 27, 2012
(No comment was entered for this change.)
Status:
Released