Issue 175: Archive old refs/changes/... once changes are closed
Status:  Accepted
Owner:
Reported by code-rev...@gtempaccount.com, Sep 24, 2009
Reported by Shawn Pearce <sop@google.com> on Mon May 11 07:11:49 PDT 2009
Source: JIRA GERRIT-175
Affected Version: 2.0.11

Currently Gerrit creates a ref for each patch set, e.g. refs/changes/42/1342/2
is a ref pointing at patch set 2 of change number 1342.
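
As a rough illustration of that naming scheme, here is a minimal Python sketch. It assumes the first path component is the last two digits of the change number, zero-padded, which matches the refs quoted in this thread (refs/changes/00/100/1, refs/changes/80/69280/1). The function name change_ref is hypothetical and not part of Gerrit.

```python
def change_ref(change_number: int, patch_set: int) -> str:
    """Build the ref name for one patch set of a change (illustrative only)."""
    # Assumption: the shard directory is the change number modulo 100,
    # zero-padded to two digits.
    shard = change_number % 100
    return f"refs/changes/{shard:02d}/{change_number}/{patch_set}"


print(change_ref(1342, 2))
```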

Gerrit creates these refs for two reasons:

- It anchors the commit in the repository during the review, so a "git gc"
doesn't delete it prior to submission.

- It makes it possible to use repo download / git pull to obtain the change
over the git:// or ssh:// protocols.


However, if you submit, say, 4000 changes to a single project, and each change
uses on average 2 patch sets, you now have 8000 refs in that project.  This is
a problem because the Git protocols send a listing of *all* refs to the
client when the client first connects.  In the case of these ~8000 refs/changes
refs, the Git protocol transmits 500 KiB of data to the client to advertise
these changes as being available for download.  This advertisement happens
both for fetch (aka repo download, repo sync, git fetch, git pull) and for
push (aka repo upload, git push).

Fast-forward another year, and you now have 16,000 changes in a single
project.  Now we have to send ~2 MiB of data to the client to advertise that
these changes are available for download.  For the most part, the client doesn't
care about any of the information listed in this 2 MiB transfer.
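
The figures above can be sanity-checked with a back-of-the-envelope sketch. It assumes each advertised ref costs one pkt-line: a 4-byte length prefix, a 40-character hex SHA-1, a space, the ref name, and a newline. It ignores the capability list on the first line and any non-change refs, and it assumes change numbers run from 1 upward with exactly 2 patch sets each. The helper names are hypothetical.

```python
PKT_LEN_PREFIX = 4  # pkt-line length header (4 hex digits)
SHA1_HEX = 40       # object ID written as hex
SEPARATORS = 2      # one space plus the trailing newline


def advertised_bytes(ref_names):
    """Approximate bytes needed to advertise the given ref names."""
    return sum(PKT_LEN_PREFIX + SHA1_HEX + SEPARATORS + len(name)
               for name in ref_names)


def change_refs(num_changes, patch_sets_per_change=2):
    """Yield refs/changes/... names for changes 1..num_changes (assumed layout)."""
    for change in range(1, num_changes + 1):
        for patch_set in range(1, patch_sets_per_change + 1):
            yield f"refs/changes/{change % 100:02d}/{change}/{patch_set}"


for n in (4000, 16000):
    size = advertised_bytes(change_refs(n))
    print(f"{n} changes: about {size / 1024:.0f} KiB advertised")
```

Under these assumptions the estimate lands in the same range as the 500 KiB and ~2 MiB figures quoted above.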


I'm starting to think we may need to create archival repositories and move
some of these changes into them once they have aged sufficiently; e.g. any
change that has been submitted or abandoned and is at least 90 days old should
be moved to an archival repository.  Perhaps using a different repository for
each calendar year, or every X changes.  And the download links should be
updated appropriately to point to the correct archival repository.
Sep 24, 2009
#1 code-rev...@gtempaccount.com
Comment by Shawn Pearce <sop@google.com> on Wed Aug 26 13:23:40 PDT 2009

In http://thread.gmane.org/gmane.comp.version-control.git/126797/focus=127059
the Git community has started to implement a way to hide these change refs
from the initial advertisement sent to the client, but yet allow the client to
request them on demand as necessary.  This would completely eliminate the
problem with these large archives being held in the main repository for the
project, since the advertisement would only be the handful of active branches,
and the huge volume of refs for individual changes would never be presented en
masse to a client.
Sep 24, 2009
#2 sop+code@google.com
(No comment was entered for this change.)
Status: Accepted
Owner: s...@google.com
Nov 21, 2009
#3 sop@google.com
(No comment was entered for this change.)
Owner: s...@google.com
Feb 16, 2015
#4 sschuberth
I'm currently looking at the possible reasons why a push to Gerrit to a repo with about 40k changes would be slow (the "Processing changes" step on the client takes about 2 minutes for us now), see [1].

So, is this issue still valid?

[1] https://groups.google.com/d/msg/repo-discuss/pWe2GarzJA4/LdeHFckdt50J
Jun 26, 2015
#5 amu...@wikimedia.org
@sschuberth I have the same issue doing git fetch.

You can try GIT_TRACE_PACKET=1 git pull

On a fresh clone of a repo using the ssh:// protocol that yields me:

$ GIT_TRACE_PACKET=1 git pull 
15:29:14.235302 pkt-line.c:46           packet:        fetch< 874c4155a4f831a2331b9f6cb8c2ac806c03f603 HEAD\0 include-tag multi_ack_detailed multi_ack ofs-delta side-band side-band-64k thin-pack no-progress shallow 
15:29:14.239709 pkt-line.c:46           packet:        fetch< 8f52c30fef5c0f02ab40a67445239a3178a5d46a refs/changes/00/100/1
15:29:14.239725 pkt-line.c:46           packet:        fetch< a9ff3971e38a17ea1af78851265755d42a01eb10 refs/changes/00/1000/1
15:29:14.239737 pkt-line.c:46           packet:        fetch< d44c409d90a974b731a738ccbb46fd5f90eab43d refs/changes/00/1000/2
15:29:14.239748 pkt-line.c:46           packet:        fetch< f00d3ac2236df658aee10575cc91dd526ae40680 refs/changes/00/101000/1
15:29:14.239761 pkt-line.c:46           packet:        fetch< f6c838022d31049abb870717ac381c8c76897e73 refs/changes/00/101200/1
15:29:14.239773 pkt-line.c:46           packet:        fetch< 131605b3eb26b1686b67fbecedb89b2a2ef26031 refs/changes/00/101800/1
15:29:14.239895 pkt-line.c:46           packet:        fetch< fa5dd3ca8b36bc0e522863f935d5c65914d8a22e refs/changes/00/10200/1
15:29:14.239911 pkt-line.c:46           packet:        fetch< cec6e460743b66d070e2ce015e7c45f64d4d34d9 refs/changes/00/102000/1
15:29:14.239930 pkt-line.c:46           packet:        fetch< 4c5042c0c10be99af9c32cb9b59af790552d8819 refs/changes/00/102200/1
....

That is a lot of network overhead.
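
To put a number on that overhead, a small sketch like the following could tally the advertised refs per namespace from a saved trace. It assumes the trace line format shown above (which may differ between git versions), and the sample lines are abbreviated copies of the output in this comment. The function name count_advertised is hypothetical.

```python
import re
from collections import Counter

# Matches advertisement lines of the form:
#   <time> pkt-line.c:46   packet:   fetch< <40-hex sha> <refname>
ADVERT = re.compile(r"packet:\s+fetch< ([0-9a-f]{40}) (\S+)")

SAMPLE = [
    r"15:29:14.235302 pkt-line.c:46           packet:        fetch< 874c4155a4f831a2331b9f6cb8c2ac806c03f603 HEAD\0 include-tag multi_ack_detailed",
    r"15:29:14.239709 pkt-line.c:46           packet:        fetch< 8f52c30fef5c0f02ab40a67445239a3178a5d46a refs/changes/00/100/1",
    r"15:29:14.239725 pkt-line.c:46           packet:        fetch< a9ff3971e38a17ea1af78851265755d42a01eb10 refs/changes/00/1000/1",
]


def count_advertised(trace_lines):
    """Count advertised refs grouped by their top-level namespace."""
    counts = Counter()
    for line in trace_lines:
        match = ADVERT.search(line)
        if not match:
            continue
        # The first line carries the capability list after a literal "\0".
        ref = match.group(2).split("\\0")[0]
        parts = ref.split("/")
        namespace = "/".join(parts[:2]) if parts[0] == "refs" else ref
        counts[namespace] += 1
    return counts


print(count_advertised(SAMPLE))
```

For a real run, the trace (which git writes to stderr) could be redirected to a file and its lines passed to count_advertised.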

I have:
  remote.gerrit.fetch=+refs/heads/*:refs/remotes/gerrit/*


Seems on Gerrit server side one would want to set:

uploadpack.hideRefs refs/changes
uploadpack.hideRefs refs/cache-automerge
uploadpack.allowtipsha1inwant = true

Might not even be supported by JGit, but that was introduced in git 1.8.2.
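
The intended effect of those hideRefs entries can be modelled with a short sketch. This is a simplified illustration of prefix matching on whole path components, not git's or JGit's actual implementation, and it leaves out negated patterns. The function names are hypothetical.

```python
def is_hidden(ref, hidden_prefixes):
    """Return True if ref equals a hidden prefix or lives underneath it."""
    return any(ref == prefix or ref.startswith(prefix + "/")
               for prefix in hidden_prefixes)


def advertised(refs, hidden_prefixes):
    """Filter the ref list down to what would still be advertised."""
    return [ref for ref in refs if not is_hidden(ref, hidden_prefixes)]


# Example refs; the cache-automerge name is made up for illustration.
refs = [
    "HEAD",
    "refs/heads/master",
    "refs/changes/00/100/1",
    "refs/cache-automerge/ab/cdef",
]
print(advertised(refs, ["refs/changes", "refs/cache-automerge"]))
```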
Jun 26, 2015
#6 sschuberth
FYI, in our case the cause of the slowness was the NFS hosting our repos. Once we changed that to local storage, upload performance was decent again.
Jul 2, 2015
#9 sschuberth
Thanks Edwin for that patch. For my better understanding, without your patch but with above uploadpack.* settings enabled, would a "git fetch <url> refs/changes/80/69280/1 && git checkout FETCH_HEAD" (as copied to the clipboard via a patch's "Download" menu) work? Or is that exactly what your patch is fixing?
Jul 2, 2015
Project Member #10 edwin.ke...@gmail.com
 > Thanks Edwin for that patch. For my better understanding, without your patch but  
 > with above uploadpack.* settings enabled, would a
 > "git fetch <url> refs/changes/80/69280/1 && git checkout FETCH_HEAD" (as copied 
 > to the clipboard via a patch's "Download" menu) work?
No, with this configuration this command is not working anymore, since the change ref is not advertised to the client.

 > Or is that exactly what your patch is fixing?
Yes, this is what the patch should fix. Since the change ref is not advertised, the download commands should use the commit ID instead of the change ref.
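
A rough sketch of that idea, not the actual patch: when change refs are hidden, the generated download command names the commit ID rather than the change ref. The function name, the refs_hidden flag, the example URL and the example commit ID are all hypothetical placeholders.

```python
import shlex


def download_command(url, change_ref, commit_id, refs_hidden):
    """Build a fetch-and-checkout command for one patch set (illustrative)."""
    # With hidden refs the server never advertises change_ref, so the
    # client has to ask for the commit by its ID instead.
    target = commit_id if refs_hidden else change_ref
    return (f"git fetch {shlex.quote(url)} {shlex.quote(target)}"
            " && git checkout FETCH_HEAD")


print(download_command(
    "ssh://gerrit.example.com:29418/project",    # placeholder URL
    "refs/changes/80/69280/1",
    "0123456789abcdef0123456789abcdef01234567",  # placeholder commit ID
    refs_hidden=True,
))
```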

Jul 2, 2015
#11 oswald.b...@gmx.de
so what has become of "yet allow the client to request them on demand as necessary"? i kind of dislike the idea of introducing an asymmetry, and effectively breaking backwards compatibility.
Jul 2, 2015
#12 sschuberth
I believe "uploadpack.allowtipsha1inwant = true" is the "yet allow the client to request them on demand as necessary" part, no?
Jul 2, 2015
#13 oswald.b...@gmx.de
that isn't quite adequate imo. i would have expected that the client would be able to explicitly request specific refs to be listed despite them being "hidden". as-is, the access to these refs becomes quite asymmetric.
Jul 2, 2015
Project Member #14 edwin.ke...@gmail.com
Not sure what you mean by asymmetric. The client should always use the download commands provided by Gerrit. Then it's completely transparent for the client how the commit is fetched.
Jul 2, 2015
#15 oswald.b...@gmx.de
you can't really expect that. see https://codereview.qt-project.org/97379 (i plan to hack our gerrit to offer adjusted download links, but i don't have other gerrits under control, obviously).
Jul 2, 2015
#16 oswald.b...@gmx.de
and there is also http://code.qt.io/cgit/qt/qtrepotools.git/tree/git-hooks/gerrit-bot which does custom downloads. of course that one wouldn't be affected, as it will never access "archived" refs, but the point is that there are probably hundreds of scripts which rely on the current structure. the change should happen in git if at all possible, so things keep working.
Jul 22, 2015
#17 amu...@wikimedia.org
Seems Gerrit patch https://gerrit-review.googlesource.com/#/c/69280/ lets one fetch using the commit SHA-1, which lets one use uploadpack.hideRefs and uploadpack.allowtipsha1inwant by specifying the commit SHA-1 instead of the ref (the ref would be rejected).

I noticed that git 2.5.0 has:

> "git upload-pack" that serves "git fetch" can be told to serve commits that are not at the tip of any ref, as long as they are reachable from a ref, with `uploadpack.allowReachableSHA1InWant` configuration variable.


Seems JGit is catching up https://git.eclipse.org/r/#/c/49652/ "UploadPack: Use reachable-sha1-in-want configuration".


So potentially bumping JGit and setting uploadpack.hideRefs / uploadpack.allowtipsha1inwant / uploadpack.allowReachableSHA1InWant would prevent git from fetching all the refs/changes/* hierarchy.



Nov 17, 2015
#18 fredrik....@gmail.com
With https://gerrit-review.googlesource.com/#/c/72258/, Gerrit 2.12 adopted JGit 4.1. This means that the allowReachableSHA1InWant configuration can be used (have not tested it though).

https://gerrit-review.googlesource.com/#/c/72259/ updates the download plugin to read that configuration as well.