Issue 175: Archive old refs/changes/... once changes are closed
Reported by Shawn Pearce <sop@google.com> on Mon May 11 07:11:49 PDT 2009
Source: JIRA GERRIT-175
Affected Version: 2.0.11

Currently Gerrit creates a ref for each patch set; e.g. refs/changes/42/1342/2 is a ref pointing at patch set 2 of change number 1342. Gerrit creates these refs for two reasons:

- It anchors the commit in the repository during the review, so a "git gc" doesn't delete it prior to submission.
- It makes it possible to use repo download / git pull to obtain the change over the git:// or ssh:// protocols.

However, if you submit, say, 4000 changes to a single project, and each change uses on average 2 patch sets, you now have 8000 refs in that project. This is problematic because the Git protocols send a listing of *all* refs to the client when the client first connects. For these ~8000 refs/changes refs, the Git protocol transmits 500 KiB of data to the client to advertise these changes as being available for download. This advertisement happens both for fetch (aka repo download, repo sync, git fetch, git pull) and for push (aka repo upload, git push).

Fast-forward another year, and you now have 16,000 changes (about 32,000 refs at 2 patch sets per change) in a single project. Now we have to send ~2 MiB of data to the client to advertise these changes as available for download. For the most part, the client doesn't care about any of the information listed in this 2 MiB transfer.

I'm starting to think we may need to create archival repositories and move some of these changes into them once they have aged sufficiently; e.g. any change that has been submitted or abandoned and is at least 90 days old should be moved to an archival repository. Perhaps use a different repository for each calendar year, or for every X changes. The download links should then be updated to point to the correct archival repository.
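To make the size arithmetic above concrete, here is a rough Python sketch that estimates the ref advertisement for N changes. It assumes each advertised ref costs one pkt-line (a 4-byte hex length prefix, a 40-hex-digit SHA-1, a space, the ref name and a newline) and ignores the capability list on the first line and the final flush packet. The helper names are hypothetical; the sharding by the last two digits of the change number matches the refs/changes/42/1342/2 example.

```python
def change_ref(change: int, patch_set: int) -> str:
    """Gerrit-style ref name, sharded by the last two digits of the change number."""
    return f"refs/changes/{change % 100:02d}/{change}/{patch_set}"


def advertised_line_bytes(ref: str) -> int:
    """Approximate wire size of one advertised ref sent as a pkt-line.

    4-byte hex length prefix + 40-hex SHA-1 + space + ref name + newline.
    Capabilities on the first line and the flush packet are ignored.
    """
    return 4 + 40 + 1 + len(ref) + 1


def advertisement_bytes(num_changes: int, patch_sets_per_change: int = 2) -> int:
    """Total bytes needed to advertise every patch-set ref of every change."""
    return sum(
        advertised_line_bytes(change_ref(change, ps))
        for change in range(1, num_changes + 1)
        for ps in range(1, patch_sets_per_change + 1)
    )


if __name__ == "__main__":
    for changes in (4000, 16000):
        size = advertisement_bytes(changes)
        print(f"{changes} changes -> {2 * changes} refs -> {size / 1024:.0f} KiB")
```

Under these assumptions the estimate comes out at roughly 530 KiB for 4000 changes and a little over 2 MiB for 16,000, in line with the figures in the report.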
Sep 24, 2009
#1
code-rev...@gtempaccount.com
(No comment was entered for this change.)
Status: Accepted
Owner: s...@google.com
Nov 21, 2009
(No comment was entered for this change.)
Owner: s...@google.com
Feb 16, 2015
I'm currently looking at the possible reasons why a push to a Gerrit repo with about 40k changes would be slow (the "Processing changes" step on the client takes about 2 minutes for us now), see [1]. So, is this issue still valid?

[1] https://groups.google.com/d/msg/repo-discuss/pWe2GarzJA4/LdeHFckdt50J
Jun 26, 2015
@sschuberth I have the same issue doing git fetch. You can try:

    GIT_TRACE_PACKET=1 git pull

On a fresh clone of a repo using the ssh:// protocol, that yields:

    $ GIT_TRACE_PACKET=1 git pull
    15:29:14.235302 pkt-line.c:46 packet: fetch< 874c4155a4f831a2331b9f6cb8c2ac806c03f603 HEAD\0 include-tag multi_ack_detailed multi_ack ofs-delta side-band side-band-64k thin-pack no-progress shallow
    15:29:14.239709 pkt-line.c:46 packet: fetch< 8f52c30fef5c0f02ab40a67445239a3178a5d46a refs/changes/00/100/1
    15:29:14.239725 pkt-line.c:46 packet: fetch< a9ff3971e38a17ea1af78851265755d42a01eb10 refs/changes/00/1000/1
    15:29:14.239737 pkt-line.c:46 packet: fetch< d44c409d90a974b731a738ccbb46fd5f90eab43d refs/changes/00/1000/2
    15:29:14.239748 pkt-line.c:46 packet: fetch< f00d3ac2236df658aee10575cc91dd526ae40680 refs/changes/00/101000/1
    15:29:14.239761 pkt-line.c:46 packet: fetch< f6c838022d31049abb870717ac381c8c76897e73 refs/changes/00/101200/1
    15:29:14.239773 pkt-line.c:46 packet: fetch< 131605b3eb26b1686b67fbecedb89b2a2ef26031 refs/changes/00/101800/1
    15:29:14.239895 pkt-line.c:46 packet: fetch< fa5dd3ca8b36bc0e522863f935d5c65914d8a22e refs/changes/00/10200/1
    15:29:14.239911 pkt-line.c:46 packet: fetch< cec6e460743b66d070e2ce015e7c45f64d4d34d9 refs/changes/00/102000/1
    15:29:14.239930 pkt-line.c:46 packet: fetch< 4c5042c0c10be99af9c32cb9b59af790552d8819 refs/changes/00/102200/1
    ...

That is a lot of network overhead, given that my fetch refspec only asks for branches:

    remote.gerrit.fetch=+refs/heads/*:refs/remotes/gerrit/*

It seems that on the Gerrit server side one would want to set:

    uploadpack.hideRefs refs/changes
    uploadpack.hideRefs refs/cache-automerge
    uploadpack.allowtipsha1inwant = true

This might not even be supported by JGit, but it was introduced in git 1.8.2.
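The effect of the uploadpack.hideRefs settings suggested above can be modeled with a small Python sketch. It follows git's documented matching rule as I understand it (an entry hides a ref equal to it or below it as a path component, a leading "!" explicitly exposes a ref, and later entries override earlier ones) but leaves out the "^" full-refname form used with namespaces. The function names are hypothetical, and the sample ref list is made up apart from the two change refs taken from the trace.

```python
from typing import Iterable, List


def is_hidden(ref: str, hide_refs: List[str]) -> bool:
    """Simplified model of git's hideRefs matching.

    Entries are checked from last to first, so later config lines win.
    An entry matches when `ref` equals it or continues with "/".
    A leading "!" turns the entry into an explicit exception (not hidden).
    """
    for entry in reversed(hide_refs):
        negated = entry.startswith("!")
        prefix = entry[1:] if negated else entry
        if ref == prefix or ref.startswith(prefix + "/"):
            return not negated
    return False


def advertised(refs: Iterable[str], hide_refs: List[str]) -> List[str]:
    """Refs the server would still list to the client."""
    return [r for r in refs if not is_hidden(r, hide_refs)]


if __name__ == "__main__":
    refs = [
        "HEAD",
        "refs/heads/master",
        "refs/changes/00/100/1",
        "refs/changes/00/1000/2",
        "refs/changesets/1",  # hypothetical ref: shares a prefix but not a path component
    ]
    hide = ["refs/changes", "refs/cache-automerge"]
    print(advertised(refs, hide))
```

With the two hideRefs entries from the comment, only HEAD, refs/heads/master and the unrelated refs/changesets/1 remain in the listing; the per-change lines seen in the trace disappear.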
Jun 26, 2015
FYI, in our case the cause of the slowness was the NFS hosting our repos. Once we changed that to local storage, upload performance was decent again.
Jul 1, 2015
https://gerrit-review.googlesource.com/69280
Jul 2, 2015
Thanks, Edwin, for that patch. To make sure I understand: without your patch, but with the above uploadpack.* settings enabled, would "git fetch <url> refs/changes/80/69280/1 && git checkout FETCH_HEAD" (as copied to the clipboard via a patch's "Download" menu) still work? Or is that exactly what your patch fixes?
Jul 2, 2015
> Thanks Edwin for that patch. For my better understanding, without your patch but
> with above uploadpack.* settings enabled, would a
> "git fetch <url> refs/changes/80/69280/1 && git checkout FETCH_HEAD" (as copied
> to the clipboard via a patch's "Download" menu) work?

No, with this configuration the command no longer works, since the change ref is not advertised to the client.

> Or is that exactly what your patch is fixing?

Yes, this is what the patch should fix. Since the change ref is not advertised, the download commands should use the commit ID instead of the change ref.
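A minimal Python sketch of the idea described in the reply, assuming a hypothetical helper that builds the "Download" command. When change refs are hidden, it fetches the patch set's commit SHA-1 instead of the change ref; that only works if the server accepts wants for non-advertised SHA-1s (e.g. via uploadpack.allowtipsha1inwant). The function and parameter names, and the example URL, are illustrative and not taken from the actual change.

```python
def download_command(url: str, change: int, patch_set: int,
                     commit: str, change_refs_hidden: bool) -> str:
    """Build a "git fetch ... && git checkout FETCH_HEAD" download command.

    Hypothetical helper: if refs/changes/* is hidden from the advertisement,
    fetching the change ref by name fails on the client, so fetch the commit
    SHA-1 directly (the server must allow SHA-1 wants for this to work).
    """
    if change_refs_hidden:
        target = commit
    else:
        # Same sharding as refs/changes/80/69280/1: last two digits, zero-padded.
        target = f"refs/changes/{change % 100:02d}/{change}/{patch_set}"
    return f"git fetch {url} {target} && git checkout FETCH_HEAD"


if __name__ == "__main__":
    sha = "0123456789abcdef0123456789abcdef01234567"  # made-up commit ID
    print(download_command("https://gerrit.example.com/project", 69280, 1, sha, False))
    print(download_command("https://gerrit.example.com/project", 69280, 1, sha, True))
```

The client-side command keeps the same shape either way, which is why using Gerrit's download commands keeps the fetch mechanism transparent to the user.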
Jul 2, 2015
so what has become of "yet allow the client to request them on demand as necessary"? i kind of dislike the idea of introducing an asymmetry, and effectively breaking backwards compatibility.
Jul 2, 2015
I believe "uploadpack.allowtipsha1inwant = true" is the "yet allow the client to request them on demand as necessary" part, no?
Jul 2, 2015
that isn't quite adequate imo. i would have expected that the client would be able to explicitly request specific refs to be listed despite them being "hidden". as-is, the access to these refs becomes quite asymmetric.
Jul 2, 2015
Not sure what you mean by asymmetric. The client should always use the download commands provided by Gerrit; then it's completely transparent to the client how the commit is fetched.
Jul 2, 2015
you can't really expect that. see https://codereview.qt-project.org/97379 (i plan to hack our gerrit to offer adjusted download links, but i don't have other gerrits under control, obviously).
Jul 2, 2015
and there is also http://code.qt.io/cgit/qt/qtrepotools.git/tree/git-hooks/gerrit-bot which does custom downloads. of course that one wouldn't be affected, as it will never access "archived" refs, but the point is that there are probably hundreds of scripts which rely on the current structure. the change should happen in git if at all possible, so things keep working.
Jul 22, 2015
It seems Gerrit patch https://gerrit-review.googlesource.com/#/c/69280/ lets one fetch using the commit SHA-1, which makes it possible to use uploadpack.hideRefs and uploadpack.allowtipsha1inwant: the download command specifies the commit SHA-1 instead of the ref (the ref would be rejected). I noticed that git 2.5.0 has:

> "git upload-pack" that serves "git fetch" can be told to serve commits that are not at the tip of any ref, as long as they are reachable from a ref, with `uploadpack.allowReachableSHA1InWant` configuration variable.

It seems JGit is catching up: https://git.eclipse.org/r/#/c/49652/ "UploadPack: Use reachable-sha1-in-want configuration". So potentially, bumping JGit and setting uploadpack.hideRefs / allowTipSHA1InWant / allowReachableSHA1InWant would keep the server from sending the listing of the whole refs/changes/* hierarchy to the client.
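The difference between the two settings can be sketched as a simplified Python model of how upload-pack decides whether to accept a "want". Assumptions: commits are plain strings (c1, c2, c3, p1 are made-up IDs), the commit graph is a parent dictionary, and reachability is computed from both advertised and hidden ref tips, which is how I understand git to behave once either option is enabled. Real upload-pack works on object IDs and delegates the reachability check to rev-list.

```python
from typing import Dict, List, Set


def reachable(tips: Set[str], parents: Dict[str, List[str]]) -> Set[str]:
    """All commits reachable from `tips` by following parent links."""
    seen: Set[str] = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return seen


def want_allowed(want: str, advertised_tips: Set[str], hidden_tips: Set[str],
                 parents: Dict[str, List[str]], allow_tip: bool,
                 allow_reachable: bool) -> bool:
    """Would a simplified upload-pack accept `want`?"""
    if want in advertised_tips:
        return True  # always allowed: the client saw this SHA-1 in the listing
    if allow_tip and want in hidden_tips:
        return True  # allowTipSHA1InWant: tip of a hidden ref (e.g. refs/changes/*)
    if allow_reachable:
        # allowReachableSHA1InWant: anything reachable from any ref tip
        return want in reachable(advertised_tips | hidden_tips, parents)
    return False


if __name__ == "__main__":
    # c1 <- c2 <- c3 (advertised branch tip); p1 is a patch set on top of c2
    # whose change ref is hidden.
    parents = {"c1": [], "c2": ["c1"], "c3": ["c2"], "p1": ["c2"]}
    adv, hidden = {"c3"}, {"p1"}
    for want in ("c3", "p1", "c2"):
        print(want,
              want_allowed(want, adv, hidden, parents, True, False),
              want_allowed(want, adv, hidden, parents, False, True))
```

In this model allowTipSHA1InWant is enough for Gerrit's download commands, since each fetched patch set is the tip of a (hidden) change ref, while allowReachableSHA1InWant additionally accepts older commits that are no longer at any tip.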
Nov 17, 2015
With https://gerrit-review.googlesource.com/#/c/72258/, Gerrit 2.12 adopted JGit 4.1. This means that the allowReachableSHA1InWant configuration can be used (I have not tested it, though). https://gerrit-review.googlesource.com/#/c/72259/ updates the download plugin to read that configuration as well.