Issue 3424: Cannot read project after diff timeout
Status:  Duplicate
Merged:  issue 3361
Owner: ----
Closed:  Oct 9
Cc:
Project Member Reported by zaro0508, Jun 12, 2015
A diff timeout causes an entire project to be unusable.  Gerrit responds with cannot read project.

This bug was first reported in the mailing list:
https://groups.google.com/d/msg/repo-discuss/ZeGWPyyJlrM/pY-hk2lZWMQJ

This was an _ATTEMPT_ to fix this issue but it did not fix it:
https://gerrit-review.googlesource.com/#/c/68149/

Affected Version: 2.10.5

What steps will reproduce the problem?
1. upload a change with a small one line diff to a test project.  I'm using 'gtest-org/test2' for my testing. 
2. upload a change with a big diff (hundreds of large files with many line changes) to the same project. Something similar to this https://review-dev.openstack.org/#/c/5354 which has ~700 files, ~2000 lines of diff.
3. set the diff timeout to an artificially small value in gerrit.config: 
  [cache "diff"]
       timeout = 5ms   
4. restart Gerrit
Note: may need to flush-caches [1] here.
5. using the gerrit UI attempted to view some changes containing small (one line) diffs in foo project.
Note: no problem here.
6. Now attempt to view the change uploaded in step 2.  Look in the gerrit log to make sure the timeout was hit.  When diff timeout happens the log [2] indicates 'Error computing PatchListKey'
7. Now I attempt step 5 again (line 355 in log).  Changes in project 'gtest-org/test2' are no longer accessible which mean i can no longer view any other changes in that project.

 [1] https://gerrit-review.googlesource.com/Documentation/cmd-flush-caches.html
 [2] http://paste.openstack.org/show/242703/


What is the expected output? What do you see instead?
  Diff timeout should not cause an entire project to be unusable.  


Jun 16, 2015
Project Member #1 dougk....@gmail.com
I had a similar issue where reindex could time out and result in the same type of backtrace (though it's been a while and I probably have misplaced my notes on this issue by now): I hit the same issue (pack is corrupt, removing from pack list), and subsequent changes on the same project would fail.  Previously, I could work around the issue by simply running git repack in the repository and reindexing offline or manually indexing each change.  My specific case *was* resolved in 2.10.5, though I do wonder if there are perhaps multiple things at play in this case...
Jun 26, 2015
Project Member #3 zaro0508
@doug, It might be that you have a different use case that resulted in the same error.  I can confirm that the specific use case in this issue is still reproducible on Gerrit 2.10.5.
Jun 26, 2015
Project Member #4 zaro0508
I just wanted to add that we host our repositories right on the same server that's running Gerrit, it's in the review_site/git folder.  We replicate to a few different locations from there (http://git.openstack.org and  https://github.com/openstack).
Jul 7, 2015
#5 sschuberth
We also see this happening with the default cache.diff.timeout value of 5s (not 5ms) in several real-world projects. Given that the outcome is an unreadable project, shouldn't we give this a higher priority than "Minor"?
Jul 8, 2015
Project Member #6 zaro0508
(No comment was entered for this change.)
Labels: -Priority-Minor Priority-Major
Jul 20, 2015
Project Member #8 bassem.rabil
This timeout [1]  has been introduced to avoid getting into infinite loop with calculating MyersDiff for the patchset. This timeout can be tuned at a higher value e.g. 5 minutes to ensure quiting MyersDiff Calculations only for those really large changes.

This has been fixed in JGit 4.0.1 used in Gerrit 2.11.2. For more details check the discussion [2] for explanation of this behaviour. 

You can workaround this bug in earlier Gerrit releases by repacking the repository using git repack if the project is not visible anymore, e.g.
$ git repack -a -d --window-memory 100m --max-pack-size 200m
This should bring back the visibility of the project without having to restart the Gerrit instance.


[1] https://gerrit-review.googlesource.com/Documentation/config-gerrit.html#cache
[2] https://groups.google.com/forum/#!searchin/repo-discuss/diff$20timeout/repo-discuss/CYYoHfDxCfA/hRiYBonEsLYJ
Jul 21, 2015
Project Member #9 bassem.rabil
The change for this is done now, i.e. https://gerrit-review.googlesource.com/#/c/68978/
Status: Fixed
Aug 4, 2015
Project Member #10 zaro0508
I have verified that change 68978 does not fix this issue.  I cherry-picked that change onto the stable-2.10 branch to test it and I continue to reproduce this issue.  Here is the log with of my test: http://paste.openstack.org/show/406893/

Status: Accepted
Aug 4, 2015
Project Member #11 zaro0508
@Bassem, I just assumed that change 68978 can be back ported to stable-2.10 branch please let me know if there's something else in 2.11 that helps to fix this issue.  Also could you please clarify what you meant in your last comment about setting the diff timeout to 5m?   I'm wondering what 'quitting MyersDiff' means?  Does that mean that the timeout will be completely turned off? if so wouldn't that cause the infinite loop again? 

I guess we can configure the timeout to a larger value but didn't want some future (large) change to cause this error again which would leave our repro(s) in an unusable state.

Aug 4, 2015
Project Member #12 bassem.rabil
Before this timeout was introduced, there was almost infinite loop to calculate the diffs for patchsets using this MyersDiff algorithm. The web UI was protected against this infinite loop [1], but this was not handled during checking mergeability check during reindex process [2]. There was a discussion on this topic [3], I am not sure if the latest patch [4] is included in Gerrit 2.10-x releases. Setting the timeout to larger values would be safer to ensure that the diff will be quited only for larger changes, and instead it will use a simplified diff for such large change.



[1] https://code.google.com/p/gerrit/issues/detail?id=487
[2] https://groups.google.com/forum/#!searchin/repo-discuss/MyersDiff/repo-discuss/ZtiCilM3wFA/LijfZ4YkLHsJ
[3] https://groups.google.com/forum/#!searchin/repo-discuss/MyersDiff/repo-discuss/CYYoHfDxCfA/hRiYBonEsLYJ
[4] https://git.eclipse.org/r/#/c/48288/

Aug 4, 2015
Project Member #13 zaro0508
As stated in this issue's description, the first _ATTEMPT_ to fix this issue was to pickup the jgit change https://git.eclipse.org/r/#/c/48288/ by merging Gerrit change  https://gerrit-review.googlesource.com/#/c/68149  Therefore yes, 48288 was merged into the stable-2.10 branch a while ago. 

Actually the proposed workaround (increase the timeout) is not really acceptable for us because our repos are in the open and that workaround would still expose us to possible a DoS attack.

Aug 4, 2015
Project Member #14 zaro0508
(No comment was entered for this change.)
Cc: sasa.ziv...@sap.com
Aug 18, 2015
#15 glennch...@gmail.com
this issue also occurs in Gerrit 2.10.6.
Aug 25, 2015
#16 sven.sel...@sonymobile.com
The issue still occurs in gerrit 2.11.1 - vanilla
jgit: 4.0.0.201506090130-r

Slightly different stacktrace:
http://pastebin.com/jGVrxTmU
Oct 8, 2015
Project Member #17 zaro0508
Fix in review: https://gerrit-review.googlesource.com/#/c/71341

Status: ChangeUnderReview
Oct 9, 2015
Project Member #18 David.Os...@gmail.com
(No comment was entered for this change.)
Status: Duplicate
Mergedinto: 3361