Issue 201243: Buildbot commit log is inaccurate compared to what we build.
Project Member Reported by sosa@chromium.org, Jul 25, 2011
Our bots sync from the http mirror, while commits show up on the buildbot via git_buildbot, which is called from the actual git repo. Since the http mirror is often behind by a decent amount (~5 minutes), the view of history on the buildbot waterfall can be very misleading for sheriffs.

Investigate pushing commits from the http mirror rather than from git.

+cc scottz / maruel for ideas on how to overcome this.
Jul 25, 2011
#1 maruel@chromium.org
I don't think it's a good idea to fetch from the http mirror though.
Cc: cmp@chromium.org nsylv...@chromium.org
Jul 25, 2011
#2 scottz@chromium.org
Before we moved to gerrit we used git over ssh for everything. Was there a reason to default to http or was it just how the manifest was set up?
Jul 25, 2011
#3 nsylv...@google.com
The http mirrors should never be more than 15 seconds delayed.  Is this 5 minute delay reproducible?
Jul 25, 2011
#4 nsylv...@google.com
As for why we use http: 

Gerrit is a Java app with a built-in JGit that it uses to serve the data. It takes a lot of resources to run and cannot be load balanced, so if all our bots fetched from Gerrit directly at the same time, this would bring Gerrit down.
Jul 25, 2011
#5 nsylv...@chromium.org
Brad, is it possible that only the buildbot git poller is delayed and not the http mirror itself? How often does the poller run and update its checkout?
Cc: bradnelson@google.com
Jul 25, 2011
#6 scottz@chromium.org
Actually now that I reread this thread, git_poll_sync uses the exact same manifest as the buildbots. So we are not notified of them until the commits hit the mirror. This runs on 5 minute intervals as well so any commit that we see and notify buildbot of is actually on the mirror already. What investigation was done to confirm that we are out of sync here? At a glance this should not be the case. 

--
On the note of ssh killing gerrit, what is our maximum there? It is a little surprising that fewer than 100 connections syncing from gerrit at the same time would kill the server.

Jul 25, 2011
#7 awesomesos
Just visually, using the buildbot. I had forgotten about the poll interval and so blamed the http mirror. If you look at the log here: http://chromegw/i/chromiumos/builders/x86%20generic%20pre%20flight%20queue/builds/6577/steps/LKGMCandidateSync/logs/stdio (today), we synced TOT on vboot_firmware @ 8:56am and got 4bc713d0df70117a6459fb1ac0ca248eef774c66. However, if you look at the buildbot waterfall, commit 4bc713d0df70117a6459fb1ac0ca248eef774c66 doesn't show up until 9am. So the commit shows up as part of a build that started before it was visibly committed.
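To make the timing concrete, here is a minimal Python sketch of how a fixed poll interval can make a commit appear on the waterfall later than a build that already synced it. The 5-minute interval and the 8:56am / 9am times come from this thread; the poll schedule (a poll at 8:55am) and the function name are assumptions for illustration only, not how git_poll_sync is actually implemented.

```python
from datetime import datetime, timedelta

POLL_INTERVAL = timedelta(minutes=5)  # interval mentioned in this thread


def first_poll_at_or_after(commit_time, poll_start, interval=POLL_INTERVAL):
    """Return the time of the first poll that can report commit_time.

    Hypothetical helper: assumes polls fire at poll_start, poll_start +
    interval, poll_start + 2 * interval, and so on.
    """
    if commit_time <= poll_start:
        return poll_start
    elapsed = commit_time - poll_start
    # Ceiling division on timedeltas: whole intervals needed to reach the commit.
    polls_needed = -(-elapsed // interval)
    return poll_start + polls_needed * interval


# Assumed schedule: the previous poll ran at 8:55am.
poll_start = datetime(2011, 7, 25, 8, 55)
# The commit was already on the mirror when the build synced at 8:56am.
commit_on_mirror = datetime(2011, 7, 25, 8, 56)

shown_on_waterfall = first_poll_at_or_after(commit_on_mirror, poll_start)
print(shown_on_waterfall.strftime("%H:%M"))
```

Under these assumptions the build picks up the commit at 8:56am, but the poller only notifies buildbot at its next run, so the waterfall shows the commit after the build has started.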
Jul 25, 2011
#8 sosa@chromium.org
Sorry that was me ^^^
Jul 25, 2011
#9 scottz@chromium.org
Ahahah awesomesos :) 

So you are saying that the 5 minute delay is a problem? Won't this just go away once we have a commit queue? I don't know how else we could solve this unless we had this constantly polling the servers, and even then we might not get a quick enough turnaround, because polling constantly means running repo sync, so it isn't parallel for all repos.

If the 5 minute delay is a long term problem we have to solve (despite commit queue etc), the only real option I see is adding it to hooks for the http mirrors, and I am not sure if that is possible.
Jul 28, 2011
#10 sosa@chromium.org
cmasone and I talked about this with some of Chrome-infra.  Seems like a good approach would be to move the manifest-generating logic from the PFQ to the poller.  The poller would generate a manifest based on git changes and the PFQ would just move to listening to commits into the manifest repo.  We can then dynamically generate a blamelist based on changes the poller got.
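A rough Python sketch of the blamelist idea: if the poller records the manifest it generated on each run, the blamelist for a build is the set of projects whose pinned revision changed between two manifests. Manifests are modeled here as plain dicts of project name to revision, which is a simplification (real repo manifests are XML); the function name and the sample data are hypothetical.

```python
def changed_projects(old_manifest, new_manifest):
    """Return {project: (old_rev, new_rev)} for projects whose revision changed.

    Both arguments map project name -> pinned revision. Projects that are
    new in new_manifest are reported with an old revision of None.
    """
    changes = {}
    for project, new_rev in new_manifest.items():
        old_rev = old_manifest.get(project)
        if old_rev != new_rev:
            changes[project] = (old_rev, new_rev)
    return changes


# Placeholder project names and revisions, for illustration only.
previous = {"project_a": "rev1", "project_b": "rev2"}
current = {"project_a": "rev1", "project_b": "rev3", "project_c": "rev4"}

for project, (old_rev, new_rev) in sorted(changed_projects(previous, current).items()):
    print("%s: %s -> %s" % (project, old_rev, new_rev))
```

Because the PFQ would build exactly what the manifest pins, a blamelist derived this way matches what was built, regardless of when the poller happened to run.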
Jul 28, 2011
#11 sosa@chromium.org
(No comment was entered for this change.)
Cc: cmasone@chromium.org
Aug 1, 2011
#12 scottz@chromium.org
I actually like this idea a lot. I think in general this will help us avoid some of the other issues we are seeing, e.g. builders getting out of sync. I would like to evaluate the poller a bit more, considering the role it will be moving into, but on the surface this sounds like a good direction to go.
Aug 8, 2011
#13 sosa@chromium.org
(No comment was entered for this change.)
Status: Started
Aug 9, 2011
#14 bugdroid1@chromium.org
Commit: 000ea2f257c6385f0eaa903d273ec441a9f8ff12
 Email: sosa@chromium.org

Print out links to all CL's since the last passing build.

BUG=chromium-os:18140
TEST=New unittest plus ran on latest build.

Change-Id: I5b971bfdba4d4e94a7fbf4b61e7fe2a19c674611
Reviewed-on: http://gerrit.chromium.org/gerrit/5528
Reviewed-by: Chris Sosa <sosa@chromium.org>
Tested-by: Chris Sosa <sosa@chromium.org>

M	buildbot/cbuildbot_stages.py
M	buildbot/lkgm_manager.py
M	buildbot/lkgm_manager_unittest.py
Aug 9, 2011
#15 bugdroid1@chromium.org
Commit: 1e031c316b07091ad3f5e4083d74aa7a6dc0b4bb
 Email: sosa@chromium.org

Add :change_number to create uniqueness in link name for blamelist links.

It seems like the annotator doesn't like links with the same name and only
prints out the last link with the same name.  This uniqueifies the name
by using a combination of author name and change number.

BUG=chromium-os:18140
TEST=unittests

Change-Id: I25fbf8c8ee95023009c050c9fe2f241709a17329
Reviewed-on: http://gerrit.chromium.org/gerrit/5583
Tested-by: Chris Sosa <sosa@chromium.org>
Reviewed-by: Ryan Cui <rcui@chromium.org>

M	buildbot/lkgm_manager.py
M	buildbot/lkgm_manager_unittest.py
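The link-name collision described in the commit message above can be illustrated with a small Python sketch. The annotator is modeled as a dict that keeps only the last link per name, which is an assumption based on the behavior the commit message describes; the function names and the exact "author:change_number" format are hypothetical and not taken from lkgm_manager.py.

```python
def blamelist_link_name(author, change_number):
    """Build a link name that is unique per change (format is an assumption)."""
    return "%s:%s" % (author, change_number)


def render_links(changes, name_fn):
    """Mimic an annotator that keeps only the last link for each name."""
    links = {}
    for author, change_number, url in changes:
        links[name_fn(author, change_number)] = url
    return links


# Two changes by the same author, using review URLs that appear in this thread.
changes = [
    ("sosa", 5528, "http://gerrit.chromium.org/gerrit/5528"),
    ("sosa", 5583, "http://gerrit.chromium.org/gerrit/5583"),
]

by_author_only = render_links(changes, lambda author, number: author)
by_author_and_change = render_links(changes, blamelist_link_name)

print(len(by_author_only))        # the second link overwrote the first
print(len(by_author_and_change))  # both links survive
```

Naming links by author alone loses all but one change per author, while appending the change number keeps every entry in the blamelist visible.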
Aug 9, 2011
#16 sosa@chromium.org
A stopgap has been committed that prints out a list of Gerrit CLs since the last LKGM as part of LKGMSyncStage.
Aug 9, 2011
#17 sosa@chromium.org
(No comment was entered for this change.)
Status: Available
Sep 7, 2011
#18 sosa@chromium.org
Issue 16489 has been merged into this issue.
Sep 7, 2011
#19 sosa@chromium.org
Issue 10452 has been merged into this issue.
Cc: an...@chromium.org eblake@chromium.org
Sep 7, 2011
#20 sosa@chromium.org
(No comment was entered for this change.)
Cc: d...@chromium.org mtennant@chromium.org
Sep 26, 2011
#21 bdavi...@chromium.org
bulk edit: punt to R16
Labels: Mstone-R16
Oct 17, 2011
#22 bugdroid1@chromium.org
Commit: db27ad61221232be6d48bc8e21c210d83da894a4
 Email: sosa@chromium.org

Do not publish or print out LKGM manifest / blamelist.

BUG=chromium-os:18140
TEST=unittests + pyflakes

Change-Id: I9470358f40880d5e43c8a89613d6afefdc1bf285
Reviewed-on: http://gerrit.chromium.org/gerrit/10198
Tested-by: Chris Sosa <sosa@chromium.org>
Reviewed-by: Ryan Cui <rcui@chromium.org>

M	buildbot/cbuildbot_stages.py
M	buildbot/lkgm_manager.py
Oct 17, 2011
#23 bugdroid1@chromium.org
Commit: c6d87cc9278f05c1edbdc9abebda24c9159ce0ef
 Email: sosa@chromium.org

Do not publish or print out LKGM manifest / blamelist.

BUG=chromium-os:18140
TEST=unittests + pyflakes

Change-Id: I9470358f40880d5e43c8a89613d6afefdc1bf285
Reviewed-on: http://gerrit.chromium.org/gerrit/10198
Tested-by: Chris Sosa <sosa@chromium.org>
Reviewed-by: Ryan Cui <rcui@chromium.org>
Reviewed-on: http://gerrit.chromium.org/gerrit/10201
Reviewed-by: Chris Sosa <sosa@chromium.org>

M	buildbot/cbuildbot_stages.py
M	buildbot/lkgm_manager.py
Nov 3, 2011
#24 bdavi...@chromium.org
bulk edit: punt to R17
Labels: Mstone-R17
Dec 12, 2011
#25 or...@chromium.org
Moving non-essential bugs to R18. please move back if this was done in error and your bug is a blocker for R17.
Labels: -Mstone-R17 bulkmove Mstone-R18
Mar 1, 2012
#26 dd...@chromium.org
Bulk move of non-blocking issues from R18 to R19.
Labels: -Mstone-R18 Mstone-R19
Mar 27, 2012
#27 davidjames@google.com
Bulk edit: This didn't make it into R19, but would be a good candidate for R20.
Labels: -Mstone-R19 Mstone-R20
Apr 6, 2012
#28 dd...@chromium.org
(No comment was entered for this change.)
Labels: -Mstone-R20 Mstone-20
May 14, 2012
#29 sosa@chromium.org
(No comment was entered for this change.)
Labels: -Mstone-20 Mstone-21
Jul 16, 2012
#30 dd...@chromium.org
Bulk move of non-blocking issues from Mstone-21 to Mstone-22.
Labels: -Mstone-21 Mstone-22
Sep 4, 2012
#31 dd...@chromium.org
Bulk moving non-blocking issues from Mstone-22 to Mstone-23.
Labels: -Mstone-22 Mstone-23
Oct 23, 2012
#32 benhe...@google.com
(No comment was entered for this change.)
Labels: -Mstone-23 mstone-24 bulkedit
Oct 29, 2012
#33 sosa@chromium.org
For the work that is done here, I'm marking as fixed. If we end up deciding to completely fix this we should file a new bug.
Status: Fixed
Oct 29, 2012
#34 patri...@chromium.org
(No comment was entered for this change.)
Status: Verified
Oct 29, 2012
#35 chromeos...@chromium.org
(No comment was entered for this change.)
Labels: FixedIn-1190.0.0 FixedInIndex-28
Mar 6, 2013
#36 lafo...@google.com
(No comment was entered for this change.)
Labels: OS-Chrome
Mar 9, 2013
#37 bugdroid1@chromium.org
(No comment was entered for this change.)
Labels: -Area-Build -mstone-24 M-24 Build