Issue 403981: WebGL conformance tests totally fail to launch on Mac 10.8 gpu bots from time to time
Project Member Reported by zmo@chromium.org, Aug 14, 2014
One example: 

http://build.chromium.org/p/chromium.gpu/builders/Mac%2010.8%20Release%20%28Intel%29/builds/26597

[ RUN      ] WebglConformance.conformance_attribs_gl_enable_vertex_attrib
WARNING:root:Could not find Flash at ../third_party/adobe/flash/binaries/ppapi/mac/PepperFlashPlayer.plugin. Continuing without Flash.
To run with Flash, check it out via http://go/read-src-internal
INFO:root:Requested remote debugging port: 0
INFO:root:Discovered ephemeral port 49319
INFO:root:Discovered ephemeral port 49319
INFO:root:OS: mac mountainlion
INFO:root:Model: Macmini
INFO:root:GPU device 0: VENDOR = 0x8086 (Intel), DEVICE = 0x166
INFO:root:GPU Attributes:
INFO:root:  adapter_luid        : 0.0
INFO:root:  amd_switchable      : False
INFO:root:  can_lose_context    : False
INFO:root:  direct_rendering    : True
INFO:root:  driver_date         : 
INFO:root:  driver_vendor       : 
INFO:root:  driver_version      : 
INFO:root:  finalized           : False
INFO:root:  gl_extensions       : 
INFO:root:  gl_renderer         : 
INFO:root:  gl_reset_notification_strategy: 0
INFO:root:  gl_vendor           : 
INFO:root:  gl_version          : 
INFO:root:  gl_ws_extensions    : 
INFO:root:  gl_ws_vendor        : 
INFO:root:  gl_ws_version       : 
INFO:root:  initialization_time : 0.001285
INFO:root:  lenovo_dcute        : False
INFO:root:  optimus             : False
INFO:root:  pixel_shader_version: 
INFO:root:  process_crash_count : 0
INFO:root:  sandboxed           : True
INFO:root:  software_rendering  : False
INFO:root:  vertex_shader_version: 
INFO:root:Feature Status:
INFO:root:  2d_canvas           : enabled
INFO:root:  flash_3d            : enabled
INFO:root:  flash_stage3d       : enabled
INFO:root:  flash_stage3d_baseline: enabled
INFO:root:  gpu_compositing     : enabled
INFO:root:  rasterization       : unavailable_software
INFO:root:  threaded_rasterization: disabled_off_ok
INFO:root:  video_decode        : unavailable_software
INFO:root:  video_encode        : enabled
INFO:root:  webgl               : enabled
INFO:root:Driver Bug Workarounds:
INFO:root:  clear_alpha_in_readpixels
INFO:root:  clear_uniforms_before_first_program_use
INFO:root:  disable_arb_sync
INFO:root:  disable_multimonitor_multisampling
INFO:root:  init_varyings_without_static_use
INFO:root:  max_cube_map_texture_size_limit_1024
INFO:root:  max_texture_size_limit_4096
INFO:root:  reverse_point_sprite_coord_origin
INFO:root:  set_texture_filter_before_generating_mipmap
INFO:root:  unfold_short_circuit_as_ternary_operation
INFO:root:  unroll_for_loop_with_sampler_array_index
INFO:root:  validate_multisample_buffer_allocation
DVFreeThread - CFMachPortCreateWithPort hack = 0x7c69dab0, fPowerNotifyPort= 0x7c69cbe0
DVFreeThread - CFMachPortCreateWithPort hack = 0x7c6a0390, fPowerNotifyPort= 0x7c69eff0
ERROR:root:Aborting after too many retries
[       OK ] WebglConformance.conformance_attribs_gl_enable_vertex_attrib (25038 ms)
WARNING:root:No crash dump found. Returning browser stdout.
Can't get standard output with --show-stdout

Traceback (most recent call last):
  <module> at content/test/gpu/run_gpu_test.py:19
    sys.exit(test_runner.main())
  main at tools/telemetry/telemetry/test_runner.py:347
    return command().Run(options)
  Run at tools/telemetry/telemetry/test_runner.py:182
    return min(255, self._test().Run(args))
  Run at tools/telemetry/telemetry/benchmark.py:94
    page_runner.Run(pt, ps, expectations, finder_options, results)
  Run at tools/telemetry/telemetry/page/page_runner.py:411
    page, credentials_path, possible_browser, results, state)
  _PrepareAndRunPage at tools/telemetry/telemetry/page/page_runner.py:257
    finder_options)
  StartBrowserIfNeeded at tools/telemetry/telemetry/page/page_runner.py:99
    if len(self.browser.tabs) == 0:
  __len__ at tools/telemetry/telemetry/core/tab_list.py:15
    return self._tab_list_backend.__len__()
  __len__ at tools/telemetry/telemetry/core/backends/chrome/inspector_backend_list.py:67
    self._Update()
  _Update at tools/telemetry/telemetry/core/backends/chrome/inspector_backend_list.py:71
    contexts = self._browser_backend.ListInspectableContexts()
  ListInspectableContexts at tools/telemetry/telemetry/core/backends/chrome/chrome_browser_backend.py:195
    return json.loads(self.Request(''))
  Request at tools/telemetry/telemetry/core/backends/chrome/chrome_browser_backend.py:211
    raise exceptions.BrowserConnectionGoneException(self.browser, e)
BrowserConnectionGoneException: timed out
Stack Trace:
********************************************************************************
	
********************************************************************************

Locals:
  e                       : timeout('timed out',)
  opener                  : <urllib2.OpenerDirector instance at 0x1018bc128>
  path                    : ''
  proxy_handler           : <urllib2.ProxyHandler instance at 0x1018bc3f8>
  throw_network_exception : False
  timeout                 : 5
  url                     : 'http://127.0.0.1:49319/json'
Aug 14, 2014
#1 zmo@chromium.org
Tony, could any changes you landed yesterday or today have caused this?
Cc: tonyg@chromium.org
Aug 14, 2014
#3 tonyg@chromium.org
Are you able to repro anywhere?
Aug 14, 2014
#4 zmo@chromium.org
I took the Mac 10.8 ATI bot offline and can reproduce it in 1 out of 5 runs:

http://build.chromium.org/p/chromium.gpu/waterfall?builder=Mac%2010.8%20Release%20(ATI)

It reproduces with both isolates and a ToT local build.
Aug 14, 2014
#5 tonyg@chromium.org
Oh, nice -- these changes would be the most likely:
$ git log tools/telemetry/telemetry/core/backends/chrome/.

You could try speculatively reverting each on that bot or bisecting. If that doesn't turn anything up, lmk and I can log into the bot to try to diagnose too.
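
A note on the bisection suggested above: comment #4 reports the failure in 1 out of 5 runs, so a single passing run says little about whether a candidate revert helped. The following is a minimal sketch of the arithmetic, assuming runs are independent and the failure rate stays at 1 in 5 (both are assumptions). `runs_needed` is a hypothetical helper for illustration, not part of Telemetry.

```python
import math


def runs_needed(failure_rate, confidence=0.95):
    """Return the smallest n such that a still-broken build passes n runs
    in a row with probability at most (1 - confidence).

    Assumes each run fails independently with probability failure_rate.
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Solve (1 - failure_rate) ** n <= 1 - confidence for n.
    n = math.log(1.0 - confidence) / math.log(1.0 - failure_rate)
    return int(math.ceil(n))


if __name__ == "__main__":
    # Failure rate of 1 in 5, as reported in comment #4.
    print(runs_needed(0.2))  # 14 under the stated assumptions
```

Under these assumptions, each bisection step would need about 14 clean runs before a revision can reasonably be called good.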
Aug 15, 2014
#6 kbr@chromium.org
Investigating.

Able to reproduce this on build70-a1 (Mac 10.8 Release (ATI)). It looks like Telemetry isn't handling the first launch of the browser well. If the launch takes too long, Telemetry reaches its maximum number of retries.

It's tedious to get stdout from the remote machine but here's roughly what's reported:

% ./content/test/gpu/run_gpu_test.py webgl_conformance --browser=release
[ RUN     ] WebglConformance.conformance_attribs_gl_enable_vertex_attrib
ERROR:root:aborting after too many retries
[       OK ] WebglConformance.conformance_attribs_gl_enable_vertex_attrib (25028 ms)
WARNING:root:No crash dump found. Returning browser stdout.

The entire test run aborts at that point.


Status: Assigned
Owner: kbr@chromium.org
Cc: enne@chromium.org
Labels: -Pri-2 Pri-1 Cr-Tests-Telemetry
Aug 15, 2014
#7 kbr@chromium.org
It seems clear that there's a new race in Telemetry in handling browser startup. The tests launch reliably on my fast desktop but not on this slow test bot.

Aug 15, 2014
#8 kbr@chromium.org
Here's the stack trace from the failure.

$ ./content/test/gpu/run_gpu_test.py webgl_conformance --browser=release
[ RUN      ] WebglConformance.conformance_attribs_gl_enable_vertex_attrib
ERROR:root:Aborting after too many retries
[       OK ] WebglConformance.conformance_attribs_gl_enable_vertex_attrib (23971 ms)
WARNING:root:No crash dump found. Returning browser stdout.

Traceback (most recent call last):
  <module> at content/test/gpu/run_gpu_test.py:19
    sys.exit(test_runner.main())
  main at tools/telemetry/telemetry/test_runner.py:347
    return command().Run(options)
  Run at tools/telemetry/telemetry/test_runner.py:182
    return min(255, self._test().Run(args))
  Run at tools/telemetry/telemetry/benchmark.py:94
    page_runner.Run(pt, ps, expectations, finder_options, results)
  Run at tools/telemetry/telemetry/page/page_runner.py:411
    page, credentials_path, possible_browser, results, state)
  _PrepareAndRunPage at tools/telemetry/telemetry/page/page_runner.py:257
    finder_options)
  StartBrowserIfNeeded at tools/telemetry/telemetry/page/page_runner.py:99
    if len(self.browser.tabs) == 0:
  __len__ at tools/telemetry/telemetry/core/tab_list.py:15
    return self._tab_list_backend.__len__()
  __len__ at tools/telemetry/telemetry/core/backends/chrome/inspector_backend_list.py:67
    self._Update()
  _Update at tools/telemetry/telemetry/core/backends/chrome/inspector_backend_list.py:71
    contexts = self._browser_backend.ListInspectableContexts()
  ListInspectableContexts at tools/telemetry/telemetry/core/backends/chrome/chrome_browser_backend.py:195
    return json.loads(self.Request(''))
  Request at tools/telemetry/telemetry/core/backends/chrome/chrome_browser_backend.py:211
    raise exceptions.BrowserConnectionGoneException(self.browser, e)
BrowserConnectionGoneException: timed out
Stack Trace:
********************************************************************************
	
********************************************************************************

Locals:
  e                       : timeout('timed out',)
  opener                  : <urllib2.OpenerDirector instance at 0x101ca21b8>
  path                    : ''
  proxy_handler           : <urllib2.ProxyHandler instance at 0x101ca2b48>
  throw_network_exception : False
  timeout                 : 5
  url                     : 'http://127.0.0.1:50624/json'

Aug 15, 2014
#9 kbr@chromium.org
There were several changes under Issue 357059 affecting timeouts throughout Telemetry. Reverting just r288533 seems to fix this problem, but I'm hesitant to simply revert it because it's clearly fixed flakiness on other Chromium bots. Looking into Telemetry's browser launching to make it more robust.

Blockedon: chromium:357059
Aug 15, 2014
#10 kbr@chromium.org
Note also that I can't reproduce the above failure reliably on the target machine, and can't reproduce it at all on my desktop workstation, even after inducing CPU load by running dozens of copies of "yes > /dev/null".

Aug 15, 2014
#11 kbr@chromium.org
When the failure happens, _WaitForBrowserToComeUp in chrome_browser_backend.py indicates that the browser has started successfully. Then the later DevTools HTTP request in ListInspectableContexts times out.

Aug 15, 2014
#12 kbr@chromium.org
Added code to dump all threads' stack traces at the point of failure. There's only one thread running in Telemetry at that point: the main thread. There's no contention between different threads' WebSocket connections or anything similar.
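
The actual debugging patch is not shown in this thread. As a rough sketch of how such a dump can be done in Python, the standard library's `sys._current_frames()` and `traceback.format_stack()` are enough. The function name `format_all_thread_stacks` is hypothetical, not the code that was added to Telemetry.

```python
import sys
import threading
import traceback


def format_all_thread_stacks():
    """Return a string with the current stack of every live Python thread."""
    # Map thread identifiers to names so the dump is readable.
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    for ident, frame in sys._current_frames().items():
        lines.append("Thread %s (%s):" % (ident, names.get(ident, "unknown")))
        for entry in traceback.format_stack(frame):
            lines.append(entry.rstrip("\n"))
    return "\n".join(lines)


if __name__ == "__main__":
    # Call this from the exception handler at the point of failure.
    print(format_all_thread_stacks())
```

If only one thread appears in the output, as reported here, contention between threads can be ruled out.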

Aug 15, 2014
#13 kbr@chromium.org
https://codereview.chromium.org/474413002/ is being CQ'd.

The second request to the recently launched browser blocks until the first renderer finishes launching, which can take considerable time. The 5-second timeout is too low for slow bots.
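
The effect can be illustrated with a small self-contained sketch. It is not Telemetry code: a fake local server stands in for the browser and answers `/json` only after a delay, and the delay and timeouts are scaled down from the real 5 and 30 seconds so the demo runs quickly. `SlowBrowserHandler`, `RENDERER_LAUNCH_DELAY`, `list_inspectable_contexts`, `request_succeeds`, and `run_demo` are all hypothetical names. The sketch uses Python 3's `urllib.request`, whereas the traceback above shows `urllib2`.

```python
import json
import socket
import threading
import time
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical stand-in for the time the first renderer takes to launch.
RENDERER_LAUNCH_DELAY = 0.3


class SlowBrowserHandler(BaseHTTPRequestHandler):
    """Fake DevTools endpoint that responds only after a delay."""

    def do_GET(self):
        time.sleep(RENDERER_LAUNCH_DELAY)
        body = b"[]"
        try:
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        except OSError:
            pass  # The client already gave up and closed the socket.

    def log_message(self, format, *args):
        pass  # Keep the demo output quiet.


def list_inspectable_contexts(port, timeout):
    """Fetch and parse /json, bypassing any proxy configured in the environment."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    url = "http://127.0.0.1:%d/json" % port
    with opener.open(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def request_succeeds(port, timeout):
    """Return True if the request completes within the timeout."""
    try:
        list_inspectable_contexts(port, timeout)
        return True
    except (socket.timeout, urllib.error.URLError):
        return False


def run_demo():
    """Issue one impatient and one patient request against the slow server."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), SlowBrowserHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    try:
        impatient = request_succeeds(port, timeout=0.05)  # shorter than the delay
        patient = request_succeeds(port, timeout=5.0)     # longer than the delay
    finally:
        server.shutdown()
        server.server_close()
    return impatient, patient


if __name__ == "__main__":
    print(run_demo())
```

The request with the short timeout fails even though the server is healthy, which is the same shape as the failure on the slow bots. The request with the longer timeout succeeds.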

Aug 15, 2014
#14 kbr@chromium.org
(No comment was entered for this change.)
Status: Started
Aug 16, 2014
#15 bugdro...@chromium.org
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/69dcd0bd69fa360af12d318838f166d8b422aeb1

commit 69dcd0bd69fa360af12d318838f166d8b422aeb1
Author: kbr@chromium.org <kbr@chromium.org@0039d316-1c4b-4281-b951-d872f2087c98>
Date: Sat Aug 16 08:22:58 2014

Increase default DevTools request timeout to 30 seconds.

The timeout was recently changed from None (wait forever) to 5 seconds.
On slow bots, even after _WaitForBrowserToComeUp returns, the launch of
the first renderer can take considerable time. The next request for the
browser's state is subject to the timeout here, and the browser does not
respond until the renderer's launch completes.

Increasing the timeout to 30 seconds should not regress the original fix
and should fix the flakiness.

BUG=403981
TBR=tonyg@chromium.org

Review URL: https://codereview.chromium.org/474413002

Cr-Commit-Position: refs/heads/master@{#290133}
git-svn-id: svn://svn.chromium.org/chrome/trunk/src@290133 0039d316-1c4b-4281-b951-d872f2087c98


Aug 16, 2014
#16 bugdro...@chromium.org
------------------------------------------------------------------
r290133 | kbr@chromium.org | 2014-08-16T08:22:58.618070Z

Changed paths:
   M http://src.chromium.org/viewvc/chrome/trunk/src/tools/telemetry/telemetry/core/backends/chrome/chrome_browser_backend.py?r1=290133&r2=290132&pathrev=290133

Increase default DevTools request timeout to 30 seconds.

The timeout was recently changed from None (wait forever) to 5 seconds.
On slow bots, even after _WaitForBrowserToComeUp returns, the launch of
the first renderer can take considerable time. The next request for the
browser's state is subject to the timeout here, and the browser does not
respond until the renderer's launch completes.

Increasing the timeout to 30 seconds should not regress the original fix
and should fix the flakiness.

BUG=403981
TBR=tonyg@chromium.org

Review URL: https://codereview.chromium.org/474413002
-----------------------------------------------------------------
Aug 17, 2014
#17 kbr@chromium.org
Issue 404254 has been merged into this issue.
Cc: zmo@chromium.org
Aug 18, 2014
#18 kbr@chromium.org
This particular problem is fixed.

Status: Fixed