| Issue 202105: | "Timed out waiting to revert DNS" - while running few hwqual autotests | |
| 4 people starred this issue and may be notified of changes. | Back to list |
Sign in to add a comment
|
Google Chrome 15.0.848.0 (Official Build 96126) Platform 877.0 (Official Build) dev-channel test WebKit 535.1 (@92693) JavaScript V8 3.5.2 Flash 10.3.200.201 See issue 4923 (Partner Tracker), Steps: == Run control.auto Wait for desktopui_V8Bench to auto run in que. Result: "Timed out waiting for login"/"TimeoutError: Timed out waiting to revert DNS" @ desktopui_V8Bench Full logs: http://cros-hwqual-5.mtv.corp.google.com/bugs/logs/results.auto.tar.gz @ desktopui_SunSpiderBench Full logs:http://cros-hwqual-5.mtv.corp.google.com/bugs/logs/results.sunspider_final.tar.gz Status message: = START desktopui_V8Bench desktopui_V8Bench timestamp=1313006625 localtime=Aug 10 13:03:45 ERROR desktopui_V8Bench desktopui_V8Bench timestamp=1313006682 localtime=Aug 10 13:04:42 Timed out waiting for login END ERROR desktopui_V8Bench desktopui_V8Bench timestamp=1313006682 localtime=Aug 10 13:04:42 debug/*.Warning:- == 08/10 13:03:47 ERROR| warnings:0029| /usr/local/lib/python2.6/site-packages/dbus/connection.py:242: DeprecationWarning: object.__init__() takes no parameters 08/10 13:03:47 ERROR| warnings:0029| super(Connection, self).__init__(*args, **kwargs) 08/10 13:03:53 ERROR|cros_ui_te:0202| Cryptohome could not remove the test user. 08/10 13:04:41 ERROR| test:0430| Ignoring exception during cleanup() phase: 08/10 13:04:41 ERROR| traceback:0013| Traceback (most recent call last): 08/10 13:04:41 ERROR| traceback:0013| File "/usr/local/autotest/common_lib/test.py", line 428, in _exec 08/10 13:04:41 ERROR| traceback:0013| _cherry_pick_call(self.cleanup, *args, **dargs) 08/10 13:04:41 ERROR| traceback:0013| File "/usr/local/autotest/common_lib/test.py", line 535, in _cherry_pick_call 08/10 13:04:41 ERROR| traceback:0013| return func(*p_args, **p_dargs) 08/10 13:04:41 ERROR| traceback:0013| File "/usr/local/autotest/tests/desktopui_V8Bench/desktopui_V8Bench.py", line 30, in cleanup 08/10 13:04:41 ERROR| traceback:0013| super(desktopui_V8Bench, self).cleanup() 08/10 13:04:41 ERROR| traceback:0013| File "/usr/local/autotest/cros/cros_ui_test.py", line 412, in cleanup 08/10 13:04:41 ERROR| traceback:0013| self.stop_authserver() 08/10 13:04:41 ERROR| traceback:0013| File "/usr/local/autotest/cros/cros_ui_test.py", line 350, in stop_authserver 08/10 13:04:41 ERROR| traceback:0013| self.revert_dns() 08/10 13:04:41 ERROR| traceback:0013| File "/usr/local/autotest/cros/cros_ui_test.py", line 139, in revert_dns 08/10 13:04:41 ERROR| traceback:0013| timeout=10) 08/10 13:04:41 ERROR| traceback:0013| File "/usr/local/autotest/bin/site_utils.py", line 65, in poll_for_condition 08/10 13:04:41 ERROR| traceback:0013| raise exception 08/10 13:04:41 ERROR| traceback:0013| TimeoutError: Timed out waiting to revert DNS. 08/10 13:04:41 ERROR| test:0433| Now raising the earlier <class 'autotest_lib.client.cros.login.TimeoutError'> error Please re-assign the areas if necessary
Aug 10, 2011
#1
cmasone@chromium.org
Summary:
ARM: "Timed out waiting to revert DNS" - while running few hwqual autotests
Aug 10, 2011
Sending back to Bryan, who investigated the last issue that affected this test.
Status:
Assigned
Owner: bfreed@chromium.org Labels: -Area-Test Area-Platform Mstone-R15 Iteration-36 Team-Kernel
Aug 17, 2011
Seen again today in this build: http://build.chromium.org/p/chromiumos/builders/x86%20pineview%20full/builds/145/steps/Test/logs/stdio ========== ERROR FILE /tmp/cbuildbotYnHcnt/test_harness/SimpleTestVerify/1_verify/suite_Smoke/login_CryptohomeMounted/debug/login_CryptohomeMounted.ERROR FOR TEST suite_Smoke/login_CryptohomeMounted ============== 08/17 16:27:11 ERROR| warnings:0029| /usr/local/lib/python2.6/site-packages/dbus/connection.py:242: DeprecationWarning: object.__init__() takes no parameters 08/17 16:27:11 ERROR| warnings:0029| super(Connection, self).__init__(*args, **kwargs) 08/17 16:28:03 ERROR|site_utils:0065| Timed out waiting to revert DNS.
Labels:
TreeCloser
Aug 22, 2011
Bulk move to 37.
Labels:
Iteration-37
Aug 29, 2011
Just saw this error on x86-alex-binary so does not appear to be ARM specific. Different test from either of the above two this time http://chromeos-botmaster.mtv.corp.google.com:8026/builders/x86-alex-binary/builds/1106/steps/VMTest/logs/stdio ========== ERROR FILE /tmp/cbuildbotbdCC9X/test_harness/testUpdateWipeStateful/4_verify/suite_Smoke/login_LoginSuccess.default/debug/login_LoginSuccess.default.ERROR FOR TEST suite_Smoke/login_LoginSuccess.default ============== 08/29 07:39:17 ERROR| warnings:0029| /usr/local/lib/python2.6/site-packages/dbus/connection.py:242: DeprecationWarning: object.__init__() takes no parameters 08/29 07:39:17 ERROR| warnings:0029| super(Connection, self).__init__(*args, **kwargs) 08/29 07:39:20 ERROR|base_packa:0461| packages.checksum could not be fetched from any of the repos ['autoserv://'] 08/29 07:39:41 ERROR|site_utils:0065| Timed out waiting to revert DNS.
Summary:
"Timed out waiting to revert DNS" - while running few hwqual autotests
Labels: -Arch-ARM Arch-All
Aug 29, 2011
(No comment was entered for this change.)
Owner:
msb@chromium.org
Aug 29, 2011
The fix is a little more complicated than I thought. Re-assigning to a cros_ui_test.py guru.
08:09 <+msb__> kliegs: I think I see the bug. revert_dns calls __attempt_resolve
with expected=False meaning we want to verify that www.google.com
does not resolve to 127.0.01. However, the DNS lookup fails so we
return None instead of True. For the expected=False case we
shouldn't care whether or not the DNS got resolved. All the
matters is that it wasn't resolved to 127.0.0.1
08:09 <+kliegs> msb__: hmm - so this is an actual bug then?
08:09 <+msb__> relying on the actual DNS to work in the lab is a recipe for
flakiness
08:10 <+kliegs> msb__: Or is it a case where the test could be more robust?
08:10 <+msb__> kliegs: test could be more robust
08:10 <+kliegs> msb__: Ahh _ i gotcha. The test is checking the www.google.com
resolves to a real, non-127.0.0.1 address. But occasionally the
VM's fail to resolve external DNS's so it gets back a null result
08:11 <+kliegs> Which should pass the test (www.google.com doesn't resolve to
loopback), but the test doesn't handle that case properly?
08:11 <+msb__> exactly, all that matters is that its not 127.0.0.1
08:11 <+kliegs> ok. so the tree should still be reopened as flakiness, but the
test should also be fixed to allow null as a valid result
08:12 <+kliegs> should crosbug.com/19005 be reassigned then?
08:12 <+msb__> I think so. I'd probably modify __attempt_resolve to return True
for the expected=False case if host = None
08:14 <+msb__> sure, you can re-assign it to me and I'll send a CL to cmasone
08:16 <+kliegs> msb__: poll_for_condition is just a loop? where it keeps
checking the first test until the timeout?
08:16 <+kliegs> i'm not too familiar with autotest. Wanted to make sure it
wasn't a single DNS lookup that timed out.
09:02 <+msb__> kliegs: yeah, the poll is problematic
Owner:
cmasone@chromium.org
Aug 29, 2011
We time out each resolution after a second, and try over and over again for at least 30 seconds of walltime. Given this, your diagnosis doesn't wash. Ping would have to not be able to resolve addresses thirty straight times. That's not transient, if that's the failure mode you're seeing. The chances are that you're seeing the failure we've seen several times before, where we start a ping attempt and then the machine clock jumps many seconds forward in time before the ping returns. Nothing in any log files at all for many seconds. Nothing in /var/log/messages, chrome logs, session_manager logs, nothing anywhere. We suspect this is some issue of the VM not getting to run for some period of time, but don't really know why this would happen. But the ping command itself has been run with flags that should make it bail after a second, and instead it (and the whole VM) block for far longer than that. The reason that I try to revert DNS at all and don't just ignore the issue is that tests (that I didn't write) which do not fake out DNS seem to require real network connections for whatever reason, so failing to re-acquire the ability to hit the net causes issues. Also, developers and QA people using test images complain mightily if the tests don't return machines to a state in which they can be used for manual testing. I am happy to take out this workaround that I have put in place so that other people's non-hermetic tests can run, and developers can use test images at their desks. Just say the word :-)
Status:
Available
Owner: ---
Sep 14, 2011
http://chromeos-botmaster.mtv.corp.google.com:8026/builders/stumpy-binary/builds/708/steps/VMTest/logs/stdio Seen again: ========== ERROR FILE /tmp/cbuildbotnWSkp0/test_harness/testUpdateWipeStateful/2_verify/suite_Smoke/desktopui_WindowManagerFocusNewWindows/debug/desktopui_WindowManagerFocusNewWindows.ERROR FOR TEST suite_Smoke/desktopui_WindowManagerFocusNewWindows ============== 09/14 00:15:04 ERROR| warnings:0029| /usr/local/lib/python2.6/site-packages/dbus/connection.py:242: DeprecationWarning: object.__init__() takes no parameters 09/14 00:15:04 ERROR| warnings:0029| super(Connection, self).__init__(*args, **kwargs) 09/14 00:15:37 ERROR|site_utils:0065| Timed out waiting to revert DNS. =========== END ERROR FILE /tmp/cbuildbotnWSkp0/test_harness/testUpdateWipeStateful/2_verify/suite_Smoke/desktopui_WindowManagerFocusNewWindows/debug/desktopui_WindowManagerFocusNewWindows.ERROR FOR TEST suite_Smoke/desktopui_WindowManagerFocusNewWindows ===========
Sep 14, 2011
Recently this issue happens very frequently. http://chromeos-botmaster.mtv.corp.google.com:8026/builders/x86-alex_he%20canary/builds/175 http://chromeos-botmaster.mtv.corp.google.com:8026/builders/x86-mario%20canary/builds/1001 http://chromeos-botmaster.mtv.corp.google.com:8026/builders/x86-zgb%20canary/builds/789 http://chromeos-botmaster.mtv.corp.google.com:8026/builders/x86-zgb_he%20canary/builds/97
Sep 16, 2011
Another instance of this one: http://build.chromium.org/p/chromiumos/builders/x86%20generic%20PFQ/builds/887/steps/VMTest/logs/stdio
Sep 16, 2011
Another: http://chromeos-botmaster.mtv.corp.google.com:8026/builders/x86-mario%20canary/builds/1014/steps/VMTest/logs/stdio
Sep 19, 2011
http://chromeos-botmaster.mtv.corp.google.com:8026/builders/x86-alex-binary/builds/1412/steps/VMTest/logs/stdio ========== ERROR FILE /tmp/cbuildbottH8DvM/test_harness/testUpdateWipeStateful/2_verify/suite_Smoke/desktopui_ScreenLocker/debug/desktopui_ScreenLocker.ERROR FOR TEST suite_Smoke/desktopui_ScreenLocker ============== 09/19 01:26:31 ERROR| warnings:0029| /usr/local/lib/python2.6/site-packages/dbus/connection.py:242: DeprecationWarning: object.__init__() takes no parameters 09/19 01:26:31 ERROR| warnings:0029| super(Connection, self).__init__(*args, **kwargs) 09/19 01:27:10 ERROR|site_utils:0065| Timed out waiting to revert DNS.
Sep 19, 2011
http://chromegw.corp.google.com/i/chromiumos/builders/x86%20pineview%20full/builds/390/steps/VMTest/logs/stdio
Sep 19, 2011
Seen on stumpy-binary: http://chromeos-botmaster.mtv.corp.google.com:8026/builders/stumpy-binary/builds/759/steps/VMTest/logs/stdio Assigning to cmasone, who seems to have written the "Timed out waiting to revert DNS." code.
Status:
Assigned
Owner: cmasone@chromium.org
Sep 19, 2011
One more: http://build.chromium.org/p/chromiumos/builders/x86%20generic%20full/builds/398/steps/VMTest/logs/stdio
Sep 19, 2011
Another occurrence: http://chromeos-botmaster.mtv.corp.google.com:8026/builders/x86-zgb%20canary/builds/813/steps/VMTest/logs/stdio
Sep 20, 2011
Trying a new tactic this time.
Status:
Started
Sep 20, 2011
Once more, as if the data is needed at this point: http://chromeos-botmaster.mtv.corp.google.com:8026/builders/x86-alex_he%20canary/builds/204/steps/VMTest/logs/stdio
Sep 20, 2011
And: http://chromeos-botmaster.mtv.corp.google.com:8026/builders/stumpy%20canary/builds/348/steps/VMTest/logs/stdio
Sep 20, 2011
(No comment was entered for this change.)
Labels:
-Pri-1 -Area-Platform -Team-Kernel Pri-0 Area-Test Team-Systems
Sep 20, 2011
Again: http://chromegw/i/chromiumos/builders/x86%20generic%20full/builds/411/steps/VMTest/logs/stdio
Sep 21, 2011
Commit: 3a402eed830cf7f34e147b6a77b3af3e69f8a0aa Email: cmasone@chromium.org CHROMIUM: config: enable nat Add nat module to kernel config. This is needed so that tests can reliably force the system to resolve DNS using a local, mock, DNS server. BUG=chromium-os:19005 TEST=emerge-{x86,arm}-generic chromeos-kernel, suite_Smoke on x86 VM Change-Id: I1c5a1d1b2789e0787cc5134893047e500316b260 Signed-off-by: Chris Masone <cmasone@chromium.org> Reviewed-on: http://gerrit.chromium.org/gerrit/8013 Reviewed-by: Olof Johansson <olofj@chromium.org> Reviewed-by: Grant Grundler <grundler@chromium.org> M chromeos/config/config.common.chromeos
Sep 21, 2011
(No comment was entered for this change.)
Labels:
Iteration-39
Sep 21, 2011
Commit: 781676a171bc1165143ee3a4fe50383e44db7a23 Email: cmasone@chromium.org [autotest] Force all addresses to localhost in VM-tests We've been having flaky timeouts forever while trying to use flimflam to tell everyone to resolve DNS using a locally-running server during tests, and then reset to the normal DNS servers at the end of a test run. Try switching to using iptables to force the mapping at a low level. BUG=chromium-os:19005 TEST=suite_Smoke in a VM, control.bvt on a real device. Change-Id: Ib3f9260071929bb33067055cccd5d99c37ffd3aa Reviewed-on: http://gerrit.chromium.org/gerrit/8020 Tested-by: Chris Masone <cmasone@chromium.org> Reviewed-by: Will Drewry <wad@chromium.org> Commit-Ready: Chris Masone <cmasone@chromium.org> Reviewed-by: Brian Harring <ferringb@chromium.org> Tested-by: Will Drewry <wad@chromium.org> Commit-Ready: Will Drewry <wad@chromium.org> M client/cros/cros_ui_test.py
Sep 21, 2011
We have reason to believe this is addressed in ToT. Leaving open to see if the issue repros again
Status:
Fixed
Sep 21, 2011
cmasone says this TOT reproduced the issue: http://chromeos-botmaster.mtv.corp.google.com:8026/builders/TOT%20Pre-Flight%20Queue/builds/5669/
Sep 21, 2011
Sigh. Going back to the ping-based mechanism of resolving DNS. At least that lets us force multiple checks within the timeout window.
Status:
Started
Sep 21, 2011
Commit: ea4b4c1b4a048490ab78628ec78bd60959edc57a Email: cmasone@chromium.org [autotest] Go back to resolving DNS addresses with ping Going back to the ping-based mechanism of resolving DNS. At least that lets us force multiple checks within the timeout window, unlike the more-suave python code :-/ BUG=chromium-os:19005 TEST=suite_Smoke in VM, bvt on device Change-Id: Iabf516b800aad9bdab3a5ef46b184ab0319e2bed Reviewed-on: http://gerrit.chromium.org/gerrit/8102 Commit-Ready: Chris Masone <cmasone@chromium.org> Tested-by: Chris Masone <cmasone@chromium.org> Reviewed-by: Gaurav Shah <gauravsh@chromium.org> M client/cros/cros_ui_test.py
Sep 22, 2011
Plz 2 be fixed nao? Will watch for a couple days to see how it goes...
Status:
Fixed
Sep 24, 2011
UNoCanHazFixeded builders/x86-zgb_he%20canary/builds/142/steps/VMTest/logs/stdio ========== ERROR FILE /tmp/cbuildbotTF9ppd/test_harness/testUpdateWipeStateful/2_verify/suite_Smoke/desktopui_WindowManagerFocusNewWindows/debug/desktopui_WindowManagerFocusNewWindows.ERROR FOR TEST suite_Smoke/desktopui_WindowManagerFocusNewWindows ============== 09/24 13:06:54 ERROR| warnings:0029| /usr/local/lib/python2.6/site-packages/dbus/connection.py:242: DeprecationWarning: object.__init__() takes no parameters 09/24 13:06:54 ERROR| warnings:0029| super(Connection, self).__init__(*args, **kwargs) 09/24 13:07:43 ERROR|site_utils:0065| Timed out waiting to revert DNS.
Status:
Assigned
Sep 24, 2011
And non -he /builders/x86-zgb%20canary/builds/836/steps/VMTest/logs/stdio ========== ERROR FILE /tmp/cbuildbot1Q0A8C/test_harness/testUpdateWipeStateful/4_verify/suite_Smoke/security_ProfilePermissions.login/debug/security_ProfilePermissions.login.ERROR FOR TEST suite_Smoke/security_ProfilePermissions.login ============== 09/24 13:35:45 ERROR| warnings:0029| /usr/local/lib/python2.6/site-packages/dbus/connection.py:242: DeprecationWarning: object.__init__() takes no parameters 09/24 13:35:45 ERROR| warnings:0029| super(Connection, self).__init__(*args, **kwargs) 09/24 13:36:18 ERROR|site_utils:0065| Timed out waiting to revert DNS.
Sep 24, 2011
Thanks for rubbing it in
Sep 24, 2011
The failure from x86-zgb_he is that failure where, despite telling ping to stop after one second and utils.system to time ping out after two seconds, everything hangs for ten seconds. The other was an error with revoking iptables rules: http://gerrit.chromium.org/gerrit/#change,8260
Sep 24, 2011
Commit: 64a54c5c098374fc6e361f17999cb8bc53dc6ebe Email: cmasone@chromium.org [autotest] Ensure we delete iptables rules If resetting flimflam's list of interfaces to check for captive portals on throws, we won't delete the iptables rule we installed at the start of the test. Juggle things around and add a try/finally to make sure that we always delete the rules. BUG=chromium-os:19005 TEST=suite_Smoke Change-Id: I53602842ede78d3a189077985b2617db854450b8 Reviewed-on: http://gerrit.chromium.org/gerrit/8260 Tested-by: Chris Masone <cmasone@chromium.org> Reviewed-by: Peter Mayo <petermayo@chromium.org> Commit-Ready: Chris Masone <cmasone@chromium.org> M client/cros/cros_ui_test.py
Sep 25, 2011
Commit: 13ae40988bbdc6b25248daee2cf3ed9ffa0b6237 Email: cmasone@chromium.org [autotest] Ensure that we only attempt to remove iptables rules that we added BUG=chromium-os:19005 TEST=50x suite_Smoke Change-Id: I5510bc79fb2357d420e678363e7fdf6f29254823 Reviewed-on: http://gerrit.chromium.org/gerrit/8261 Commit-Ready: Chris Masone <cmasone@chromium.org> Reviewed-by: Chris Masone <cmasone@chromium.org> Tested-by: Chris Masone <cmasone@chromium.org> M client/cros/cros_ui_test.py
Sep 27, 2011
Another case http://chromegw.corp.google.com/i/chromiumos/builders/x86%20pineview%20full/builds/454/steps/VMTest/logs/stdio ========== ERROR FILE /tmp/cbuildbotE_uVWz/test_harness/SimpleTestVerify/1_verify/suite_Smoke/desktopui_UrlFetch.not-live/debug/desktopui_UrlFetch.not-live.ERROR FOR TEST suite_Smoke/desktopui_UrlFetch.not-live ============== 09/27 03:25:37 ERROR| warnings:0029| /usr/local/lib/python2.6/site-packages/dbus/connection.py:242: DeprecationWarning: object.__init__() takes no parameters 09/27 03:25:37 ERROR| warnings:0029| super(Connection, self).__init__(*args, **kwargs) 09/27 03:27:36 ERROR|site_utils:0065| Timed out waiting to revert DNS.
Sep 27, 2011
Another case: http://chromeos-botmaster.mtv.corp.google.com:8026/builders/lumpy%20canary/builds/93/steps/VMTest/logs/stdio ========== ERROR FILE /tmp/cbuildbotLl2bgb/test_harness/testUpdateKeepStateful/2_verify/suite_Smoke/login_CryptohomeMounted/debug/login_CryptohomeMounted.ERROR FOR TEST suite_Smoke/login_CryptohomeMounted ============== 09/27 13:05:35 ERROR| warnings:0029| /usr/local/lib/python2.6/site-packages/dbus/connection.py:242: DeprecationWarning: object.__init__() takes no parameters 09/27 13:05:35 ERROR| warnings:0029| super(Connection, self).__init__(*args, **kwargs) 09/27 13:06:37 ERROR|site_utils:0065| Timed out waiting to revert DNS. ... 09/27 13:07:06 ERROR| traceback:0013| TimeoutError: Timed out waiting to revert DNS.
Sep 27, 2011
Commit: b90ad2f566745b94c08f6b16d28e28d096c3ccca Email: cmasone@chromium.org [autotest] strace ping while trying to reset DNS BUG=chromium-os:19005 TEST=suite_Smoke Change-Id: Id3e2910b9bc98a89601b142d1006e743360b625d Reviewed-on: http://gerrit.chromium.org/gerrit/8372 Commit-Ready: Chris Masone <cmasone@chromium.org> Tested-by: Chris Masone <cmasone@chromium.org> Reviewed-by: Paul Stewart <pstew@chromium.org> Tested-by: Paul Stewart <pstew@chromium.org> Commit-Ready: Paul Stewart <pstew@chromium.org> M client/cros/cros_ui_test.py
Sep 28, 2011
another likely instance of this issue from today: http://chromeos-botmaster.mtv.corp.google.com:8026/builders/lumpy-binary/builds/247/steps/VMTest/logs/stdio ========== ERROR FILE /tmp/cbuildbot6nLN_1/test_harness/testUpdateKeepStateful/2_verify/suite_Smoke/login_BadAuthentication/debug/login_BadAuthentication.ERROR FOR TEST suite_Smoke/login_BadAuthentication ============== 09/28 09:49:11 ERROR| warnings:0029| /usr/local/lib/python2.6/site-packages/dbus/connection.py:242: DeprecationWarning: object.__init__() takes no parameters 09/28 09:49:11 ERROR| warnings:0029| super(Connection, self).__init__(*args, **kwargs) 09/28 09:49:35 ERROR|site_utils:0065| Timed out waiting to revert DNS.
Sep 28, 2011
I'm an idiot and forgot to make strace timestamp the ping output, so this log doesn't have much useful data in it. Dammit. Patch incoming...
Sep 28, 2011
another instance (but still doesn't have much useful log data): http://chromeos-botmaster.mtv.corp.google.com:8026/builders/stumpy-binary/builds/844/steps/VMTest/logs/stdio
Sep 28, 2011
Commit: f66e5063e9104dcc955e77502b9cf41831961e76 Email: cmasone@chromium.org [autotest] add timestamps to ping strace Also, make sure that we don't clobber strace output, and instead keep the output for each ping command we run. BUG=chromium-os:19005 TEST=suite_Smoke Change-Id: I241c69ff2edfb2234992d8b0251a16327de35237 Reviewed-on: http://gerrit.chromium.org/gerrit/8426 Commit-Ready: Chris Masone <cmasone@chromium.org> Reviewed-by: Chris Masone <cmasone@chromium.org> Tested-by: Chris Masone <cmasone@chromium.org> Reviewed-by: Paul Stewart <pstew@chromium.org> M client/cros/cros_ui_test.py
Sep 28, 2011
another instance: http://chromeos-botmaster.mtv.corp.google.com:8026/builders/stumpy-binary/builds/850/steps/VMTest/logs/stdio 22:40:11 INFO | START desktopui_WindowManagerFocusNewWindows desktopui_WindowManagerFocusNewWindows timestamp=1317274811 localtime=Sep 28 22:40:11 22:40:11 INFO | Bundling /build/stumpy/usr/local/autotest/client/site_tests/desktopui_WindowManagerFocusNewWindows into test-desktopui_WindowManagerFocusNewWindows.tar.bz2 22:40:44 INFO | ERROR desktopui_WindowManagerFocusNewWindows desktopui_WindowManagerFocusNewWindows timestamp=1317274845 localtime=Sep 28 22:40:45 Timed out waiting to revert DNS. 22:40:44 INFO | END ERROR desktopui_WindowManagerFocusNewWindows desktopui_WindowManagerFocusNewWindows timestamp=1317274845 localtime=Sep 28 22:40:45 ping strace attached.
Sep 28, 2011
22:40:33 gettimeofday({1317274833, 351092}, NULL) = 0 <0.000005>
22:40:33 sendmsg(19, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("74.125.224.144")}, msg_iov(1)=[{"\10\0+>\23\17\0\1\321\4\204Nt[\5\0\10\t\n\v\f\r\16\17\20\21\22\23\24\25\26\27"..., 64}], msg_controllen=0, msg_flags=0}, 0) = 64 <0.000042>
22:40:33 recvmsg(19, 0x7f9f4928, 0) = -1 EAGAIN (Resource temporarily unavailable) <9.972700>
22:40:43 --- SIGALRM (Alarm clock) @ 0 (0) ---
22:40:43 sigreturn() = ? (mask now []) <0.000019>
22:40:43 write(1, "PING www.l.google.com (74.125.22"..., 62) = 62 <0.000015>
22:40:43 write(1, "--- www.l.google.com ping statis"..., 104) = 104 <0.000618>
22:40:43 exit_group(1) = ?
That's what we get...we get back a recvmsg with EAGAIN...and then nothing happens til someone SIGALRMs us 10s later o_O
Cc:
pstew@chromium.org sleffler@chromium.org
Sep 29, 2011
Doesn't the "<9.972700>" at the end of the line mean that it took 9.9 seconds for the EAGAIN to be returned? As such, doesn't that mean that the SIGALARM 10 second timeout was what lead to the EAGAIN? This particular case is one of a well-formed ping failure. SIGALARM is likely the normal (self induced) timeout mechanism within ping.
Sep 29, 2011
Doesn't the "<9.972700>" at the end of the line mean that it took 9.9 seconds for the EAGAIN to be returned? As such, doesn't that mean that the SIGALARM 10 second timeout was what lead to the EAGAIN? This particular case is one of a well-formed ping failure. SIGALARM is likely the normal (self induced) timeout mechanism within ping.
Sep 29, 2011
You're right that it look 9.9s to wait for EAGAIN, but I pass -w 1 to ping, which is supposed to be "a timeout, in seconds, before ping exits regardless of how many packets have been sent or received."
It sounds like either I'm not understanding the semantics, or it's a bug in ping. Also, I told autotest to sigkill ping after 2s, and that's not working either. I could also pass -W 1, the "Time to wait for a response, in seconds. The option affects only timeout in absense of
any responses,"
Sep 29, 2011
I've taken this apart a bit more carefully, and done a couple tests myself. I'm confusing return codes. If indeed you were interrupted by the SIGALARM (due to your -w option), you should have received an EINTR (interrupted system call) as the result code for the recvmsg. Instead, what you got was an EAGAIN, which is as a result of the socket hitting the receive timeout (which is set to the ping interval, again, 1 second). This means that neither the socket timeout nor the setitimer were honored in any where near the contracted time, and at some point both were delivered simultaneously. This definitely points at the VM (or at least this process) being completely suspended for 10 seconds, then both unwedging at once. You may be able to tell from the outside that this happened by how long this process took. If you trust that ping would have done the right thing had it the chance, and it still took longer (and failed), you know you ran into a non-ping-related failure. The only caution I'd give you is that since you're using a symbolic name (www.google.com) and ping uses gethostbyname directly with no timeout (the DNS library has a timeout -- just that you can't configure it) DNS unreachability failures might look like the same problem unless you trawl stderr.
Sep 29, 2011
So, what you're saying is that this could either be a DNS resolution issue, or the VM getting wedged? That's kinda what we already thought :-)
Maybe we're not getting stderr in the DEBUG log? Here's what I have:
09/28 22:40:33 DEBUG|base_utils:0074| Running 'strace -t -T -o /usr/local/autotest/results/default/desktopui_WindowManagerFocusNewWindows/results/ping_trace-1317274833.3 ping -c 1 -w 1 -q www.google.com.'
09/28 22:40:43 WARNI|base_utils:0812| run process timeout (2) fired on: strace -t -T -o /usr/local/autotest/results/default/desktopui_WindowManagerFocusNewWindows/results/ping_trace-1317274833.3 ping -c 1 -w 1 -q www.google.com.
09/28 22:40:43 WARNI|cros_ui_te:0099| Command <strace -t -T -o /usr/local/autotest/results/default/desktopui_WindowManagerFocusNewWindows/results/ping_trace-1317274833.3 ping -c 1 -w 1 -q www.google.com.> failed, rc=1, Command(s) did not complete within 2 seconds
* Command:
strace -t -T -o /usr/local/autotest/results/default/desktopui_WindowManag
erFocusNewWindows/results/ping_trace-1317274833.3 ping -c 1 -w 1 -q
www.google.com.
Exit status: 1
Duration: 10.044095993
stdout:
PING www.l.google.com (74.125.224.144) 56(84) bytes of data.
--- www.l.google.com ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
Usually anything that went to stderr gets dumped out automatically when one uses utils.system. I'm using system_output, so maybe it's getting lost? I can force it to print to the logs as well...
Sep 29, 2011
I'm saying that failures that involve the ping process lasting significantly longer than it should will fall in one of two buckets: (1) where you never see ping statistics because DNS failed, only a message to stderr claiming "unknown host" (says so whether it gets a negative response from the DNS server or if it times out with no response) (2) where you do see ping statistics with 100% loss -- here, you know that it was setting receive timers yet had them run far afield -- suspect VM interference, and perhaps consider this ping run invalid in telling you anything about your connectivity.
Sep 29, 2011
I'm not using it to tell me anything about connectivity; I just want to force a timeoutable DNS resolution attempt. Apparently this is an impossible thing :-)
Sep 29, 2011
-w 1 sets the timer from 10s to 1s. We can see it being armed in the strace:
22:40:33 setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={1, 0}}, NULL) = 0 <0.000007>
but it doesn't fire at 40:34 or similar so this must be a vm issue
Sep 29, 2011
Commit: a3c158e9fac14c943a6221f3a0054288004ee1b2 Email: cmasone@chromium.org [autotest] Log more ping output to understand 19005 BUG=chromium-os:19005 TEST=suite_Smoke Change-Id: I7bc41ff6520dca5208fd1d05944d1a95f22c06f9 Reviewed-on: http://gerrit.chromium.org/gerrit/8488 Commit-Ready: Chris Masone <cmasone@chromium.org> Reviewed-by: Chris Masone <cmasone@chromium.org> Tested-by: Chris Masone <cmasone@chromium.org> M client/cros/cros_ui_test.py
Sep 29, 2011
Another instance: http://build.chromium.org/p/chromiumos/builders/x86%20generic%20full/builds/484 I hope the tgz attached has your retained output, I don't see anything new in the stdio.
Sep 29, 2011
This failure (described in comment 58) is the result of a race between the completion of the ping and the and the python timeout. From the strace logs, it looks like everything went well, apart from the fact that you shouldn't always actually expect to be able to ping www.google.com. The first ICMP receive timed out, and raced with the SIGALARM for the next ping attempt. The ping exited cleanly, and printed everything out on stdout. The only problem was that due to the DNS at the beginning and the SIGALARM race at the end, it ended up taking ever so slightly more than 2 seconds, so the python code threw away the output. Can I suggest you completely change strategy. Do something like: utils.system_output("python -c 'import socket; print socket.gethostbyname("www.google.com")'") [need to work on quote escaping] something that just does the DNS you want, none of the ping you don't? I know that's calling python from within python (I know you need a subprocess so you end up with a fresh DNS cache), but it just seems more onerous to walk the minefield of ping's error recovery. Granted, there's a penalty. "Time" on my alex says: ping: .064s/.001s/.006s vs python: .102s/.069s/.016s (real/user/sys) but that might be a reasonable cost for your sanity. I can write you a DNS lookup utility as well, if you want one in C. I just think that continuing to dig down the path of making ping behave will need to tears, especially in the light of what you're actually trying to accomplish here. (Also especially since none of the traces I've seen have actually errored out on the DNS part of this process).
Sep 30, 2011
Commit: 89b5782ae1db57d81f7ecc2ec3e69c52f9de5058 Email: pstew@chromium.org [autotest] Stop the ping madness Use a python subprocess instead of running ping. Ping does too many other things other than look up things in DNS and can get hung up in ways that aren't of any interest to us. BUG=chromium-os:19005 TEST=cros_vm_test with suite_Smoke Change-Id: I62fcf18d59a79f2bb065b64975da543bbf11c507 Reviewed-on: http://gerrit.chromium.org/gerrit/8541 Commit-Ready: Paul Stewart <pstew@chromium.org> Reviewed-by: Paul Stewart <pstew@chromium.org> Tested-by: Paul Stewart <pstew@chromium.org> M client/cros/cros_ui_test.py
Oct 3, 2011
(No comment was entered for this change.)
Labels:
Iteration-40
Oct 5, 2011
I'm hopeful once again! it's been 5 days :-) Let's wait til early next week and if it still hasn't repro'd...mark it verified?
Status:
Fixed
Oct 6, 2011
(No comment was entered for this change.)
Status:
Verified
Oct 19, 2011
I see this issue again.
Reproducer:
./run_remote_tests.sh --build --remote=localhost --ssh_port 9222 --board=${BOARD} --use_emerged desktopui_V8Bench
The error message I see:
INFO : Test results:
-----------------------------------------------
desktopui_V8Bench/desktopui_V8Bench [ FAILED ]
desktopui_V8Bench/desktopui_V8Bench ERROR: Timed out waiting to revert DNS.
desktopui_V8Bench/desktopui_V8Bench 10/19 14:51:51 ERROR|site_utils:0078| Timed out waiting to revert DNS.
desktopui_V8Bench/desktopui_V8Bench score_crypto 14071
desktopui_V8Bench/desktopui_V8Bench score_deltablue 14070
desktopui_V8Bench/desktopui_V8Bench score_earleyboyer 19364
desktopui_V8Bench/desktopui_V8Bench score_raytrace 10856
desktopui_V8Bench/desktopui_V8Bench score_regexp 1845
desktopui_V8Bench/desktopui_V8Bench score_richards 9302
desktopui_V8Bench/desktopui_V8Bench score_splay 2575
desktopui_V8Bench/desktopui_V8Bench score_total 7851
Status:
Available
Oct 19, 2011
that's not useful. Please attach all your logs, give version information, platform information etc.
Status:
Started
Oct 19, 2011
$ echo $BOARD x86-generic Chrome OS image version (built on my machine): R16-1198.0.2011_10_19_1257-a1 I run Chrome OS inside kvm, using the following command: ./bin/cros_start_vm --no_graphics --image_path=../../chroot//home/krasin/trunk/src/build/images/x86-generic/latest/chromiumos_qemu_image.bin The image was produced with: ./build_packages --board=${BOARD} ./build_image --board=${BOARD} ./image_to_vm.sh --board=${BOARD} --test_image Please, let me know if I'm missing something.
Oct 19, 2011
It seems like we may fail to clean up in some error case? work tracked in crosbug.com/21871 Re-closing.
Status:
Verified
Oct 24, 2011
Repro'ed: Full logs: http://cros-hwqual-5.mtv.corp.google.com/bugs/logs/results.desktopV8.tar.gz 18:09:16 INFO | ERROR desktopui_V8Bench desktopui_V8Bench timestamp=1319245755 localtime=Oct 21 18:09:15 Timed out waiting to revert DNS. 18:09:16 INFO | END ERROR desktopui_V8Bench desktopui_V8Bench timestamp=1319245755 localtime=Oct 21 18:09:15
Status:
Assigned
Oct 24, 2011
Please go look at crosbug.com/21871 Given how much change there has been in the implementation here, re-opening this issue is counterproductive. None of this context is useful to debugging future problems.
Status:
Verified
Oct 28, 2011
Claiming work to iteration-41
Labels:
Iteration-41
Nov 8, 2011
(No comment was entered for this change.)
Labels:
FixedIn-1285.0.0
Jan 9, 2012
(No comment was entered for this change.)
Labels:
FixedIn-1284.7.0
Jan 20, 2012
(No comment was entered for this change.)
Labels:
FixedInIndex-2b FixedInIndex-504_1
Jun 5, 2012
(No comment was entered for this change.)
Labels:
-FixedIn-1285.0.0 -FixedInIndex-2b FixedIn-1948.0.0 FixedInIndex-41
Sep 8, 2012
Commit: 86e78a507a540d9d0ecee7a4379c471223e3c904 Email: cmasone@chromium.org CHROMIUM: config: enable nat Add nat module to kernel config. This is needed so that tests can reliably force the system to resolve DNS using a local, mock, DNS server. BUG=chromium-os:19005 TEST=emerge-{x86,arm}-generic chromeos-kernel, suite_Smoke on x86 VM Change-Id: I1c5a1d1b2789e0787cc5134893047e500316b260 Signed-off-by: Chris Masone <cmasone@chromium.org> Reviewed-on: http://gerrit.chromium.org/gerrit/8013 Reviewed-by: Olof Johansson <olofj@chromium.org> Reviewed-by: Grant Grundler <grundler@chromium.org> Reviewed-on: http://gerrit.chromium.org/gerrit/8798 Tested-by: Olof Johansson <olofj@chromium.org> M chromeos/config/config.common.chromeos
Mar 6, 2013
(No comment was entered for this change.)
Labels:
OS-Chrome
Mar 9, 2013
(No comment was entered for this change.)
Labels:
-Area-Test -Mstone-R15 -Team-Systems -TreeCloser Cr-Test Hotlist-TreeCloser M-15 Cr-OS-Systems
|
||||||||||||
| ► Sign in to add a comment | |||||||||||||