
chromium-os - issue #5943
ath9k chip becomes slow and kernel outputs 0x806c / 0xdeadbeef messages
Chrome OS Version: 0.8.64.0
Type of computer: Atheros-based netbook
The relevant messages are repeated pairs of:
ath: timeout (100000 us) on reg 0x806c: 0xdeadbeef & 0x01f00000 != 0x00000000
ath: RX failed to go idle in 10 ms RXSM=0xdeadbeef
Older versions of ChromeOS (as far back as June, see Issue 4177) show the same messages.
These messages are visible as recently as the latest stable compat-wireless (compat-wireless-2.6.36-rc1-1.tar.bz2)
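The timeout message quoted above can be read as a polling condition that never became true: the value read from the register, ANDed with the mask, did not equal the expected value before the timeout expired. The following is a minimal Python sketch for decoding such a line; the function name and the returned field names are hypothetical, chosen only for illustration.

```python
import re

# Matches the condition part of an ath9k "timeout ... on reg" log message.
TIMEOUT_RE = re.compile(
    r"timeout \((?P<us>\d+) us\) on reg (?P<reg>0x[0-9a-fA-F]+): "
    r"(?P<read>0x[0-9a-fA-F]+) & (?P<mask>0x[0-9a-fA-F]+) "
    r"!= (?P<want>0x[0-9a-fA-F]+)"
)

def decode_timeout(line):
    """Return the fields of a timeout message plus the masked value, or None."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    read = int(m.group("read"), 16)
    mask = int(m.group("mask"), 16)
    want = int(m.group("want"), 16)
    return {
        "reg": int(m.group("reg"), 16),
        "timeout_us": int(m.group("us")),
        "read": read,
        "masked": read & mask,                 # what the poll actually compared
        "want": want,
        "satisfied": (read & mask) == want,    # False means the poll timed out
    }

line = ("ath: timeout (100000 us) on reg 0x806c: "
        "0xdeadbeef & 0x01f00000 != 0x00000000")
info = decode_timeout(line)
print(hex(info["masked"]), info["satisfied"])  # prints: 0xa00000 False
```

With a raw read of 0xdeadbeef the masked value is 0x00a00000, which can never match the expected 0x00000000, so the wait runs for the full 100000 us.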
Comment #1
Posted on Aug 20, 2010 by Happy Horse
These errors do not appear related to suspend/resume. They seem more correlated with low-signal conditions.
Comment #2
Posted on Aug 20, 2010 by Happy Horse
(No comment was entered for this change.)
Comment #3
Posted on Aug 21, 2010 by Quick Hippo
I don't think low signal is related; I've seen this with a strong signal (iw dev wlan0 link reported -51).
Comment #4
Posted on Aug 30, 2010 by Quick Hippo
Seeing this on a new dogfood machine running TOT (compat wireless 2.6.36-rc2):
2010-08-30T16:12:36.179882-07:00 localhost kernel: [ 205.704077] wlan0: detected beacon loss from AP - sending probe request
2010-08-30T16:12:42.176006-07:00 localhost kernel: [ 211.701119] wlan0: detected beacon loss from AP - sending probe request
2010-08-30T16:12:47.179950-07:00 localhost kernel: [ 216.704086] wlan0: detected beacon loss from AP - sending probe request
2010-08-30T16:12:50.562892-07:00 localhost kernel: [ 220.087255] ath: timeout (100000 us) on reg 0x806c: 0xdeadbeef & 0x01f00000 != 0x00000000
2010-08-30T16:12:50.562931-07:00 localhost kernel: [ 220.087400] ath: RX failed to go idle in 10 ms RXSM=0xdeadbeef
2010-08-30T16:12:53.327002-07:00 localhost kernel: [ 222.851182] ath: timeout (100000 us) on reg 0x806c: 0xdeadbeef & 0x01f00000 != 0x00000000
2010-08-30T16:12:53.327047-07:00 localhost kernel: [ 222.851301] ath: RX failed to go idle in 10 ms RXSM=0xdeadbeef
2010-08-30T16:13:05.176077-07:00 localhost kernel: [ 234.701087] wlan0: detected beacon loss from AP - sending probe request
2010-08-30T16:13:06.557030-07:00 localhost kernel: [ 236.081743] phy0: release an RX reorder frame due to timeout on earlier frames
2010-08-30T16:13:06.557161-07:00 localhost kernel: [ 236.081761] phy0: release an RX reorder frame due to timeout on earlier frames
2010-08-30T16:13:06.557195-07:00 localhost kernel: [ 236.081773] phy0: release an RX reorder frame due to timeout on earlier frames
2010-08-30T16:13:08.093888-07:00 localhost kernel: [ 237.618678] ath: timeout (100000 us) on reg 0x806c: 0xdeadbeef & 0x01f00000 != 0x00000000
2010-08-30T16:13:08.093930-07:00 localhost kernel: [ 237.618810] ath: RX failed to go idle in 10 ms RXSM=0xdeadbeef
2010-08-30T16:13:10.310990-07:00 localhost kernel: [ 239.835886] ath: timeout (100000 us) on reg 0x806c: 0xdeadbeef & 0x01f00000 != 0x00000000
2010-08-30T16:13:10.311031-07:00 localhost kernel: [ 239.836003] ath: RX failed to go idle in 10 ms RXSM=0xdeadbeef
2010-08-30T16:13:14.176938-07:00 localhost kernel: [ 243.701196] wlan0: detected beacon loss from AP - sending probe request
2010-08-30T16:13:17.179942-07:00 localhost kernel: [ 246.704117] wlan0: detected beacon loss from AP - sending probe request
2010-08-30T16:13:24.176914-07:00 localhost kernel: [ 253.701173] wlan0: detected beacon loss from AP - sending probe request
Comment #5
Posted on Aug 30, 2010 by Quick Hippo
(No comment was entered for this change.)
Comment #6
Posted on Sep 1, 2010 by Happy Elephant
(No comment was entered for this change.)
Comment #7
Posted on Sep 1, 2010 by Happy Horse
(No comment was entered for this change.)
Comment #8
Posted on Sep 1, 2010 by Happy Horse
(No comment was entered for this change.)
Comment #9
Posted on Sep 2, 2010 by Grumpy Rabbit
Please try this patch. We have been unable to reproduce this easily, but it seems you can. Please let us know if it fixes the issue observed in this bug report so we can submit it upstream as a stable fix.
Comment #10
Posted on Sep 2, 2010 by Quick Giraffe
I am not able to reproduce this issue. Can you please try this patch while we try to reproduce it in parallel? Do you have any recommendations for reproducing it? We tried a data traffic test and also suspend/resume, but in vain.
Comment #11
Posted on Sep 3, 2010 by Grumpy Rabbit
OK -- anyone who was able to reproduce this, please test this new patch instead.
- fix-ps.patch 1.12KB
Comment #12
Posted on Sep 3, 2010 by Happy Panda
(No comment was entered for this change.)
Comment #13
Posted on Sep 4, 2010 by Grumpy Rabbit
OK, here is a new take on this after some more review. It just enhances the patch further. I think this is ready for submission upstream. Please test and let us know; if we get confirmation that this fixes things, we'll send it upstream.
Comment #14
Posted on Sep 4, 2010 by Massive Monkey
Before applying any of these patches, I saw quite frequent deadbeef messages:
71289542-2010-09-03T15:55:18.102033-07:00 localhost kernel: [18143.098593] ath: AR_IMR 0x918414b0 IER 0x1
71289639-2010-09-03T15:55:18.102076-07:00 localhost kernel: [18143.098602] ath: AWAKE -> NETWORK SLEEP
71289733-2010-09-03T15:55:18.102114-07:00 localhost kernel: [18143.098777] ath: disable IER
71289816-2010-09-03T15:55:18.102176-07:00 localhost kernel: [18143.098810] ath: new IMR 0x918414b0
71289906-2010-09-03T15:55:18.102215-07:00 localhost kernel: [18143.098821] ath: enable IER
71289988:2010-09-03T15:55:18.102256-07:00 localhost kernel: [18143.098833] ath: AR_IMR 0xdeadbeef IER 0xdeadbeef
71290092:2010-09-03T15:55:18.247078-07:00 localhost kernel: [18143.246977] ath: timeout (100000 us) on reg 0x806c: 0xdeadbeef & 0x01f00000 != 0x00000000
71290236:2010-09-03T15:55:18.247169-07:00 localhost kernel: [18143.247038] ath: RX failed to go idle in 10 ms RXSM=0xdeadbeef
71290353-2010-09-03T15:55:20.998242-07:00 localhost kernel: [18145.997159] ath: NETWORK SLEEP -> AWAKE
71290447-2010-09-03T15:55:20.998295-07:00 localhost kernel: [18145.997241] ath: 0xf4041071 => 0xf4041071
71290543-2010-09-03T15:55:20.998321-07:00 localhost kernel: [18145.997252] ath: disable IER
71290626-2010-09-03T15:55:20.998345-07:00 localhost kernel: [18145.997264] ath: new IMR 0x918414b0
71290716-2010-09-03T15:55:20.998371-07:00 localhost kernel: [18145.997274] ath: enable IER
These would appear during normal use, but I could not correlate them to any particular use case.
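In the trace above, every 0xdeadbeef line falls between the "AWAKE -> NETWORK SLEEP" and "NETWORK SLEEP -> AWAKE" transitions. A small Python sketch for checking that pattern in a longer log follows; it assumes the transition lines have the "ath: OLD -> NEW" shape shown in the trace, and the function name is hypothetical.

```python
import re

# Power-state transition lines look like "ath: AWAKE -> NETWORK SLEEP".
TRANSITION_RE = re.compile(
    r"ath: (?P<old>[A-Z][A-Z -]*?) -> (?P<new>[A-Z][A-Z -]*?)\s*$"
)

def deadbeef_while_asleep(lines):
    """Return the lines containing 0xdeadbeef that were logged after the
    most recent transition put the chip into NETWORK SLEEP."""
    state = None  # unknown until the first transition line is seen
    hits = []
    for line in lines:
        m = TRANSITION_RE.search(line)
        if m:
            state = m.group("new")
            continue
        if "0xdeadbeef" in line and state == "NETWORK SLEEP":
            hits.append(line)
    return hits

# Kernel-timestamped messages taken from the trace above.
sample = [
    "[18143.098602] ath: AWAKE -> NETWORK SLEEP",
    "[18143.098777] ath: disable IER",
    "[18143.098833] ath: AR_IMR 0xdeadbeef IER 0xdeadbeef",
    "[18143.246977] ath: timeout (100000 us) on reg 0x806c: "
    "0xdeadbeef & 0x01f00000 != 0x00000000",
    "[18143.247038] ath: RX failed to go idle in 10 ms RXSM=0xdeadbeef",
    "[18145.997159] ath: NETWORK SLEEP -> AWAKE",
    "[18145.997241] ath: 0xf4041071 => 0xf4041071",
]
print(len(deadbeef_while_asleep(sample)))  # prints: 3
```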
With either "wake-rxabort.patch" or "fix-ps.patch" applied, I could no longer reproduce these messages.
"ps_enable_after_rxabort.patch" does not seem to apply to the same base version that I am using.
Comment #15
Posted on Sep 4, 2010 by Grumpy Rabbit
Thanks for the feedback. Please give the patch attached to this comment a shot; this should be the final one. Yesterday I attached an older patch; this is the right one.
To be clear, you can ignore all previously posted patches. The complete list of patches you can apply on top of compat-wireless-2.6.36-rcx is at:
http://www.kernel.org/pub/linux/kernel/people/mcgrof/patches/ath9k/2010/09/PS-fixes-09-04/
These should fix this issue, issue 5709, and issue 5715.
Comment #16
Posted on Sep 7, 2010 by Happy Giraffe
Commit: c7f084083f35106a80ce3767b7020318f60e4bb1
Email: pstew@chromium.org
CHROMEOS: ath9k: fix power save race conditions
ath9k has a race on putting the chip into network sleep and having registers read from hardware. The race occurs because although ath9k_ps_restore() locks its own callers it makes use of some variables which get altered in the driver at different code paths. The variables are the ps_enabled and ps_flags.
This is easily reproducible in large network environments when roaming with the wpa_supplicant simple bgscan. You'd get some 0xdeadbeef read out on certain registers such as:
ath: timeout (100000 us) on reg 0x806c: 0xdeadbeef & 0x01f00000 != 0x00000000
ath: RX failed to go idle in 10 ms RXSM=0xdeadbeef
ath: timeout (100000 us) on reg 0x7000: 0xdeadbeef & 0x00000003 != 0x00000000
ath: Chip reset failed
The fix is to protect the ath9k_config(hw, IEEE80211_CONF_CHANGE_PS) calls with a spin_lock_irqsave() which will disable contenders for these variables from interrupt context, timers, re-entry from mac80211 on the same callback, and most importantly from ath9k_ps_restore() which is the only call which will put the device into network sleep.
There are quite a few threads and bug reports on these; a few of them are:
https://bugs.launchpad.net/ubuntu/karmic/+source/linux/+bug/407040
http://code.google.com/p/chromium-os/issues/detail?id=5709
http://code.google.com/p/chromium-os/issues/detail?id=5943
Cc: stable@kernel.org [2.6.32+]
Signed-off-by: Luis R. Rodriguez
[To be replaced by cherry-pick]
BUG=chromium-os:5943, chromium-os:5709
TEST=Manual
Review URL: http://codereview.chromium.org/3367013
M chromeos/compat-wireless/drivers/net/wireless/ath/ath9k/main.c
M chromeos/compat-wireless/drivers/net/wireless/ath/ath9k/recv.c
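The locking idea in the commit message above can be illustrated with a toy model. This is a Python sketch, not the driver code: the class and method names are hypothetical, a threading.Lock stands in for the spinlock (it does not model disabling interrupts the way spin_lock_irqsave() does), and register reads simply return 0xdeadbeef while the modelled chip is asleep. The point it shows is that the power-save setting is only changed under the same lock that the restore path takes before deciding to sleep.

```python
import threading

DEADBEEF = 0xDEADBEEF

class ToyChip:
    """Toy model: register reads return 0xdeadbeef while the chip sleeps."""

    def __init__(self):
        self.awake = True
        self.ps_enabled = False        # loosely analogous to ps_enabled
        self.ps_usecount = 0           # callers that need the chip awake
        self.lock = threading.Lock()   # stands in for the spinlock

    def read_reg(self):
        return 0x0 if self.awake else DEADBEEF

    def ps_wakeup(self):
        with self.lock:
            self.ps_usecount += 1
            self.awake = True

    def ps_restore(self):
        with self.lock:
            self.ps_usecount -= 1
            # Sleep only when nobody needs the chip and power save is on.
            if self.ps_usecount == 0 and self.ps_enabled:
                self.awake = False

    def config_ps(self, enable):
        # Change the setting under the same lock ps_restore() takes, so a
        # restore never acts on a half-updated power-save state.
        with self.lock:
            self.ps_enabled = enable
            if not enable:
                self.awake = True

def worker(chip, n, bad):
    for _ in range(n):
        chip.ps_wakeup()
        if chip.read_reg() == DEADBEEF:   # read while holding a wakeup ref
            bad.append(1)
        chip.ps_restore()

def toggler(chip, n):
    for i in range(n):
        chip.config_ps(i % 2 == 0)

chip = ToyChip()
bad = []
threads = [threading.Thread(target=worker, args=(chip, 1000, bad))
           for _ in range(4)]
threads.append(threading.Thread(target=toggler, args=(chip, 1000)))
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(bad))  # prints: 0
```

In this model the chip can only go to sleep inside ps_restore() when the use count reaches zero, so a read made between ps_wakeup() and ps_restore() never observes 0xdeadbeef, regardless of how the threads interleave.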
Comment #17
Posted on Sep 7, 2010 by Happy Giraffe
Commit: 16729c246ccdf3b970b0377be7076fb7c40b3456
Email: pstew@chromium.org
CHROMEOS: ath9k: fix power save race conditions
Review URL: http://codereview.chromium.org/3367013
(cherry picked from commit c7f084083f35106a80ce3767b7020318f60e4bb1)
M chromeos/compat-wireless/drivers/net/wireless/ath/ath9k/main.c
M chromeos/compat-wireless/drivers/net/wireless/ath/ath9k/recv.c
Comment #18
Posted on Sep 7, 2010 by Grumpy Elephant
Comment deleted
Comment #19
Posted on Sep 7, 2010 by Massive Monkey
I see no more "0xdeadbeef" in /var/log/messages after applying these (3) patches:
http://www.kernel.org/pub/linux/kernel/people/mcgrof/patches/ath9k/2010/09/PS-fixes-09-04/
Comment #20
Posted on Sep 21, 2010 by Happy Panda
(No comment was entered for this change.)
Comment #21
Posted on Sep 21, 2010 by Happy Panda
pstew says this is fixed.
Comment #22
Posted on Oct 4, 2010 by Helpful Lion
(No comment was entered for this change.)
Comment #23
Posted on Oct 15, 2010 by Helpful Lion
Verified fix in 0.9.84.0 with 'grep -H deadbeef /var/log/messages*'. None found.
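For anyone who wants to script the same check, here is a rough Python equivalent of the grep command above. It is only a sketch: the function names are hypothetical, and it assumes the matching log files are plain text (compressed rotated logs would need to be decompressed first).

```python
import glob

def grep_lines(name, lines, needle="deadbeef"):
    """Yield (name, line) for each line containing needle, like grep -H."""
    for line in lines:
        if needle in line:
            yield name, line.rstrip("\n")

def grep_files(pattern, needle="deadbeef"):
    """Search every file matching the glob pattern; return all hits."""
    hits = []
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path, errors="replace") as fh:
                hits.extend(grep_lines(path, fh, needle))
        except OSError:
            # Unreadable file (permissions, log rotation race): skip it.
            continue
    return hits

if __name__ == "__main__":
    found = grep_files("/var/log/messages*")
    for path, line in found:
        print(f"{path}:{line}")
    print("None found." if not found else f"{len(found)} matching line(s).")
```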
Comment #24
Posted on Oct 15, 2010 by Helpful Lion
(No comment was entered for this change.)
Comment #25
Posted on Mar 7, 2013 by Grumpy Hippo
(No comment was entered for this change.)
Comment #26
Posted on Mar 10, 2013 by Quick Rabbit
(No comment was entered for this change.)
Comment #27
Posted on Mar 12, 2013 by Happy Horse
Moved to: Issue chromium:187883
Status: Moved
Labels:
Type-Bug
Pri-0
Iteration-11
Iteration-12
Iteration-13
OS-Chrome
Cr-OS-Systems-Network
M-9