
chromium-os - issue #5709

ath9k chip becomes unresponsive after resume from suspend


Posted on Aug 14, 2010 by Happy Horse

Chrome OS Version: 0.8.64.0
Type of computer: Multiple Atheros-based machines

The relevant error messages after resume are:

[ 2735.539413] ath: timeout (100000 us) on reg 0x7000: 0xdeadbeef & 0x00000003 != 0x00000000
[ 2735.539434] ath: Chip reset failed
[ 2735.539442] ath: Unable to reset hardware; reset status -22 (freq 2412 MHz)

This problem isn't specific to Chrome OS. I've found a couple of threads that reference errors like this:

https://bugs.launchpad.net/ubuntu/karmic/+source/linux/+bug/407040
http://groups.google.com.cu/group/fa.linux.kernel/browse_thread/thread/7e458b0a9c60e3f6/49325e6a5f23c406?lnk=raot

This problem is likely showing up more often because of background scanning.
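The first log line above describes a register poll: the driver keeps reading register 0x7000 until (value & 0x00000003) == 0x00000000 and gives up after 100000 us. A read of 0xdeadbeef is what comes back when the chip is not answering register accesses (the fix in comment #12 ties this to the chip being put into network sleep), and because 0xdeadbeef & 0x00000003 = 0x3, the condition can never hold. Below is a minimal userspace sketch of that check, assuming a simple fixed-step poll; wait_reg(), the reader callbacks and the 10 us step are hypothetical stand-ins, not driver code (the in-kernel helper appears to be ath9k_hw_wait() in hw.c).

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* What a register read returns when the chip is not answering. */
#define DEAD_READ 0xdeadbeefu

/* Hypothetical register-read callback standing in for real MMIO access. */
typedef uint32_t (*reg_read_fn)(uint32_t reg);

/* Poll `reg` until (value & mask) == val, giving up after timeout_us.
 * On timeout, print a line shaped like the ath message and return false. */
static bool wait_reg(reg_read_fn rd, uint32_t reg, uint32_t mask,
                     uint32_t val, uint32_t timeout_us, uint32_t step_us)
{
    assert(step_us > 0); /* a zero step would never advance the clock */
    uint32_t v = 0;
    for (uint32_t waited = 0; waited < timeout_us; waited += step_us) {
        v = rd(reg);
        if ((v & mask) == val)
            return true;
        /* A driver would delay step_us here; the sketch only counts time. */
    }
    fprintf(stderr, "timeout (%" PRIu32 " us) on reg 0x%" PRIx32 ": 0x%08" PRIx32
            " & 0x%08" PRIx32 " != 0x%08" PRIx32 "\n",
            timeout_us, reg, v, mask, val);
    return false;
}

/* Hypothetical readers: a sleeping chip returns DEAD_READ, an awake one reports 0. */
static uint32_t asleep_read(uint32_t reg) { (void)reg; return DEAD_READ; }
static uint32_t awake_read(uint32_t reg)  { (void)reg; return 0; }
```

Calling wait_reg(asleep_read, 0x7000, 0x00000003, 0, 100000, 10) returns false and prints "timeout (100000 us) on reg 0x7000: 0xdeadbeef & 0x00000003 != 0x00000000", the same shape as the log. The RX-idle poll on reg 0x806c in comment #3 fails the same way, since 0xdeadbeef & 0x01f00000 = 0x00a00000.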

Comment #1

Posted on Aug 14, 2010 by Massive Elephant

(No comment was entered for this change.)

Comment #2

Posted on Aug 14, 2010 by Quick Hippo

(No comment was entered for this change.)

Comment #3

Posted on Aug 20, 2010 by Quick Hippo

Last night I saw the following after resume with compat-wireless 2.6.36-rc1:

2010-08-19T17:59:50.703867-07:00 localhost kernel: [ 495.702140] ath: timeout (100000 us) on reg 0x806c: 0xdeadbeef & 0x01f00000 != 0x00000000
2010-08-19T17:59:50.703889-07:00 localhost kernel: [ 495.702267] ath: RX failed to go idle in 10 ms RXSM=0xdeadbeef

2010-08-19T18:00:59.687044-07:00 localhost kernel: [ 564.686076] ath: timeout (100000 us) on reg 0x806c: 0xdeadbeef & 0x01f00000 != 0x00000000
2010-08-19T18:00:59.687067-07:00 localhost kernel: [ 564.686204] ath: RX failed to go idle in 10 ms RXSM=0xdeadbeef
2010-08-19T18:02:54.728046-07:00 localhost kernel: [ 679.725328] ath: timeout (100000 us) on reg 0x806c: 0xdeadbeef & 0x01f00000 != 0x00000000
2010-08-19T18:02:54.728072-07:00 localhost kernel: [ 679.725454] ath: RX failed to go idle in 10 ms RXSM=0xdeadbeef
2010-08-19T18:06:18.707150-07:00 localhost kernel: [ 883.705369] ath: timeout (100000 us) on reg 0x806c: 0xdeadbeef & 0x01f00000 != 0x00000000
2010-08-19T18:06:18.707174-07:00 localhost kernel: [ 883.705495] ath: RX failed to go idle in 10 ms RXSM=0xdeadbeef

(and more instances). Wireless was somewhat functional but very slow, as if it were resetting the chip frequently.

Comment #4

Posted on Aug 20, 2010 by Happy Horse

I've separated Sam's issue above off into Issue 5943. There's no strong reason to believe the two are related. I haven't seen the originally reported issue lately in the new compat-wireless, but will keep this issue open a bit longer.

Comment #5

Posted on Aug 20, 2010 by Happy Panda

(No comment was entered for this change.)

Comment #6

Posted on Aug 25, 2010 by Quick Hippo

I still have not seen this issue (reg 0x7000) since the jump to compat-wireless 2.6.36-rc2. It happened infrequently before, so I am still monitoring for it.

Comment #7

Posted on Aug 30, 2010 by Happy Horse

I've now run into an instance of this bug. There is now one additional line in the logs when it occurs:

"RTC stuck in MAC reset"

Comment #8

Posted on Aug 30, 2010 by Quick Hippo

(No comment was entered for this change.)

Comment #9

Posted on Sep 1, 2010 by Happy Horse

(No comment was entered for this change.)

Comment #10

Posted on Sep 2, 2010 by Grumpy Rabbit

Can you please test the patch in Issue 5943 and see if it helps with this one?

Comment #11

Posted on Sep 3, 2010 by Happy Elephant

We've been seeing this on several systems; pstew took a look and said it was a known issue.

2010-09-03T12:57:36.303313-07:00 localhost kernel: [ 2426.292190] ------------[ cut here ]------------
2010-09-03T12:57:36.303362-07:00 localhost kernel: [ 2426.292442] WARNING: at chromeos/compat-wireless/drivers/net/wireless/ath/ath9k/rc.c:700 ath_rate_control_unregister+0x1a1/0x1084 ath9k
2010-09-03T12:57:36.303386-07:00 localhost kernel: [ 2426.292994] Hardware name: PineTrail

2010-09-03T12:57:36.303450-07:00 localhost kernel: [ 2426.295160] Pid: 1762, comm: phy0 Tainted: G WC 2.6.32.15+drm33.5 #1
2010-09-03T12:57:36.303467-07:00 localhost kernel: [ 2426.295497] Call Trace:
2010-09-03T12:57:36.303488-07:00 localhost kernel: [ 2426.295626] [<7902e8a7>] warn_slowpath_common+0x6a/0x81
2010-09-03T12:57:36.303508-07:00 localhost kernel: [ 2426.295893] [] ? ath_rate_control_unregister+0x1a1/0x1084 [ath9k]
2010-09-03T12:57:36.303528-07:00 localhost kernel: [ 2426.296276] [<7902e8d0>] warn_slowpath_null+0x12/0x15
2010-09-03T12:57:36.303548-07:00 localhost kernel: [ 2426.296548] [] ath_rate_control_unregister+0x1a1/0x1084 [ath9k]
2010-09-03T12:57:36.303569-07:00 localhost kernel: [ 2426.296877] [<790341eb>] ? local_bh_enable_ip+0xd/0xf
2010-09-03T12:57:36.303588-07:00 localhost kernel: [ 2426.297184] [] rate_control_get_rate+0x8b/0x137 [mac80211]
2010-09-03T12:57:36.303610-07:00 localhost kernel: [ 2426.297513] [] ieee80211_pspoll_get+0x894/0x124e [mac80211]
2010-09-03T12:57:36.303631-07:00 localhost kernel: [ 2426.297834] [] ? ieee80211_process_measurement_req+0x525/0x5ee [mac80211]
2010-09-03T12:57:36.303653-07:00 localhost kernel: [ 2426.298235] [] ieee80211_pspoll_get+0xfe7/0x124e [mac80211]
2010-09-03T12:57:36.303673-07:00 localhost kernel: [ 2426.298546] [<792c637a>] ? skb_release_data+0x92/0x96
2010-09-03T12:57:36.303691-07:00 localhost kernel: [ 2426.298794] [<792c6469>] ? pskb_expand_head+0xeb/0x160
2010-09-03T12:57:36.303710-07:00 localhost kernel: [ 2426.299070] [] ieee80211_pspoll_get+0x1246/0x124e [mac80211]
2010-09-03T12:57:36.303729-07:00 localhost kernel: [ 2426.299403] [<792c6cf5>] ? __alloc_skb+0x4e/0x10d
2010-09-03T12:57:36.303748-07:00 localhost kernel: [ 2426.299657] [] ieee80211_tx_skb+0x3f/0x46 [mac80211]
2010-09-03T12:57:36.303769-07:00 localhost kernel: [ 2426.299984] [] ieee80211_send_nullfunc+0x3c/0x40 [mac80211]
2010-09-03T12:57:36.303790-07:00 localhost kernel: [ 2426.300335] [] ieee80211_offchannel_stop_station+0xcb/0xe9 [mac80211]
2010-09-03T12:57:36.303811-07:00 localhost kernel: [ 2426.300695] [] ieee80211_scan_work+0x335/0x3ba [mac80211]
2010-09-03T12:57:36.303830-07:00 localhost kernel: [ 2426.301005] [<790402f8>] worker_thread+0x13b/0x1ae
2010-09-03T12:57:36.303850-07:00 localhost kernel: [ 2426.301277] [] ? ieee80211_scan_work+0x0/0x3ba [mac80211]
2010-09-03T12:57:36.303869-07:00 localhost kernel: [ 2426.301554] [<79043420>] ? autoremove_wake_function+0x0/0x34
2010-09-03T12:57:36.303887-07:00 localhost kernel: [ 2426.301800] [<790401bd>] ? worker_thread+0x0/0x1ae
2010-09-03T12:57:36.303905-07:00 localhost kernel: [ 2426.302016] [<79043217>] kthread+0x64/0x69
2010-09-03T12:57:36.303922-07:00 localhost kernel: [ 2426.302218] [<790431b3>] ? kthread+0x0/0x69
2010-09-03T12:57:36.303940-07:00 localhost kernel: [ 2426.302408] [<7900368f>] kernel_thread_helper+0x7/0x10
2010-09-03T12:57:36.303957-07:00 localhost kernel: [ 2426.302631] ---[ end trace 86408ae1233bb3ff ]---

Comment #12

Posted on Sep 7, 2010 by Happy Giraffe

Commit: c7f084083f35106a80ce3767b7020318f60e4bb1
Email: pstew@chromium.org

CHROMEOS: ath9k: fix power save race conditions

ath9k has a race between putting the chip into network sleep and having registers read from hardware. The race occurs because although ath9k_ps_restore() locks its own callers, it makes use of some variables which get altered in the driver at different code paths. The variables are ps_enabled and ps_flags.

This is easily reproducible in large network environments when roaming with the wpa_supplicant simple bgscan. You'd get some 0xdeadbeef read out on certain registers, such as:

ath: timeout (100000 us) on reg 0x806c: 0xdeadbeef & 0x01f00000 != 0x00000000
ath: RX failed to go idle in 10 ms RXSM=0xdeadbeef

ath: timeout (100000 us) on reg 0x7000: 0xdeadbeef & 0x00000003 != 0x00000000
ath: Chip reset failed

The fix is to protect the ath9k_config(hw, IEEE80211_CONF_CHANGE_PS) calls with a spin_lock_irqsave(), which will disable contenders for these variables from interrupt context, timers, re-entry from mac80211 on the same callback, and most importantly from ath9k_ps_restore(), which is the only call which will put the device into network sleep.

There are quite a few threads and bug reports on these; a few of them are:

https://bugs.launchpad.net/ubuntu/karmic/+source/linux/+bug/407040
http://code.google.com/p/chromium-os/issues/detail?id=5709
http://code.google.com/p/chromium-os/issues/detail?id=5943

Cc: stable@kernel.org [2.6.32+]
Signed-off-by: Luis R. Rodriguez

[To be replaced by cherry-pick]

BUG=chromium-os:5943, chromium-os:5709
TEST=Manual

Review URL: http://codereview.chromium.org/3367013

M chromeos/compat-wireless/drivers/net/wireless/ath/ath9k/main.c
M chromeos/compat-wireless/drivers/net/wireless/ath/ath9k/recv.c
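The race described in this commit can be pictured with a small userspace model. This is a sketch under stated assumptions, not the actual patch: a pthread mutex stands in for spin_lock_irqsave() (userspace cannot mask interrupts), and struct ps_state, ps_wakeup(), ps_restore(), config_ps(), read_reg() and PS_FLAG_PENDING are hypothetical names. What it shows is that ps_restore(), the only path that puts the chip into network sleep, reads ps_enabled and ps_flags under the same lock the power-save config path holds while writing them.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the driver's power-save state (not ath9k code). */
enum chip_power { CHIP_AWAKE, CHIP_NETWORK_SLEEP };
#define PS_FLAG_PENDING 0x1u   /* hypothetical: pending work that needs the chip awake */

struct ps_state {
    pthread_mutex_t lock;   /* stands in for the spinlock taken with spin_lock_irqsave() */
    unsigned usecount;      /* callers currently holding the chip awake */
    bool ps_enabled;
    unsigned ps_flags;
    enum chip_power power;
};

/* Take a reference and make sure the chip is awake before touching registers. */
static void ps_wakeup(struct ps_state *s)
{
    pthread_mutex_lock(&s->lock);
    s->usecount++;
    s->power = CHIP_AWAKE;
    pthread_mutex_unlock(&s->lock);
}

/* Drop the reference. Only the last user may put the chip into network sleep,
 * and only if PS is enabled and nothing is pending. ps_enabled and ps_flags
 * are read under the same lock that config_ps() holds while writing them. */
static void ps_restore(struct ps_state *s)
{
    pthread_mutex_lock(&s->lock);
    assert(s->usecount > 0);
    if (--s->usecount == 0 && s->ps_enabled && s->ps_flags == 0)
        s->power = CHIP_NETWORK_SLEEP;
    pthread_mutex_unlock(&s->lock);
}

/* Power-save configuration change (the IEEE80211_CONF_CHANGE_PS path in the
 * real driver): both variables change together, never half-updated. */
static void config_ps(struct ps_state *s, bool enable, unsigned flags)
{
    pthread_mutex_lock(&s->lock);
    s->ps_enabled = enable;
    s->ps_flags = flags;
    if (!enable)
        s->power = CHIP_AWAKE;   /* leaving power save wakes the chip */
    pthread_mutex_unlock(&s->lock);
}

/* Register reads only work while the chip is awake; otherwise the model
 * returns 0xdeadbeef, mimicking the failed reads in the logs. */
static unsigned read_reg(struct ps_state *s, unsigned reg)
{
    (void)reg;
    pthread_mutex_lock(&s->lock);
    unsigned v = (s->power == CHIP_AWAKE) ? 0u : 0xdeadbeefu;
    pthread_mutex_unlock(&s->lock);
    return v;
}
```

In this model, after config_ps(&s, true, 0) a ps_wakeup()/ps_restore() pair leaves the chip in CHIP_NETWORK_SLEEP and read_reg() then returns 0xdeadbeef, while setting PS_FLAG_PENDING first keeps it awake. Without the lock, ps_restore() could observe one variable updated and the other stale, and put the chip to sleep while another path still expects to read registers.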

Comment #13

Posted on Sep 7, 2010 by Happy Giraffe

Commit: 16729c246ccdf3b970b0377be7076fb7c40b3456
Email: pstew@chromium.org

CHROMEOS: ath9k: fix power save race conditions

Review URL: http://codereview.chromium.org/3367013
(cherry picked from commit c7f084083f35106a80ce3767b7020318f60e4bb1)

M chromeos/compat-wireless/drivers/net/wireless/ath/ath9k/main.c
M chromeos/compat-wireless/drivers/net/wireless/ath/ath9k/recv.c

Comment #14

Posted on Sep 20, 2010 by Happy Panda

(No comment was entered for this change.)

Comment #15

Posted on Oct 4, 2010 by Helpful Lion

(No comment was entered for this change.)

Comment #16

Posted on Oct 8, 2010 by Grumpy Bear

Per a conversation with Paul, this issue is not yet fixed. It may be a BIOS power problem at this point.

Reopening this bug in case we can get additional feedback.

Comment #17

Posted on Oct 19, 2010 by Happy Horse

This bug has been taken up as a partner issue. Closing this bug, but progress is ongoing elsewhere.

Comment #18

Posted on Nov 16, 2010 by Happy Monkey

I have the same problem with kernel 2.6.36 in Fedora 14... not Chrome OS specific...

Comment #19

Posted on Nov 16, 2010 by Grumpy Rabbit

2.6.36 won't have many of the stable fixes we have posted recently. You may want to try the compat-wireless-2.6.36-5-spn.tar.bz2 release:

http://wireless.kernel.org/en/users/Download/stable/

Comment #20

Posted on Nov 18, 2010 by Grumpy Bear

This issue has popped up again on Brad's device. We'll see if we can get more repeatable repro steps.

Comment #21

Posted on Mar 7, 2013 by Grumpy Hippo

(No comment was entered for this change.)

Comment #22

Posted on Mar 10, 2013 by Quick Rabbit

(No comment was entered for this change.)

Comment #23

Posted on Mar 12, 2013 by Happy Horse

Moved to: Issue chromium:187642

Status: Moved

Labels:
Type-Bug Pri-1 OS-Chrome M-9 Cr-OS-Systems-Network Cr-OS-Systems