Posted on Aug 14, 2010 by Quick Hippo

Bgscan operations can cause the ath9k driver to lose sync w/ the currently associated AP. This appears to be caused by incorrectly doing work like periodic calibration while off-channel. Upstream patches that may help with this are:

Felix Fietkau (7):
    ath9k_hw: clean up and fix initial noise floor calibration
    ath9k_hw: fix periodic noise floor calibration on AR9003
    ath9k: fix a crash in the PA predistortion apply function
    ath9k_hw: fix analog shift register writes on AR9003
    ath9k: prevent calibration during off-channel activity
    ath9k_hw: clean up per-channel calibration data
    ath9k_hw: fix a noise floor calibration related race condition

To reproduce, stream video and do a walk-around test to force roaming. Watch the console msgs for problems.

Comment #1

Posted on Aug 14, 2010 by Quick Hippo

(No comment was entered for this change.)

Comment #2

Posted on Aug 20, 2010 by Happy Panda

(No comment was entered for this change.)

Comment #3

Posted on Aug 25, 2010 by Quick Hippo

Still seeing similar complaints w/ compat wireless 2.6.36-rc2 while roaming on Google-A w/ video streaming. But these happen less frequently and may be a different issue, as they look to be decoupled from bgscan (more likely they are due to poor roaming and sticking to an AP too long).

Comment #4

Posted on Aug 30, 2010 by Quick Hippo

Seeing this frequently on a new dogfood machine together with register read timeouts:

    2010-08-30T16:12:36.179882-07:00 localhost kernel: [ 205.704077] wlan0: detected beacon loss from AP - sending probe request
    2010-08-30T16:12:42.176006-07:00 localhost kernel: [ 211.701119] wlan0: detected beacon loss from AP - sending probe request
    2010-08-30T16:12:47.179950-07:00 localhost kernel: [ 216.704086] wlan0: detected beacon loss from AP - sending probe request
    2010-08-30T16:12:50.562892-07:00 localhost kernel: [ 220.087255] ath: timeout (100000 us) on reg 0x806c: 0xdeadbeef & 0x01f00000 != 0x00000000
    2010-08-30T16:12:50.562931-07:00 localhost kernel: [ 220.087400] ath: RX failed to go idle in 10 ms RXSM=0xdeadbeef
    2010-08-30T16:12:53.327002-07:00 localhost kernel: [ 222.851182] ath: timeout (100000 us) on reg 0x806c: 0xdeadbeef & 0x01f00000 != 0x00000000
    2010-08-30T16:12:53.327047-07:00 localhost kernel: [ 222.851301] ath: RX failed to go idle in 10 ms RXSM=0xdeadbeef
    2010-08-30T16:13:05.176077-07:00 localhost kernel: [ 234.701087] wlan0: detected beacon loss from AP - sending probe request
    2010-08-30T16:13:06.557030-07:00 localhost kernel: [ 236.081743] phy0: release an RX reorder frame due to timeout on earlier frames
    2010-08-30T16:13:06.557161-07:00 localhost kernel: [ 236.081761] phy0: release an RX reorder frame due to timeout on earlier frames
    2010-08-30T16:13:06.557195-07:00 localhost kernel: [ 236.081773] phy0: release an RX reorder frame due to timeout on earlier frames
    2010-08-30T16:13:08.093888-07:00 localhost kernel: [ 237.618678] ath: timeout (100000 us) on reg 0x806c: 0xdeadbeef & 0x01f00000 != 0x00000000
    2010-08-30T16:13:08.093930-07:00 localhost kernel: [ 237.618810] ath: RX failed to go idle in 10 ms RXSM=0xdeadbeef
    2010-08-30T16:13:10.310990-07:00 localhost kernel: [ 239.835886] ath: timeout (100000 us) on reg 0x806c: 0xdeadbeef & 0x01f00000 != 0x00000000
    2010-08-30T16:13:10.311031-07:00 localhost kernel: [ 239.836003] ath: RX failed to go idle in 10 ms RXSM=0xdeadbeef
    2010-08-30T16:13:14.176938-07:00 localhost kernel: [ 243.701196] wlan0: detected beacon loss from AP - sending probe request
    2010-08-30T16:13:17.179942-07:00 localhost kernel: [ 246.704117] wlan0: detected beacon loss from AP - sending probe request
    2010-08-30T16:13:24.176914-07:00 localhost kernel: [ 253.701173] wlan0: detected beacon loss from AP - sending probe request

Comment #5

Posted on Aug 31, 2010 by Quick Hippo

On a new dogfood machine I am seeing these msgs every few seconds when power save is enabled (iw dev wlan0 set power_save on). This is on my home network: 11n+WPA-PSK/AES, 2437, snr -67, Airport Extreme.

With power save disabled fewer beacons are missed. Right after disabling it I saw a few complaints about sending ProbeReq frames to verify the AP was still there, and some frames were flushed from the AMPDU RX reorder buffer due to timeouts. After a bit all is quiet. Feels like sta+AP don't have their TSF sync'd.

Comment #6

Posted on Sep 3, 2010 by Happy Panda

(No comment was entered for this change.)

Comment #7

Posted on Sep 3, 2010 by Grumpy Rabbit

Attached is some beacon loss captured with full debug enabled on a machine that has ATH_DEBUG enabled. ATH_DEBUG is required to enable debug prints in ath9k. The value 0x00ffffff was echoed into the /sys/kernel/debug/ath9k/phy0/debug file. That should be enough to get all debug messages:

    enum ATH_DEBUG {
            ATH_DBG_RESET           = 0x00000001,
            ATH_DBG_QUEUE           = 0x00000002,
            ATH_DBG_EEPROM          = 0x00000004,
            ATH_DBG_CALIBRATE       = 0x00000008,
            ATH_DBG_INTERRUPT       = 0x00000010,
            ATH_DBG_REGULATORY      = 0x00000020,
            ATH_DBG_ANI             = 0x00000040,
            ATH_DBG_XMIT            = 0x00000080,
            ATH_DBG_BEACON          = 0x00000100,
            ATH_DBG_CONFIG          = 0x00000200,
            ATH_DBG_FATAL           = 0x00000400,
            ATH_DBG_PS              = 0x00000800,
            ATH_DBG_HWTIMER         = 0x00001000,
            ATH_DBG_BTCOEX          = 0x00002000,
            ATH_DBG_WMI             = 0x00004000,
            ATH_DBG_BSTUCK          = 0x00008000,
            ATH_DBG_ANY             = 0xffffffff
    };

By default this is set to ATH_DBG_FATAL only.
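To make the mask easier to read, here is a small standalone userspace sketch (not driver code) that lists which of the flags above a given mask enables. The table copies the values from the enum quoted above; `struct dbg_flag` and `ath_dbg_decode` are names invented for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Flag values copied from the ATH_DEBUG enum quoted above. */
struct dbg_flag {
    uint32_t bit;
    const char *name;
};

static const struct dbg_flag dbg_flags[] = {
    { 0x00000001, "RESET" },      { 0x00000002, "QUEUE" },
    { 0x00000004, "EEPROM" },     { 0x00000008, "CALIBRATE" },
    { 0x00000010, "INTERRUPT" },  { 0x00000020, "REGULATORY" },
    { 0x00000040, "ANI" },        { 0x00000080, "XMIT" },
    { 0x00000100, "BEACON" },     { 0x00000200, "CONFIG" },
    { 0x00000400, "FATAL" },      { 0x00000800, "PS" },
    { 0x00001000, "HWTIMER" },    { 0x00002000, "BTCOEX" },
    { 0x00004000, "WMI" },        { 0x00008000, "BSTUCK" },
};

/*
 * Hypothetical helper: write the names of all flags set in `mask`
 * into `buf`, separated by '|', and return how many were written.
 */
static int ath_dbg_decode(uint32_t mask, char *buf, size_t len)
{
    size_t used = 0;
    size_t i;
    int count = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (i = 0; i < sizeof(dbg_flags) / sizeof(dbg_flags[0]); i++) {
        int n;

        if (!(mask & dbg_flags[i].bit))
            continue;
        n = snprintf(buf + used, len - used, "%s%s",
                     count ? "|" : "", dbg_flags[i].name);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0'; /* drop the partial name and stop */
            break;
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

With the default mask 0x00000400 this yields only FATAL; with 0x00ffffff all sixteen named flags are set (bits 16 to 23 have no name in the enum above).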

Attachments

beacon-loss.log.bz2 331.07KB

Comment #8

Posted on Sep 3, 2010 by Grumpy Rabbit

OK here is another debug log dump, now with roaming. What this proves is at least the beacon synch is taking effect while roaming. Beacon loss happens even when not roaming. Going to do one more roam walk through now with power save disabled just to be 100% sure this is power save related.

Attachments

roam-beacon-loss.log.bz2 9.2MB

Comment #9

Posted on Sep 3, 2010 by Grumpy Rabbit

With power save disabled I still get beacon loss but it happened only once. Attached is the log. So it seems power save just helps reproduce it. Next I will try to reproduce with roaming disabled.

Attachments

beacon-loss-ps-off.log.bz2 331.07KB

Comment #10

Posted on Sep 3, 2010 by Grumpy Rabbit

OK -- if I disable bgscan everything is peachy:

    ctrl_interface=/var/run/wpa_supplicant
    network={
            ssid="GoogleGuest"
            scan_ssid=1
            key_mgmt=NONE
    }

If you force a bgscan with 'iw dev wlan0 scan' right after that you will see we start getting beacon loss. So the issue is 100% related to bgscan effects. Will dig in to that now.

Comment #11

Posted on Sep 3, 2010 by Grumpy Rabbit

OK -- I've managed now to reproduce this with a simple association to an AP with the above wpa_supplicant config, without even getting an IP address. The trigger for the issue is to issue a scan with:

iw dev wlan0 scan

This triggers a background scan. The wireshark capture reflects the probe request / probe responses, and then towards the end we see when we get the beacon loss. Will look at this some more.. The area that looks a little fishy to me right now is:

    if (local->hw.flags & IEEE80211_HW_PS_NULLFUNC_STACK) {
            if (directed_tim) {
                    if (local->hw.conf.dynamic_ps_timeout > 0) {
                            local->hw.conf.flags &= ~IEEE80211_CONF_PS;
                            ieee80211_hw_config(local,
                                                IEEE80211_CONF_CHANGE_PS);
                            ieee80211_send_nullfunc(local, sdata, 0);
                    } else {
                            local->pspolling = true;

                            /*
                             * Here is assumed that the driver will be
                             * able to send ps-poll frame and receive a
                             * response even though power save mode is
                             * enabled, but some drivers might require
                             * to disable power save here. This needs
                             * to be investigated.
                             */
                            ieee80211_send_pspoll(local, sdata);
                    }
            }
    }

This is in mac80211, net/mac80211/mlme.c. I am not sure this handles offchannel operation well.

Comment #12

Posted on Sep 3, 2010 by Grumpy Rabbit

Deleted file :)

Comment #13

Posted on Sep 3, 2010 by Grumpy Rabbit

Please try this patch

Attachments

require-beacon-synch-after-offchannel-op.patch 766

Comment #14

Posted on Sep 4, 2010 by Grumpy Rabbit

Ignore that patch, please try this one instead. I believe this is ready for upstream submission as well. Please test and let us know.

Attachments

0002-ath9k-fix-beacon-loss-after-bgscan.patch 1.66KB

Comment #15

Posted on Sep 6, 2010 by Quick Hippo

Issue 6469 has been merged into this issue.

Comment #16

Posted on Sep 7, 2010 by Happy Horse

Alas, even with this patch, I end up with the "500ms" beacon timeout with my Android device in tethering mode, immediately (same second in the logs) after completing bgscan.

Comment #17

Posted on Sep 7, 2010 by Grumpy Rabbit

I believe what you are seeing is a separate issue, but nonetheless an issue. With my nexus one as an AP I do not get the beacon loss, but I do get a probe response timeout:

    Sep 7 00:42:58 tux kernel: [ 122.390083] No probe response from AP 02:23:76:e7:de:70 after 500ms, try 1
    Sep 7 00:42:58 tux kernel: [ 122.890090] No probe response from AP 02:23:76:e7:de:70 after 500ms, try 2
    Sep 7 00:42:59 tux kernel: [ 123.390119] No probe response from AP 02:23:76:e7:de:70 after 500ms, try 3
    Sep 7 00:42:59 tux kernel: [ 123.890077] No probe response from AP 02:23:76:e7:de:70 after 500ms, try 4
    Sep 7 00:43:00 tux kernel: [ 124.390137] No probe response from AP 02:23:76:e7:de:70 after 500ms, disconnecting.
    Sep 7 00:43:00 tux kernel: [ 124.420128] phy0: Removed STA 02:23:76:e7:de:70
    Sep 7 00:43:00 tux kernel: [ 124.420215] phy0: Destroyed STA 02:23:76:e7:de:70
    Sep 7 00:43:00 tux kernel: [ 124.420223] phy0: device now idle
    Sep 7 00:43:00 tux kernel: [ 124.426800] cfg80211: All devices are disconnected, going to restore regulatory settings
    Sep 7 00:43:00 tux kernel: [ 124.426809] cfg80211: Restoring regulatory settings
    Sep 7 00:43:00 tux kernel: [ 124.426816] cfg80211: Calling CRDA to update world regulatory domain
    Sep 7 00:43:00 tux kernel: [ 124.430655] cfg80211: World regulatory domain updated:
    Sep 7 00:43:00 tux kernel: [ 124.430657] (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
    Sep 7 00:43:00 tux kernel: [ 124.430660] (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
    Sep 7 00:43:00 tux kernel: [ 124.430662] (2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
    Sep 7 00:43:00 tux kernel: [ 124.430665] (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
    Sep 7 00:43:00 tux kernel: [ 124.430667] (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
    Sep 7 00:43:00 tux kernel: [ 124.430669] (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
    Sep 7 00:43:00 tux kernel: [ 124.526795] phy0: device no longer idle - scanning

I get this when simply associating to it with this wpa_supplicant conf:

    mcgrof@tux ~/wpa $ cat mcgrof-nexus.conf
    ctrl_interface=/var/run/wpa_supplicant

    ap_scan=1

    network={
            ssid="mcgrof-nexus-one"
            #bgscan="simple:30:-45:300"
            scan_ssid=1
            key_mgmt=NONE
    }

And wpa_supplicant -i wlan0 -D nl80211 -c mcgrof-nexus.conf. I then just do one scan, and I get the above.

Perhaps that is what you see as well?

Comment #18

Posted on Sep 7, 2010 by Happy Horse

Interestingly, my logs show no retries at all. That might be an issue. I'll do a little digging to see why you have retries and I don't. I've attached a merged wpa_supplicant and /var/log/messages for a typical disconnect. Here's the salient bit though:

    2010-09-07T06:55:15.319206-07:00 localhost wpa_supplicant[956]: CTRL-EVENT-CONNECTED - Connection to 3a:e7:d8:5d:1e:34 completed (reauth) [id=0 id_str=]
    2010-09-07T06:55:15.319244-07:00 localhost wpa_supplicant[956]: wpa_driver_nl80211_set_operstate: operstate 0->1 (UP)
    2010-09-07T06:55:15.319280-07:00 localhost wpa_supplicant[956]: netlink: Operstate: linkmode=-1, operstate=6
    2010-09-07T06:55:15.319812-07:00 localhost wpa_supplicant[956]: bgscan simple: Signal strength threshold -45 Short bgscan interval 30 Long bgscan interval 300
    [...]
    2010-09-07T06:55:15.708537-07:00 localhost wpa_supplicant[956]: nl80211: Event message available
    2010-09-07T06:55:15.708634-07:00 localhost wpa_supplicant[956]: nl80211: Connection quality monitor event: RSSI low
    2010-09-07T06:55:18.303654-07:00 localhost wpa_supplicant[956]: EAPOL: startWhen --> 0
    2010-09-07T06:55:18.303708-07:00 localhost wpa_supplicant[956]: EAPOL: disable timer tick
    2010-09-07T06:55:36.397749-07:00 localhost wpa_supplicant[956]: nl80211: Event message available
    2010-09-07T06:55:36.397833-07:00 localhost wpa_supplicant[956]: nl80211: Connection quality monitor event: RSSI high
    2010-09-07T06:55:45.324266-07:00 localhost wpa_supplicant[956]: bgscan simple: Request a background scan
    2010-09-07T06:55:45.324921-07:00 localhost wpa_supplicant[956]: Scan requested (ret=0) - scan timeout 30 seconds
    2010-09-07T06:55:45.324997-07:00 localhost wpa_supplicant[956]: nl80211: Event message available
    2010-09-07T06:55:45.325096-07:00 localhost wpa_supplicant[956]: nl80211: Scan trigger
    2010-09-07T06:55:45.329742-07:00 localhost wpa_supplicant[956]: dbus: flush_object_timeout_handler: Timeout - sending changed properties of object /fi/w1/wpa_supplicant1/Interfaces/1
    2010-09-07T06:55:51.290084-07:00 localhost wpa_supplicant[956]: nl80211: Event message available
    2010-09-07T06:55:51.290179-07:00 localhost wpa_supplicant[956]: nl80211: New scan results available
    2010-09-07T06:55:51.290236-07:00 localhost wpa_supplicant[956]: Received scan results (3 BSSes)
    2010-09-07T06:55:51.290293-07:00 localhost wpa_supplicant[956]: nl80211: Scan results indicate BSS status with 3a:e7:d8:5d:1e:34 as associated
    [...]
    2010-09-07T06:55:51.299621-07:00 localhost wpa_supplicant[956]: Skip roam - too small difference in signal level
    2010-09-07T06:55:51.299881-07:00 localhost wpa_supplicant[956]: RTM_NEWLINK: operstate=1 ifi_flags=0x11043 ([UP][RUNNING][LOWER_UP])
    2010-09-07T06:55:51.300275-07:00 localhost wpa_supplicant[956]: RTM_NEWLINK, IFLA_IFNAME: Interface 'wlan0' added
    2010-09-07T06:55:51.300460-07:00 localhost wpa_supplicant[956]: dbus: flush_object_timeout_handler: Timeout - sending changed properties of object /fi/w1/wpa_supplicant1/Interfaces/1
    2010-09-07T06:55:51.301855-07:00 localhost wpa_supplicant[956]: dbus: flush_object_timeout_handler: Timeout - sending changed properties of object /fi/w1/wpa_supplicant1/Interfaces/1/BSSs/0
    2010-09-07T06:55:53.797203-07:00 localhost kernel: [ 356.796181] No probe response from AP 3a:e7:d8:5d:1e:34 after 500ms, disconnecting.

Attachments

disconnect_log.txt 26.31KB

Comment #19

Posted on Sep 7, 2010 by Grumpy Rabbit

Heh yeah that is odd.. OK so you do see probe response issues but you get immediately disconnected. Yeah this is odd. I'm going to look at my own probe issue now.

BTW it may also help to capture 'iw event -t' logs

Comment #20

Posted on Sep 7, 2010 by Happy Horse

I figured out why mcgrof was getting "try #N" messages and I wasn't. Those messages are under a "#ifdef CONFIG_MAC80211_VERBOSE_DEBUG" that presumably we don't have set in ChromiumOS. This means that my unsuccessful probe run started 4*500ms = 2sec earlier, that places me 500ms after the scan results came in.
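The arithmetic can be checked against the timestamps in the log from comment #18 (disconnect at 06:55:53.797, scan results at 06:55:51.290). The sketch below is just that subtraction written out; `tod_ms` and `probe_run_start_ms` are hypothetical helpers, and the count of four unlogged retries at 500 ms each is the assumption stated above.

```c
#include <assert.h>

/* Convert a wall-clock time of day to milliseconds since midnight. */
static long tod_ms(int hour, int min, int sec, int msec)
{
    assert(hour >= 0 && hour < 24 && min >= 0 && min < 60);
    assert(sec >= 0 && sec < 60 && msec >= 0 && msec < 1000);
    return ((hour * 60L + min) * 60L + sec) * 1000L + msec;
}

/*
 * Hypothetical helper: given the time of the final "disconnecting"
 * message, the number of retries that were not logged before it and
 * the wait per retry, estimate when the unlogged probe run began.
 */
static long probe_run_start_ms(long disconnect_ms, int silent_tries,
                               int wait_ms)
{
    assert(silent_tries >= 0 && wait_ms >= 0);
    return disconnect_ms - (long)silent_tries * wait_ms;
}
```

Calling `probe_run_start_ms(tod_ms(6, 55, 53, 797), 4, 500)` gives 06:55:51.797, which is 507 ms after the "Received scan results" line at 06:55:51.290.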

Comment #21

Posted on Sep 7, 2010 by Grumpy Rabbit

In order for you to get the beacon loss messages you also need CONFIG_MAC80211_VERBOSE_DEBUG so I figured you had that enabled.

Comment #22

Posted on Sep 7, 2010 by Grumpy Rabbit

Using a sniffer I see the probe requests and probe responses from the nexus one. The probe retries happen very quickly with nothing coming in between them, as if we hurry during the retries.

Doing more digging..

Comment #23

Posted on Sep 7, 2010 by Grumpy Rabbit

OK see this log of the probes going out quickly, the replies only come in after that. Do you see the same? I'll look into the timing of this next, it could be that we are not waiting long enough for some reason.

Attachments

monitor-nexus-one.png 205.81KB

Comment #24

Posted on Sep 7, 2010 by Grumpy Rabbit

Ok the timing is by design see:

commit d1c5091f23fed5195271e2849f89017d3a126521
Author: Maxim Levitsky
Date: Fri Jul 31 18:54:23 2009 +0300

mac80211: Increase timeouts for station polling

Do a probe request every 30 seconds, and wait for probe response,
half a second This should lower the traffic that card sends, thus save
power Wainting longer for response makes probe more robust against
'slow' access points

Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Tested-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

    diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
    index ccd5c7a..6d5a1ee 100644
    --- a/net/mac80211/mlme.c
    +++ b/net/mac80211/mlme.c
    @@ -42,13 +42,13 @@
      * Time the connection can be idle before we probe
      * it to see if we can still talk to the AP.
      */
    -#define IEEE80211_CONNECTION_IDLE_TIME (2 * HZ)
    +#define IEEE80211_CONNECTION_IDLE_TIME (30 * HZ)
     /*
      * Time we wait for a probe response after sending
      * a probe request because of beacon loss or for
      * checking the connection still works.
      */
    -#define IEEE80211_PROBE_WAIT (HZ / 5)
    +#define IEEE80211_PROBE_WAIT (HZ / 2)

     #define TMR_RUNNING_TIMER 0
     #define TMR_RUNNING_CHANSW 1

And the above seems to be well reflected in the wireshark screenshot. What doesn't make much sense though is why we are sending a probe request given that we are supposed to be receiving beacons. I'm going to try this patch now which kills the idle connection monitor while we are scanning, we shouldn't be doing that while off channel.
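For anyone not used to reading timeouts in units of HZ, here is a rough userspace sketch of what the two defines work out to in milliseconds. The tick rates 100, 250 and 1000 are only common example values and not something taken from this machine's config; `jiffies_to_ms` is a local stand-in for this sketch, not the kernel helper.

```c
#include <assert.h>

/* Convert a jiffies count to milliseconds for a given tick rate. */
static unsigned long jiffies_to_ms(unsigned long jiffies, unsigned long hz)
{
    assert(hz > 0);
    return jiffies * 1000UL / hz;
}

/* The two timeouts as defined after the commit quoted above. */
static unsigned long conn_idle_time(unsigned long hz) { return 30 * hz; }
static unsigned long probe_wait(unsigned long hz)     { return hz / 2; }

/* The values before that commit, for comparison. */
static unsigned long old_conn_idle_time(unsigned long hz) { return 2 * hz; }
static unsigned long old_probe_wait(unsigned long hz)     { return hz / 5; }
```

Whatever the tick rate, `jiffies_to_ms(probe_wait(hz), hz)` comes out as 500 ms and `jiffies_to_ms(conn_idle_time(hz), hz)` as 30000 ms, which matches the 500 ms spacing of the "No probe response" messages.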

Attachments

off-chan-fix-conn-idle.patch 917

Comment #25

Posted on Sep 7, 2010 by Grumpy Rabbit

Even if we do what I suggest in the above patch and also even increase IEEE80211_PROBE_WAIT to 5 seconds (5 * HZ) I still see the STA not receiving the probe responses from the AP so something else is messed up. Digging.

Comment #26

Posted on Sep 7, 2010 by Happy Horse

I did a trace and have visually confirmed that (1) there are beacons every 100ms during the entire period where the disconnect happens (including while this error condition occurs -- why does reception of a beacon not reset this condition?) (2) I can confirm that I see probe requests every 500ms. (3) I also see the Android phone responding by ACKing the Probe Request, but I don't ever see probe responses. It is possible at least in my case that the phone doesn't consider the probe request valid (although it is happy to respond to broadcast probe requests for other hosts). Perhaps it's the HT block?

Attachments

cap.png 305.31KB

Comment #27

Posted on Sep 7, 2010 by Happy Horse

Here's a snapshot of a capture of the same time period as above, except I've opened up the filters a bit to show all probe requests and responses. You can see the access point responding to broadcast probe requests at the same time as it ACKs (but does not respond in any other way) to directed probe requests. I've also attached a pic of the zoom-in of the probe request that the client is sending.

Attachments

cap2.png 276.7KB

cap3.png 63.33KB

Comment #28

Posted on Sep 8, 2010 by Happy Horse

I've verified that from the STA's perspective (not just from a third party) the beacons from the attached BSS are visible. I consider it a possible bug that the Android device doesn't respond to the unicast probe request. However, it's a bigger bug in my mind for our STA to depart the access point although beacons are in full view. I'll be diving into mac80211 to see if I can fix that behavior. But first, I'll apply your patch above.

Comment #29

Posted on Sep 8, 2010 by Happy Horse

I tried your patch (actually, the attached variant, because yours didn't compile: IEEE80211_CONNECTION_IDLE_TIME is local to mlme.c). The problem doesn't occur as often as a result. However, it does still occur, and for good reason -- if the connection is idle (i.e., no unicast traffic is received) for the 30 second idle interval (less likely since we also have periodic background scan which can reset the 30 second timer, but still completely possible), we still start our probes, and since the AP doesn't support unicast probe requests, we die.

I think there's a couple underlying issues here. First is that any period of "idle" -- i.e. no unicast traffic directed to us -- is considered a reason to probe the connection. Beacons are ignored, both as a factor for resetting this idle timer and also ignored as a potential indicator that the probe was successful. Depending on what we're trying to test, that's probably correct. Incoming broadcast activity doesn't provide a basis for believing we still have end-to-end connectivity with the AP.

However, since unicast probe requests don't appear universally supported, and there are plenty legitimate reasons why we can end up without any unicast incoming traffic, probing using only a non-standard probe request seems wrong. Therefore, I used the second attached patch below, which tries broadcast for the last two probe-request attempts. This works to keep us on the AP.
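The selection rule described above can be sketched in a few lines. This is an illustration only, not the attached patch: `probe_use_broadcast` and its parameters are hypothetical names, and the total of five attempts used in the note below is an assumption based on the "try 1" to "try 4" plus "disconnecting" messages earlier in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not the mac80211 API: decide whether probe
 * attempt `attempt` (0-based) out of `max_tries` should be sent to
 * the broadcast address. The last `broadcast_tail` attempts are
 * broadcast; every attempt before that is unicast to the associated AP.
 */
static bool probe_use_broadcast(unsigned int attempt, unsigned int max_tries,
                                unsigned int broadcast_tail)
{
    unsigned int unicast_limit;

    assert(max_tries > 0 && attempt < max_tries);
    unicast_limit = broadcast_tail < max_tries ?
                    max_tries - broadcast_tail : 0;
    return attempt >= unicast_limit;
}
```

With `max_tries = 5` and `broadcast_tail = 2`, attempts 0 to 2 go out unicast and attempts 3 and 4 go out broadcast, so an AP that ignores unicast probe requests still gets two chances to answer before we give up.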

However, I still wonder about this timer, because I can have a ping running at the same time and still every 30 seconds the idle timer gets triggered. I'll look into exactly what kind of unicast traffic we need to see in order to forestall the timer.

Attachments

mac80211-conn-mon-timer-pstew.diff 2.96KB

mac80211-probe-unicast-broadcast.diff 1.07KB

Comment #30

Posted on Sep 8, 2010 by Grumpy Rabbit

You're on point Paul, the probes should not be sent if we are TX'ing data, however, why the hell are they going out if prior to a bg scan we do not send them? We should review this upstream.

Comment #31

Posted on Sep 8, 2010 by Grumpy Rabbit

Paul, I like your approach for the broadcast Probe Request. If these patches help with connectivity then in my eyes they are stable fixes as well. I was just trying to find out whether IEEE 802.11-2007 requires APs to respond to those, and I was unable to find language that says so, so your approach seems like a reasonable compromise.

I have some changes to the first patch too, and a small change for your second one. Will post new revisions here soon for your review.

Are you seeing any beacon loss anymore? I'd like to know so I can send out what we have queued up for stable so far. I can also wait until we resolve all these other side issues we are noticing.

Comment #32

Posted on Sep 8, 2010 by Quick Hippo

Unicast ProbeRequest frames are perfectly valid and cannot be dropped by an AP AFAIK. I've used them for years to check if an AP is still alive.

Having idle tracking ignore beacon rx and other reverse traffic (ACK of tx data) is a bug. In my experience one should only send a ProbeReq on bmiss.

Comment #33

Posted on Sep 8, 2010 by Grumpy Rabbit

Thanks for the feedback Sam, let's go ahead and treat this as such then. I'll bake some new patches soon.

Comment #34

Posted on Sep 9, 2010 by Grumpy Rabbit

OK here is my latest full series:

http://www.kernel.org/pub/linux/kernel/people/mcgrof/patches/ath9k/2010/09/PS-fixes-09-08/

Or all-in-one tarball:

http://www.kernel.org/pub/linux/kernel/people/mcgrof/patches/ath9k/2010/09/PS-fixes-09-08.tar.bz2

Please test and let me know.

Sam -- as for your note on beacons, I saw a note from Johannes on why beacons are not treated equally. The connection monitor is intended to monitor the data connection, not beacons, so it seems he designed it to ensure we are actually connected. Amod suggests we simply use null data frames for this. We do have a framework in place in mac80211 for checking for the ACK, so we could use that as well.

Anyway if you still want to do what you suggest you could do something like this:

    diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
    index 6e60bef..82d7174 100644
    --- a/net/mac80211/mlme.c
    +++ b/net/mac80211/mlme.c
    @@ -1009,12 +1009,12 @@ void ieee80211_sta_rx_notify(struct ieee80211_sub_if_data *sdata,
                                  struct ieee80211_hdr *hdr)
     {
            /*
    -        * We can postpone the mgd.timer whenever receiving unicast frames
    -        * from AP because we know that the connection is working both ways
    -        * at that time. But multicast frames (and hence also beacons) must
    -        * be ignored here, because we need to trigger the timer during
    -        * data idle periods for sending the periodic probe request to the
    -        * AP we're connected to.
    +        * We can postpone the connection monitor whenever receiving
    +        * unicast frames from AP because we know that the connection
    +        * is working both ways at that time -- But multicast frames
    +        * must be ignored here, because we need to trigger the timer
    +        * during data idle periods for sending the periodic probe
    +        * request to the AP we're connected to.
             */
            if (is_multicast_ether_addr(hdr->addr1))
                    return;
    @@ -1607,6 +1607,7 @@ static void ieee80211_rx_mgmt_beacon(struct ieee80211_sub_if_data *sdata,
             * we are processing a beacon from the AP just now.
             */
            ieee80211_sta_reset_beacon_monitor(sdata);
    +       ieee80211_sta_reset_conn_monitor(sdata);

            ncrc = crc32_be(0, (void *)&mgmt->u.beacon.beacon_int, 4);
            ncrc = ieee802_11_parse_elems_crc(mgmt->u.beacon.variable,

This would go on top of all my patches, but I highly recommend that this simply be discussed at the wireless summit this Thursday-Friday.
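To make the two policies concrete, here is a toy userspace model of the connection monitor (not mac80211 code). Every name in it is made up for illustration, and the 30000 ms idle time and 100 ms beacon interval in the note below are just the figures mentioned in this thread. The `beacons_reset` flag stands for the behaviour the diff above would introduce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the connection (idle) monitor; all times are in ms. */
struct conn_monitor {
    unsigned long idle_ms;    /* allowed idle time before probing */
    unsigned long last_reset; /* when the monitor was last pushed back */
    bool beacons_reset;       /* true models the change in the diff above */
};

/* Account for a frame received from the AP at time `now`. */
static void conn_monitor_rx(struct conn_monitor *m, unsigned long now,
                            bool is_beacon)
{
    assert(m != NULL);
    if (is_beacon && !m->beacons_reset)
        return; /* current behaviour: beacons do not push the timer back */
    m->last_reset = now;
}

/* True once the idle time has elapsed, i.e. a probe would be sent. */
static bool conn_monitor_expired(const struct conn_monitor *m,
                                 unsigned long now)
{
    assert(m != NULL);
    return now - m->last_reset >= m->idle_ms;
}

/* Feed one beacon every `interval` ms up to `until`, then report expiry. */
static bool expired_after_beacons(struct conn_monitor *m,
                                  unsigned long interval, unsigned long until)
{
    unsigned long t;

    assert(interval > 0);
    for (t = interval; t <= until; t += interval)
        conn_monitor_rx(m, t, true);
    return conn_monitor_expired(m, until);
}
```

With `idle_ms = 30000` and beacons every 100 ms, the model with `beacons_reset = false` expires at 30 s even though beacons never stopped, while the model with `beacons_reset = true` does not. A unicast frame pushes the timer back in both.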

Comment #35

Posted on Sep 9, 2010 by Grumpy Rabbit

Oh and Paul -- I think the Nexus One just delays sending the unicast probe responses by about 5 seconds. For some reason broadcast probe requests seem to go out of the device quicker.

Comment #36

Posted on Sep 9, 2010 by Grumpy Rabbit

Paul -- BTW please just test these for now, I am working on enhancing the commit log text so I will provide some final ones for upstream submission.

Comment #37

Posted on Sep 9, 2010 by Happy Horse

Your suite of patches work as advertised with my Nexus One.

Comment #38

Posted on Sep 20, 2010 by Happy Panda

(No comment was entered for this change.)

Comment #39

Posted on Sep 21, 2010 by Quick Hippo

This is mostly fixed w/ upstream changes. Plan is to split mac80211 hwflag that currently controls both beacon miss + connection monitoring facilities and then disable connection monitoring and use cqm signal from the rate control module instead.

Comment #40

Posted on Sep 22, 2010 by Happy Giraffe

Commit: 651e9a14450c4e2b9d82e96973f1c90a7e3d7dad
Email: pstew@chromium.org

CHROMIUMOS: mac80211: send last 3/5 probe requests as unicast

Some buggy APs do not respond to unicast probe requests, or respond to them only after a long delay, so in the worst case we should try to send broadcast probe requests, otherwise we can get disconnected from these APs.

Even if drivers do not have filters to disregard probe responses from foreign APs mac80211 will only process probe responses from our associated AP for re-arming connection monitoring.

We need to do this since the beacon monitor does not push back the connection monitor by design so even if we are getting beacons from these type of APs our connection monitor currently relies heavily on the way the probe requests are received on the AP. An example of an AP affected by this is the Nexus One, but this has also been observed with random APs.

We can probably optimize this later by using null funcs instead of probe requests.

For more details refer to:

http://code.google.com/p/chromium-os/issues/detail?id=5715

This patch has fixes for stable kernels [2.6.35+].

Cc: stable@kernel.org
Cc: Paul Stewart
Cc: Amod Bodas
Signed-off-by: Luis R. Rodriguez

BUG=chromium-os:5715
TEST=Walkaround tests + SecMat and MatFunc testbed runs

Review URL: http://codereview.chromium.org/3434005

M chromeos/compat-wireless/net/mac80211/mlme.c

Comment #41

Posted on Sep 22, 2010 by Happy Giraffe

Commit: 8fe0f449fa992147d7aac7168529aa4ce069ebc0
Email: pstew@chromium.org

CHROMIUMOS: mac80211: reset connection idle when going offchannel

When we go offchannel mac80211 currently leaves the connection idle monitor alive. It should instead be postponed until we come back to our home channel, otherwise by the time we get back to the home channel we could be triggering unnecessary probe requests. For APs that do not respond to unicast probe requests (Nexus One is a simple example) this means we essentially get disconnected after the probes fail.

This patch has stable fixes for kernels [2.6.35+]

Cc: stable@kernel.org
Cc: Paul Stewart
Cc: Amod Bodas
Signed-off-by: Luis R. Rodriguez

BUG=chromium-os:5715
TEST=Walkaround tests + SecMat and MatFunc testbed runs

Review URL: http://codereview.chromium.org/3436007

M chromeos/compat-wireless/net/mac80211/offchannel.c

Comment #42

Posted on Sep 22, 2010 by Happy Giraffe

Commit: 2162734977313d09465567b1d9453ac6b4af51ae
Email: pstew@chromium.org

CHROMIUMOS: mac80211: disable beacon monitor while going offchannel

The beacon monitor should be disabled when going off channel to prevent spurious warnings and triggering connection deterioration work such as sending probe requests. Re-enable the beacon monitor once we come back to the home channel.
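The pause-and-restart idea in this commit message can be sketched as a toy timer model. This is userspace illustration only, not the code in offchannel.c; the struct and function names are invented, and the 500 ms timeout in the note below is an arbitrary example value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a link monitor timer; all times are in ms. */
struct link_monitor {
    bool armed;               /* false while we are off-channel */
    unsigned long timeout_ms; /* how long we tolerate silence */
    unsigned long expires;    /* absolute time at which it fires */
};

/* Leaving the home channel: stop the monitor so it cannot fire. */
static void monitor_go_offchannel(struct link_monitor *m)
{
    assert(m != NULL);
    m->armed = false;
}

/* Back on the home channel: restart a full timeout from `now`. */
static void monitor_return_home(struct link_monitor *m, unsigned long now)
{
    assert(m != NULL);
    m->armed = true;
    m->expires = now + m->timeout_ms;
}

/* True if the monitor is armed and its deadline has passed. */
static bool monitor_fired(const struct link_monitor *m, unsigned long now)
{
    assert(m != NULL);
    return m->armed && now >= m->expires;
}
```

A monitor with a 500 ms timeout that is left armed across a 1.6 s scan reports a loss as soon as the scan ends. If `monitor_go_offchannel` is called before the scan and `monitor_return_home` after it, nothing fires until a full timeout has passed back on the home channel.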

This patch has fixes for stable kernels [2.6.34+].

Cc: stable@kernel.org
Cc: Paul Stewart
Cc: Amod Bodas
Signed-off-by: Luis R. Rodriguez

[Based off 3405004 -- will rebase]

BUG=chromium-os:5715
TEST=Walkaround tests + SecMat and MatFunc testbed runs

Review URL: http://codereview.chromium.org/3438006

M chromeos/compat-wireless/net/mac80211/offchannel.c

Comment #43

Posted on Sep 22, 2010 by Happy Giraffe

Commit: 30e945c06dbfc2043bd85b9d8885f982494ac805
Email: pstew@chromium.org

CHROMIUMOS: ath9k: fix enabling ANI / tx monitor after bg scan

ath9k's entire logic with SC_OP_SCANNING is incorrect due to the way mac80211 currently implements the scan complete callback and we handle it in ath9k. This patch removes the flag completely in preference for the SC_OP_OFFCHANNEL which is really what we wanted.

The scanning flag was used to ensure we reset ANI to the old values when we go back to the home channel, but if we are offchannel we use some defaults. The flag was also used to re-enable the TX monitor.

Without this patch we simply never re-enabled ANI and the TX monitor after going offchannel. This means that after one background scan we are prone to noise issues and if we had a TX hang we would not recover. To get this to work properly we must enable ANI after we have configured the beacon timers, otherwise hardware acts really oddly.

This patch has stable fixes which apply down to [2.6.36+]. There may be a way to fix this on older kernels, but that requires a bit of work since this patch relies on the new mac80211 flag IEEE80211_CONF_OFFCHANNEL, which was introduced as of 2.6.36.

Cc: stable@kernel.org
Cc: Paul Stewart
Cc: Amod Bodas
Signed-off-by: Luis R. Rodriguez

BUG=chromium-os:5715
TEST=Walkaround tests + SecMat and MatFunc testbed runs

Review URL: http://codereview.chromium.org/3427005

M chromeos/compat-wireless/drivers/net/wireless/ath/ath9k/ath9k.h
M chromeos/compat-wireless/drivers/net/wireless/ath/ath9k/main.c
M chromeos/compat-wireless/drivers/net/wireless/ath/ath9k/recv.c

chromium-os - issue #5715
