Obsolete
Status Update
Comments
em...@gmail.com <em...@gmail.com> #2
I've got a BLE peripheral that can be connected as a slave to multiple centrals, and I tried what happens when the autoConnect parameter is set to true while there is also a pending connection to another device. What I expected happens:
On a Nexus 5 (which has a Broadcom Bluetooth 4.1 chip), running the latest Marshmallow update, as soon as the BLE peripheral starts advertising after it has connected, the previous link is dropped by the Broadcom chip. Since this repeats over and over, it gets stuck in a "connect-disconnect loop". I've attached a new snoop log for this. See packet 1865 and forward.
On an LG G3 phone (which has a Qualcomm Bluetooth 4.1 chip), running the latest Marshmallow update, as soon as the BLE peripheral starts advertising after it has connected, the Bluetooth chip initiates a second connection to this device and notifies the host. As written above, the host then gets confused and gets stuck in a state where it makes no further connections until Bluetooth is restarted. I've attached a new snoop log for this as well. See packet 593 and forward.
On a Nexus 5X phone (which has a Qualcomm Bluetooth 4.2 chip), no bug is triggered and there are no problems. The reason is that the Bluetooth 4.2 standard added some new sentences: "There shall be only one connection between two LE device addresses. An initiator shall not send a connection request to an advertiser it is already connected to. If an advertiser receives a connection request from an initiator it is already connected to, it shall ignore that request." The Bluetooth chip honours this and will therefore not establish a new connection, even though the peripheral advertises and its address is in the white list.
So, I really think the best fix would be to make sure that, while there is a pending connection, the white list does not contain addresses of devices that already have an established connection. As newer BLE peripherals that support multiple centrals will eventually become more common on the market, I think it's good to fix this issue as soon as possible, before people are stuck with non-upgradable old Marshmallow phones.
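The three controller behaviours reported in this comment can be summarised in a small model. This is only an illustrative sketch in Python: the `Controller` enum, the function name, and the event strings are hypothetical labels, not real HCI bindings or vendor firmware logic. It returns the events the host would see when a white-listed device advertises while a background connection is pending.

```python
from enum import Enum


class Controller(Enum):
    # Hypothetical labels for the behaviours observed in the snoop logs.
    BROADCOM_4_1 = "broadcom-4.1"
    QUALCOMM_4_1 = "qualcomm-4.1"
    CONFORMANT_4_2 = "conformant-4.2"


def events_on_advertisement(controller, adv_addr, white_list, connected, initiating):
    """Return the events (as strings) sent to the host when adv_addr advertises."""
    # Without a pending white-list connection, advertisements are not acted on.
    if not initiating or adv_addr not in white_list:
        return []
    # Normal case: the device is not connected yet, so a connection is created.
    if adv_addr not in connected:
        return ["LE Connection Complete"]
    # The device is already connected; behaviour now depends on the controller.
    if controller is Controller.CONFORMANT_4_2:
        # 4.2 rule: never send a connection request to an already connected advertiser.
        return []
    if controller is Controller.BROADCOM_4_1:
        # Observed: the old link is torn down first, then the new one is reported.
        return ["Disconnection Complete (reason 0x08)", "LE Connection Complete"]
    # Observed on Qualcomm 4.1: a second connection is reported with no disconnect.
    return ["LE Connection Complete"]
```

With the already connected device removed from `white_list`, all three models return an empty list, which is the effect the proposed fix aims for.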
em...@gmail.com <em...@gmail.com> #3
FYI: I ran the same test as in the last comment on a Sony Xperia Z3 (Broadcom Bluetooth 4.1) with the latest Android N Developer Preview (NPD56N.1000106), and the issue still persists. It behaves exactly the same as the Nexus 5 running Marshmallow. I'm attaching a snoop log. See packet 1161 and forward.
dn...@google.com <dn...@google.com>
dn...@google.com <dn...@google.com> #4
Can you please check the issue on the latest version of Android N and let us know the result?
If the issue reproduces, please share the bug report and BT snoop logs for it.
Android bug report:
After reproducing the issue, navigate to developer settings, ensure ‘USB debugging’ is enabled, then enable ‘Bug report shortcut’. To take bug report, hold the power button and select the ‘Take bug report’ option.
Note: Please upload the files to google drive and share the folder to android-bugreport@google.com, then share the link here.
em...@gmail.com <em...@gmail.com> #5
Hi. Nothing has changed in the new Android version (NBD90W), so the Bluetooth HCI captures and logcat logs above are still valid.
Android's Bluetooth team has already read this report and is currently working on a fix: https://android-review.googlesource.com/#/c/268213/
dn...@google.com <dn...@google.com> #6
We have passed this defect on to the development team and will update this issue with more information as it becomes available.
em...@gmail.com <em...@gmail.com> #7
I thought Qualcomm's Bluetooth 4.2 controllers did not suffer from this problem but apparently I was wrong.
Attached is an HCI snoop log from a Moto Z Play (2016) with a Qualcomm Bluetooth 4.2 controller. At packet 617 the device 80:e4:da:70:ec:66 gets connected. At packet 686 it gets connected again, even though no Disconnect Complete event has been sent in between. 31 seconds later the first connection times out. I don't understand how the first connection can stay alive for so long, since the applied supervision timeout is 8 seconds. The peripheral used cannot be in the advertising state and the connected state at the same time.
This means using "auto connect" with Android >= 6.0 for more than one device at the same time is very likely to fail if you just wait long enough.
I have also filed a new CL here: https://android-review.googlesource.com/#/c/315988/
[Deleted User] <[Deleted User]> #8
I am curious: what effect would setting the `autoConnect` parameter to false have on this issue? Why is it necessary to set `autoConnect` to true?
em...@gmail.com <em...@gmail.com> #9
If the issue has already occurred, you won't be able to connect even if you set autoConnect to false. But the issue can only be triggered in the first place by a device that has a GATT connection with the autoConnect parameter set to true.
em...@gmail.com <em...@gmail.com> #11
Yes, the last patch has fixed the issue. The first one should be marked as abandoned.
lk...@gmail.com <lk...@gmail.com> #12
The status of this bug says that it is not in production yet - even though it was merged into AOSP master on May 13, 2017.
Does this mean that Android 6, 7.x and 8.x all still have this problem?
is...@google.com <is...@google.com>
sa...@google.com <sa...@google.com> #13
Thank you for your feedback. We assure you that we are doing our best to address the issue reported; however, our product team has shifted its work priorities, and they no longer include this issue. For now, we will be closing the issue as Won't Fix (Obsolete). If this issue still exists, we request that you log a new issue along with the latest bug report here: https://goo.gl/TbMiIO .
Description
The bug occurs under the following conditions:
- The app has the autoConnect parameter in connectGatt set to true for all BLE connections.
- The BLE connection must be unstable enough to trigger a supervision timeout, for example because the device is on the edge of being out of range.
- The Android device's supervision timeout timer expires slightly after the peripheral's (which is not strange, because each side restarts its timeout timer when it successfully receives a packet from the other peer). After the peripheral times out, it is configured to immediately restart advertising. The Android device has not yet timed out (it will time out very soon).
- There is also a pending background connection to at least one other BLE device.
In this case, when the peripheral starts advertising, the Android device is still in the connected state to that device, since its supervision timeout has not expired yet. Since that device is currently in the Bluetooth controller's white list and there is a pending connection, the advertising event will be handled and the controller will connect and send an LE Connection Complete HCI event. At this point, Android's Bluetooth stack gets confused since this device is already connected, prints the error "L2CAP got BLE scanner conn_comp in bad state: 4" and then does NOT restart pending connections until Bluetooth is restarted (which leads to angry customers...).
This basically means no BLE connections will work until Bluetooth is restarted (however currently connected devices will work until their connection is lost, but not after that).
Note that disconnecting and then connecting again through the APIs doesn't work either (it will still never reconnect until you restart Bluetooth). Logcat prints "No such connection need to be cancelled" here.
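The timing condition above can be made concrete with a small calculation. This is a hedged sketch in Python, not code from the Bluetooth stack: the function name and parameters are hypothetical, and it assumes, as in the conditions listed, that each side restarts its supervision timer on the last packet it received and that the peripheral restarts advertising the moment its own timer expires.

```python
def overlap_window_ms(peripheral_last_rx_ms, central_last_rx_ms, supervision_timeout_ms):
    """Return (start, end) of the interval in which the peripheral already
    advertises while the central still considers the link connected,
    or None if the central times out first (or at the same instant)."""
    # Each side declares the link lost one supervision timeout after the
    # last packet it successfully received from the other peer.
    peripheral_timeout_at = peripheral_last_rx_ms + supervision_timeout_ms
    central_timeout_at = central_last_rx_ms + supervision_timeout_ms
    if central_timeout_at > peripheral_timeout_at:
        return (peripheral_timeout_at, central_timeout_at)
    return None


# Illustrative numbers only: the central received its last packet 30 ms
# after the peripheral did, and the supervision timeout is 8 seconds.
window = overlap_window_ms(1000, 1030, 8000)
print(window)
```

Any advertisement from the peripheral that the controller acts on inside this window, while another background connection is pending, corresponds to the scenario described above.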
The same bug will also be triggered if a "hacker" sets up a BLE advertiser with the same Bluetooth device address and starts advertising while the device with that BD address is connected, and there is at least one pending connection to another BLE device. Another easy way to trigger the bug is to remove the peripheral's battery while it is connected, immediately reinsert it, and start advertising (this must happen within the supervision timeout).
Note that this bug will not occur on Android devices with a Broadcom Bluetooth chip, since their firmware behaves in a non-conformant way. If it detects an incoming ADV_IND while a device with that BD address is already connected (and another connection is pending and this device is in the white list), it immediately tears down the current connection and sends a Disconnect event to the host indicating Connection Timeout (0x08), and thereafter sends the LE Connection Complete event (for the new connection). Android devices with Qualcomm Bluetooth chips, for example, are affected. I've tested and verified the bug both on a OnePlus One device with Marshmallow 6.0.1 (build MHC19Q) and on an LG G3 with 6.0.1 (build MRA58K). Since Nexus and Samsung phones have Broadcom Bluetooth chips, they are not affected.
I've included an HCI snoop log. At packet 190, the timeout has occurred on the peripheral, which restarts advertising, and therefore the Android device connects. 10 ms later the Android device gets a "Disconnect complete" event from the controller. From now on nothing works anymore. At packet 297 the Android Bluetooth stack tries to cancel the pending connection, but it fails with "Command Disallowed (0x0c)" because there is no pending connection to cancel. From then on it never sends any start/stop connection commands until you restart Bluetooth.
A suggested fix: when a connection has completed and the stack should continue searching for other background devices, do not restart the pending connection immediately; first remove the just-connected device from the white list. When a device has disconnected, stop the pending connection, add the device back to the white list and restart the pending connection. Note that there are also upcoming, smarter BLE peripherals that can hold multiple slave connections, meaning they continue to advertise even when already connected to a master. This fix is required to work with such devices. Otherwise Android devices with Broadcom chips will disconnect all the time, and with other chips all available connections will quickly be used up.
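The suggested white list handling can be sketched as a small host-side model. This is a minimal illustration in Python under stated assumptions, not the code of the eventual patch: the class, its method names, and the `initiating` flag are hypothetical, and the real stack would issue the corresponding HCI commands where this model only updates sets.

```python
class BackgroundConnectionManager:
    """Hypothetical host-side model that keeps connected devices out of
    the controller's white list while a background connection is pending."""

    def __init__(self):
        self.background = set()   # devices requested with autoConnect=true
        self.connected = set()    # devices with an established link
        self.white_list = set()   # model of the controller's white list
        self.initiating = False   # True while a white-list connection is pending

    def _stop_initiating(self):
        # The pending connection is stopped before the white list is changed.
        self.initiating = False

    def _start_initiating(self):
        # Only start a pending connection if there is something to wait for.
        self.initiating = bool(self.white_list)

    def add_background_device(self, addr):
        self._stop_initiating()
        self.background.add(addr)
        if addr not in self.connected:
            self.white_list.add(addr)
        self._start_initiating()

    def on_connection_complete(self, addr):
        # The pending connection ended when the connection was created.
        self.initiating = False
        self.connected.add(addr)
        # Remove the just-connected device BEFORE searching for others.
        self.white_list.discard(addr)
        self._start_initiating()

    def on_disconnection_complete(self, addr):
        self.connected.discard(addr)
        if addr in self.background:
            # Stop, add the device back, then restart the pending connection.
            self._stop_initiating()
            self.white_list.add(addr)
            self._start_initiating()
```

The property this model maintains is that `white_list` and `connected` never share an address, so an advertisement from an already connected device cannot produce a second LE Connection Complete event.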
The bug seems to have been introduced in this commit:
Previously (on Lollipop and below), if the peripheral started to advertise while it was still connected, i.e. usually right before the link was determined lost, no packets were exchanged during that connection and an app would think the device was disconnected. Once it disconnected and reconnected, which usually happens eventually, it worked as normal again.
I've also included a logcat log where the bug triggers, even though I don't think it adds anything important.