Skip to content
This repository has been archived by the owner on Nov 23, 2017. It is now read-only.

_OverlappedFuture.set_result() should clear the its reference to the overlapped object #196

Closed
GoogleCodeExporter opened this issue Apr 10, 2015 · 8 comments

Comments

@GoogleCodeExporter
Copy link

Hi,

I rewrote the Windows code handling overlapped operations to fix a crash when a 
_WaitHandleFuture was cancelled, to log errors and more generally to cleanup 
the code.

I tried to clear the reference the overlapped operation in _OverlappedFuture. 
_OverlappedFuture.set_exception() now cancels the operation and then clears the 
reference, it works. _OverlappedFuture.cancel() does the same and it works too.

But when I try to clear the reference to the overlapped operation in 
_OverlappedFuture.set_result(), I get random errors. It looks the problem only 
occurs in the fast-path of IocpProactor._register(), when the overlapped is not 
added to _cache. It looks like clearing the reference in _poll(), when 
GetQueuedCompletionStatus() got the status of the operation, it works.

I tried to add the overlapped into _cache but then the event loop hangs or 
raises other errors.

I don't see how to fix the issue. There is a bug somewhere or I misunderstood 
how overlapped operations works. Is it important that 
GetQueuedCompletionStatus() reads the status of the operation? Or should we 
*cancel* the operation after getting the result?

It's really strange because in _register(), the overlapped is not added to 
_cache is the overlapped is not pending. But I get errors in the overlapped 
destructor. According to the destructor code, I'm not supposed to get errors if 
the overlapped already completed.

Maybe I looked at the wrong overlapped object. I don't know.

Original issue reported on code.google.com by victor.s...@gmail.com on 25 Jul 2014 at 11:16

@GoogleCodeExporter
Copy link
Author

Original comment by gvanrossum@gmail.com on 25 Jul 2014 at 3:13

@GoogleCodeExporter
Copy link
Author

To reproduce the issue, apply attached patch (send.patch) and run:

    c:\python33\python.exe runtests.py test_pipe

Output1:

C:\haypo\tulip>\python33\python.exe runtests.py test_pipe
Skipping 'test_unix_events': UNIX only
Exception RuntimeError: '<_overlapped.Overlapped object at 0x0000000003477270> s
till has pending operation at deallocation, the process may crash' ignored
Exception RuntimeError: '<_overlapped.Overlapped object at 0x0000000003477150> s
till has pending operation at deallocation, the process may crash' ignored
Exception RuntimeError: '<_overlapped.Overlapped object at 0x00000000034778A0> s
till has pending operation at deallocation, the process may crash' ignored
Exception RuntimeError: '<_overlapped.Overlapped object at 0x00000000034770C0> s
till has pending operation at deallocation, the process may crash' ignored
...
----------------------------------------------------------------------
Ran 3 tests in 0.031s


Output 2:

C:\haypo\tulip>\python33\python.exe runtests.py test_pipe
Skipping 'test_unix_events': UNIX only
Exception RuntimeError: '<_overlapped.Overlapped object at 0x00000000035E7390> s
till has pending operation at deallocation, the process may crash' ignored
Exception RuntimeError: '<_overlapped.Overlapped object at 0x00000000035E7660> s
till has pending operation at deallocation, the process may crash' ignored
Fatal read error on pipe transport
protocol: <asyncio.streams.StreamReaderProtocol object at 0x00000000035EE668>
transport: <_ProactorDuplexPipeTransport fd=348>
Traceback (most recent call last):
  File "C:\haypo\tulip\asyncio\proactor_events.py", line 159, in _loop_reading
    data = fut.result()  # deliver data later in "finally" clause
  File "C:\haypo\tulip\asyncio\futures.py", line 271, in result
    raise self._exception
  File "C:\haypo\tulip\asyncio\windows_events.py", line 488, in _poll
    value = callback(transferred, key, ov)
  File "C:\haypo\tulip\asyncio\windows_events.py", line 279, in finish_recv
    return ov.getresult()
OSError: [WinError 996] L\u2019événement d\u2019entrée/sortie avec 
chevauchement
 n\u2019est pas dans un état signalé
...
----------------------------------------------------------------------
Ran 3 tests in 0.016s

If you run test_windows_events.py in debug mode (PYTHONASYNCIODEBUG=1), the 
test can fail:

C:\haypo\tulip>\python33\python.exe tests\test_windows_events.py
..Exception RuntimeError: '<_overlapped.Overlapped object at 0x0000000003386270>
 still has pending operation at deallocation, the process may crash' ignored
Exception RuntimeError: '<_overlapped.Overlapped object at 0x0000000003386150> s
till has pending operation at deallocation, the process may crash' ignored
Exception RuntimeError: '<_overlapped.Overlapped object at 0x00000000033864B0> s
till has pending operation at deallocation, the process may crash' ignored
Fatal read error on pipe transport
protocol: <asyncio.streams.StreamReaderProtocol object at 0x0000000003388400>
transport: <_ProactorDuplexPipeTransport fd=512>
Traceback (most recent call last):
  File "C:\haypo\tulip\asyncio\proactor_events.py", line 159, in _loop_reading
    data = fut.result()  # deliver data later in "finally" clause
  File "C:\haypo\tulip\asyncio\futures.py", line 271, in result
    raise self._exception
  File "C:\haypo\tulip\asyncio\windows_events.py", line 491, in _poll
    value = callback(transferred, key, ov)
  File "C:\haypo\tulip\asyncio\windows_events.py", line 279, in finish_recv
    return ov.getresult()
OSError: [WinError 996] L\u2019événement d\u2019entrée/sortie avec 
chevauchement
 n\u2019est pas dans un état signalé
E..
======================================================================
ERROR: test_pipe (__main__.ProactorTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests\test_windows_events.py", line 53, in test_pipe
    res = self.loop.run_until_complete(self._test_pipe())
  File "C:\haypo\tulip\asyncio\base_events.py", line 268, in run_until_complete
    return future.result()
  File "C:\haypo\tulip\asyncio\futures.py", line 271, in result
    raise self._exception
  File "C:\haypo\tulip\asyncio\tasks.py", line 245, in _step
    result = coro.send(value)
  File "tests\test_windows_events.py", line 81, in _test_pipe
    response = yield from r.readline()
  File "C:\haypo\tulip\asyncio\coroutines.py", line 79, in __next__
    return next(self.gen)
  File "C:\haypo\tulip\asyncio\streams.py", line 391, in readline
    raise self._exception
  File "C:\haypo\tulip\asyncio\proactor_events.py", line 159, in _loop_reading
    data = fut.result()  # deliver data later in "finally" clause
  File "C:\haypo\tulip\asyncio\futures.py", line 271, in result
    raise self._exception
  File "C:\haypo\tulip\asyncio\windows_events.py", line 491, in _poll
    value = callback(transferred, key, ov)
  File "C:\haypo\tulip\asyncio\windows_events.py", line 279, in finish_recv
    return ov.getresult()
OSError: [WinError 996] L\u2019événement d\u2019entrée/sortie avec 
chevauchement
 n\u2019est pas dans un état signalé

----------------------------------------------------------------------
Ran 5 tests in 0.624s

FAILED (errors=1)

Original comment by victor.s...@gmail.com on 25 Jul 2014 at 8:11

Attachments:

@GoogleCodeExporter
Copy link
Author

I analyzed an overlapped WSARecv() which has been cancelled. Just after calling 
CancelIoEx(), HasOverlappedIoCompleted() returns 0.

Even after GetQueuedCompletionStatus() returned the address of the overlapped, 
HasOverlappedIoCompleted() still returns 0.

We need a way to check if an overlapped operation was cancelled. I'm digging 
the MSDN documentation. But if it's not possible, we will have to add a flag in 
_overlapped.Overlapped structure.

Well, I don't know if it would fix all issues.

Original comment by victor.s...@gmail.com on 25 Jul 2014 at 9:55

@GoogleCodeExporter
Copy link
Author

Ok, I found the main issue: sometimes GetQueuedCompletionStatus() returns an 
overlapped operation which has not completed yet. I modified 
IocpProactor._poll() to ignore the false alarm.

I pushed a fix in Tulip, Python 3.4 and 3.5.

I keep the issue open, because I would like understand why we start getting 
false alarms after clearing references. Is it linked? And would it be possible 
to limit these false alarms? ProactorIocp._register_with_iocp() contains this 
comment:

    # XXX We could also use SetFileCompletionNotificationModes()
    # to avoid sending notifications to completion port of ops
    # that succeed immediately.

Original comment by victor.s...@gmail.com on 25 Jul 2014 at 11:25

@GoogleCodeExporter
Copy link
Author

Sorry I did not respond earlier.

> I analyzed an overlapped WSARecv() which has been cancelled. Just after 
calling CancelIoEx(),
> HasOverlappedIoCompleted() returns 0.

That is expected.  CancelIoEx() only requests the operation to stop -- you 
still need to wait for it to complete.

> Even after GetQueuedCompletionStatus() returned the address of the 
overlapped, HasOverlappedIoCompleted()
> still returns 0.
>
> We need a way to check if an overlapped operation was cancelled. I'm digging 
the MSDN documentation. 
> But if it's not possible, we will have to add a flag in 
_overlapped.Overlapped structure.

Maybe HasOverlappedIoCompleted() is unreliable/racy.

I *think* GetQueuedCompletionStatus() and GetOverlappedResult() both indicate 
cancellation using ERROR_OPERATION_ABORTED.

You can check the first "err" element of the tuple returned by 
_overlapped.GetQueuedCompletionStatus().
Also ov.getresult() should raise WindowsError(ERROR_OPERATION_ABORTED).

> Ok, I found the main issue: sometimes GetQueuedCompletionStatus() returns an 
overlapped operation 
> which has not completed yet.

I would like to know what ov.getresult(wait=False) returns/raises in such 
situations.  I need to take a proper look when I have access to a Windows 
computer.

(BTW which version of Windows are you using?)

Original comment by roudk...@google.com on 29 Jul 2014 at 9:33

@GoogleCodeExporter
Copy link
Author

Oh sorry, I forgot to update this issue after my new changes. Let's try to 
summarize.

Before, the _register() method didn't add the overlapped operation to _cache if 
the operation was already complete. In this case, only the future kept a 
reference the OVERLAPPED object.

It is not reliable because the future may be destroyed and so the memory of 
OVERLAPPED object is freed (and may be overriden by new bytes) before the IOCP 
is notified of the completion. In this case, bad things happen. For example, I 
saw "false alarms" and other strange behaviour.

I rewrote how IocpProactor handles references. _register() now always hold a 
reference to the Python overlapped object (which contains the OVERLAPPED 
structure) even if the operation already completed and the future is already 
done. I did that to ensure that OVERLAPPED structure is not freed before 
GetQueuedCompletionStatus() notify the completion.

I also changed _OverlappedFuture to clear its reference to the overlapped 
object on cancel(), set_result() and set_exception().

There is still an "hack" in accept_pipe(): I call _register() with 
register=False because it looks like GetQueuedCompletionStatus() never notify 
the completion if the overlapped already completed when _register() is called. 
IMO this is a bug which should be fixd.


Other changes:

- _OverlappedFuture.set_exception() cancels the overlapped, as 
_OverlappedFuture.cancel()
- _WaitHandleFuture.cancel() completes the overlapped operation to wake up the 
IocpProactor event loop
- _WaitHandleFuture.set_result() and WaitHandleFuture.set_exception() are now 
responsible to unregister the wait
- IocpProactor.wait_for_handle() immediatly set the result of the future if the 
overlapped already completed
- IocpProactor.wait_for_handle() sets the object to 0 instead of None to *not* 
clear its entry in close() and use the normal code to handle cancel and wait 
overlapped operations. I don't want special cases.


Overlapped operations on Windows are very complex. IMO the asyncio code is very 
well written, great job! It's not easy to pick the right flag, ensure that 
functions are called in the right order, some features are deprecated, etc. It 
looks like asyncio is up to date and use the right flags.

I just tried to make it even more reliable.

Original comment by victor.s...@gmail.com on 29 Jul 2014 at 9:48

@GoogleCodeExporter
Copy link
Author

> Before, the _register() method didn't add the overlapped operation to _cache 
if the operation was already
> complete. In this case, only the future kept a reference the OVERLAPPED 
object.
>
> It is not reliable because the future may be destroyed and so the memory of 
OVERLAPPED object is freed (and may
> be overriden by new bytes) before the IOCP is notified of the completion. In 
this case, bad things happen. For
> example, I saw "false alarms" and other strange behaviour.

Ah.  I think that "fast path" was necessary when using 
SetFileCompletionNotificationModes().  (I used that in an earlier version, but 
removed it because it was not available on WinXP.)

But as you say, an overlapped address returned by GetQueuedCompletionStatus() 
may refer to an old garbage collected object which shares its address which a 
currently live overlapped object -- causing bad things to happen.

Original comment by roudk...@google.com on 30 Jul 2014 at 10:11

@GoogleCodeExporter
Copy link
Author

The initial issue "_OverlappedFuture.set_result() should clear its reference to 
the overlapped object" is fixed, so I close this issue.

I opened the new issue #204 to investigate accept_pipe() special case.

Original comment by victor.s...@gmail.com on 25 Aug 2014 at 9:47

  • Changed state: Fixed

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

1 participant