Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Directory.list should honor subscription cancel. #10163

Closed
kevmoo opened this issue Apr 23, 2013 · 19 comments
Closed

Directory.list should honor subscription cancel. #10163

kevmoo opened this issue Apr 23, 2013 · 19 comments
Assignees
Labels
area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. library-io os-linux os-osx
Milestone

Comments

@kevmoo
Copy link
Member

kevmoo commented Apr 23, 2013

Sorry for the long repo, but I've verified this twice.

  1. Clone https://github.com/kevmoo/bot_io.dart
  2. run dart test/harness_console.dart

Always (seems) to work fine on mac.
Often (always?) crashes on Linux

See result here
https://drone.io/github.com/kevmoo/bot_io.dart/9

The call that crashes is the call to

tempDir.dispose()
https://github.com/kevmoo/bot_io.dart/blob/master/test/bot_io/temp_dir_tests.dart#L107

Which calls
dir.deleteSync(recursive: true);
https://github.com/kevmoo/bot_io.dart/blob/master/lib/src/bot_io/temp_dir.dart#L40

There error references an inability to open a file in the deleted directory.
The failure happens asynchronously

FileIOException: Cannot open file '/tmp/temp_dir1_Iy2nRn/dir1/dir1_file1.txt' (OS Error: No such file or directory, errno = 2)

@kevmoo
Copy link
Member Author

kevmoo commented Apr 23, 2013

Verified on Ubuntu 12.04.2 LTS, Precise Pangolin

Once on Drone.io, once via VM ware on a mac.

Never causes problems running directly on mac.

@sgjesse
Copy link
Contributor

sgjesse commented Apr 24, 2013

Could you please explain what you mean when saying "The failure happens asynchronously"? The call to "dir.deleteSync(recursive: true)" is synchronous so it should also throw an exception. Do you have async calls still using the temp directory which run after the deletion? Could there be more isolates using the tmp dir at the same time?

@kevmoo
Copy link
Member Author

kevmoo commented Apr 24, 2013

I could be getting this wrong, but I think that the call to deleteSync is doing something async. The error does not bubble up the call stack on the deleteSync call, but does cause a crash later.

Let be double check that, though...

@sgjesse
Copy link
Contributor

sgjesse commented Apr 24, 2013

The deleteSync call does nothing async by itself. It calls a function in C code which does not return until it has finished. The C code enumerates all files and directories to delete and deletes them one by one. If there is an error deleting the file or directory an exception is throws.

@kevmoo
Copy link
Member Author

kevmoo commented Apr 24, 2013

I've updated the code to use delete() (the async version)

Still getting the crash.

After adding debug info to the code, it seems the crash happens at the next call to Directory.createTemp()

I've tried to create a minimal repo with no luck, but I'm still getting the crash on my local Ubuntu VM and on drone.io

https://drone.io/github.com/kevmoo/bot_io.dart/11

Very weird, but very reproducible. Only on Linux. Fine on mac.

@kevmoo
Copy link
Member Author

kevmoo commented Apr 29, 2013

Any thoughts on this? I have another solid repro on r22072

https://drone.io/github.com/kevmoo/bot_io.dart/12

Happy to help debug.

@whesse
Copy link
Member

whesse commented May 6, 2013

Set owner to @whesse.
Added Started label.

@whesse
Copy link
Member

whesse commented May 6, 2013

I have tried to reproduce with the 22072 SDK and the Apr 24 version of bot_io, but it does not reproduce. Is this on the 32-bit or 64-bit SDK?

The error looks to me as if two calls to Directory.tempDir are returning the same directory. This would cause the error you see, when a second test tries populating the directory with files after the first test has deleted it. If you can reproduce it, could you print out the directory names returned from Directory.tempDir?

If anything could produce a race condition in the VM, it would be this test, which calls tempDir simultaneously multiple times on the same Directory('') object, cached in a static variable of TempDir.

I think that all of the asynchronous tests are running simultaneously, not one after the other, so all the calls to tempDir happen at once. Once the futures return, then the tests happen one at a time, perhaps interleaving at each future call.

The error looks like it is happening when creating the file, not when deleting it, so you could check that it happens. I would put a check in EntityPopulator._populateFileWithStream, just before openWrite, that the directory of the file exists, because I think it is exactly there that the problem is. It could also be when it is opened for checking, in _getFileSha1.


Added NeedsInfo label.

@whesse
Copy link
Member

whesse commented May 6, 2013

Also, notice that the error, an unhandled exception, should be able to be caught by putting an error handler on the Future or Stream that is generating it. In the tests, all the futures should be chained together, so that the exception should be catchable on the final future result of the last .then, and then it can be traced back by binary search.

@andersjohnsen
Copy link

Hi,

I've tried out your code, and found that there is indeed a bug in our code, but it's not as expected. When a directory-listing is performed, we list all the files, even if the stream is canceled. That means that if one uses:

var isEmptyFuture = dir.list().isEmpty;

the listing will continue after the isEmpty future was completed.

That's the problem seen on line 13 of bin/packages/bot_io/src/bot_io/io_helpers.dart. Changing it to

            return EntityValidator.validateDirectoryFromMap(dir, content)
                .length.then((l) => l == 0);

solves the issue, at it wait for the stream to complete.

However, the listing should honor the stream cancel, so it's not a bug in your code(!), this is just a workaround.


Removed the owner.
Added Accepted label.
Changed the title to: "Directory.list should honor subscription cancel.".

@andersjohnsen
Copy link

Set owner to @skabet.
Added Started label.

@kevmoo
Copy link
Member Author

kevmoo commented Jul 11, 2013

FYI: this seems to be back and reproducing on Mac

Dart VM version: 0.6.3.3 r24898 (Thu Jul 11 07:47:10 2013) on "macos_x64"

@kevmoo
Copy link
Member Author

kevmoo commented Jul 11, 2013

Also wanted to confirm: your work-around does address the problem. Tests are passing now on my side.

@kevmoo
Copy link
Member Author

kevmoo commented Jul 12, 2013

I see a code review https://codereview.chromium.org/16813006
And commit r24074

...but it seems to now exist on OSX, where I was only seeing it on Linux before


Added OpSys-OSX label.

@andersjohnsen
Copy link

Yes, it's an issue on how the Stream/StreamController is implemented. Atm, when you cancel a StreamSubscription (e.g. isEmpty will do that once it sees the first event), it'll return immediate after the cancel call. However, in this case, stopping an async directory listing is not a sync call, leaving us no way to delay the isEmpty completion.

What I'm working on now is to first address this issue on the Stream itself, and then fix this issue.

@sgjesse
Copy link
Contributor

sgjesse commented Jul 24, 2013

Added this to the M6 milestone.

@larsbak
Copy link

larsbak commented Aug 28, 2013

Removed this from the M6 milestone.
Added this to the M7 milestone.

@sgjesse
Copy link
Contributor

sgjesse commented Sep 25, 2013

Removed this from the M7 milestone.
Added this to the M8 milestone.

@andersjohnsen
Copy link

This was fixed through a series of changes, the last one being r29454.

Note that this also fixed an issues with leaking FSs/HANDLEs when the listing was canceled.


Added Fixed label.

@kevmoo kevmoo added Type-Defect os-linux os-osx area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. library-io labels Oct 29, 2013
@kevmoo kevmoo added this to the M8 milestone Oct 29, 2013
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. library-io os-linux os-osx
Projects
None yet
Development

No branches or pull requests

5 participants