performance: writing to file using IOSink(add) is very slow #17951

Open
DartBot opened this issue Apr 1, 2014 · 19 comments
Labels
area-core-library: SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries.
library-io
type-bug: Incorrect behavior (everything from a crash to more subtle misbehavior)

Comments

@DartBot

DartBot commented Apr 1, 2014

This issue was originally filed by @tatumizer


Writing to file is almost 10 times slower than in java
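  import 'dart:io';
  import 'dart:typed_data';

  void main() {
    var sw = new Stopwatch()..start();

    IOSink out = new File("c:/temp/foo.txt").openWrite();
    // Report the elapsed time once everything has been written and closed.
    out.done.then((v) => print("total time: ${sw.elapsedMilliseconds}"));
    // 4000 buffers of 64 KiB each.
    for (int i = 0; i < 4000; i++) {
      out.add(new Uint8List(65536));
    }
    out.close();
  }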

Takes 1813 ms in Dart; the same in Java completes in 212 ms.
The above code writes about 250 MiB (4000 × 64 KiB). I tested how it might be affected by size by reducing the iterations from 4000 to 1000:
took 484 ms in Dart, 55 ms in Java. Same ratio.

@sethladd
Contributor

sethladd commented Apr 1, 2014

Added Area-IO, Triaged labels.

@andersjohnsen

Set owner to @skabet.
Added Accepted label.

@andersjohnsen

Tatumizer, can you attach the Java program for completeness? That would make it easier to understand the differences and why we appear to be that much slower.

I have a CL that fixes some of this, but not a factor of 8x.

Thanks.

@DartBot
Author

DartBot commented Apr 2, 2014

This comment was originally written by @tatumizer


I will send in a couple of hours, it's on my work comp.

@DartBot
Author

DartBot commented Apr 2, 2014

This comment was originally written by @tatumizer


Hi Anders,
To double-check, I tested on home comp (not nearly as fast as work comp - generic Dell for mass production)
Java program is
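   import java.io.FileOutputStream;

   // Class name is illustrative; the original post showed only the method.
   public class WriteFile {
       static void writeFile() throws Exception {
           long start = System.currentTimeMillis();
           FileOutputStream fstr = new FileOutputStream("c:/temp/foo.txt");
           // 4000 buffers of 64 KiB each, same as the Dart version.
           for (int i = 0; i < 4000; i++) {
               byte[] buf = new byte[65536];
               fstr.write(buf);
           }
           fstr.close();
           System.out.println(System.currentTimeMillis() - start);
       }

       public static void main(String[] args) throws Exception {
           writeFile();
       }
   }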

The ratio of results is more or less the same:
java: 397 ms
dart: 3258 ms

@DartBot
Author

DartBot commented Apr 2, 2014

This comment was originally written by @tatumizer


Maybe it's a windows-only phenomenon?

@DartBot
Author

DartBot commented Apr 2, 2014

This comment was originally written by @tatumizer


For IO requests, all the "action" occurs in the driver inside the OS. The library just has to call the correct OS function, and it should be more or less the same for all languages.

Because "add" in IOSink is asynchronous, there's an additional optimization: the library doesn't wait. It stores the buffer, and when an IO interrupt allows writing, it writes. (At least, in principle it should be implemented like this.) It can be faster, not slower, than Java's OutputStream.

"Asynchronous" functionality exists in Java, too, in the java.nio package.

@andersjohnsen

tatumizer, I was able to run the program, with the following results (with my fix that just landed):

File does not exist:

Java(Sync): 321ms
Dart(Sync): 264ms
Dart(Async): 1282ms

File exists:

Java(Sync): 1049ms
Dart(Sync): 933ms
Dart(Async): 2048ms

So, it's clear that async writing is slower. This is due to two things: doing the copy for the writing isolate, and the extra delay in sending messages between isolates. My fix helps with the former.

But when comparing with sync code, Dart is actually a little bit faster than Java.

Also, there is a HUGE difference depending on whether the file exists or not. Be sure you are testing the same case.
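
The program behind the "Dart(Sync)" numbers is not shown in this thread. As a rough illustration only, a synchronous variant could look like the sketch below, assuming it uses `RandomAccessFile.writeFromSync`. The function name `timeSyncWrite` and the output path are hypothetical. The file is deleted first so that every run starts from the "file does not exist" case.

```dart
import 'dart:io';
import 'dart:typed_data';

/// Hypothetical helper: writes [chunks] buffers of [chunkSize] zero bytes to
/// [path] synchronously and returns the elapsed time in milliseconds.
int timeSyncWrite(String path, {int chunks = 4000, int chunkSize = 65536}) {
  final file = File(path);
  // Start every run from the same state, since overwriting an existing file
  // and creating a new one were reported to differ a lot.
  if (file.existsSync()) {
    file.deleteSync();
  }

  final sw = Stopwatch()..start();
  final raf = file.openSync(mode: FileMode.write);
  try {
    for (var i = 0; i < chunks; i++) {
      raf.writeFromSync(Uint8List(chunkSize));
    }
  } finally {
    raf.closeSync();
  }
  return sw.elapsedMilliseconds;
}

void main() {
  // Assumed output location; adjust for your machine.
  final path =
      '${Directory.systemTemp.path}${Platform.pathSeparator}foo_sync.bin';
  print('sync write: ${timeSyncWrite(path)} ms');
}
```

To measure the "file exists" case instead, one would create the file before starting the stopwatch rather than deleting it.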

@DartBot
Author

DartBot commented Apr 2, 2014

This comment was originally written by @tatumizer


The results I got from your program in Dart are
Fastest: 4060
Fast : 6251
Slow : 9667
It will be like this in java or any language. Writing N blocks at once is faster than N times writing 1 block. Mezoni, please learn how HW works before insulting.
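
The program being discussed here is not shown in this thread, so the following is not a reconstruction of it. It is only a sketch of the general point that fewer, larger writes tend to be cheaper than many small ones. It assumes the small chunks can be held in memory, and the helper name `coalesce`, the 1 MiB batch size, and the output path are all hypothetical.

```dart
import 'dart:io';
import 'dart:typed_data';

/// Hypothetical helper: groups small chunks into buffers of at least
/// [batchSize] bytes so the sink sees fewer, larger add() calls.
Iterable<Uint8List> coalesce(Iterable<List<int>> chunks, int batchSize) sync* {
  final builder = BytesBuilder(copy: false);
  for (final chunk in chunks) {
    builder.add(chunk);
    if (builder.length >= batchSize) {
      // takeBytes() returns the accumulated bytes and clears the builder.
      yield builder.takeBytes();
    }
  }
  if (builder.isNotEmpty) {
    yield builder.takeBytes();
  }
}

Future<void> main() async {
  // Assumed output location; adjust for your machine.
  final path =
      '${Directory.systemTemp.path}${Platform.pathSeparator}foo_batched.bin';
  final small = Iterable.generate(4000, (_) => Uint8List(8192));

  final sw = Stopwatch()..start();
  final out = File(path).openWrite();
  for (final batch in coalesce(small, 1 << 20)) {
    out.add(batch);
  }
  await out.close();
  print('batched write: ${sw.elapsedMilliseconds} ms');
}
```

Whether batching helps in practice depends on the OS, the disk, and the block size, as noted elsewhere in this thread.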

@DartBot
Author

DartBot commented Apr 2, 2014

This comment was originally written by @tatumizer


Anders, sorry, prev. post was directed to mezoni.
Wait a sec, I will try to make sense of your results

@DartBot
Author

DartBot commented Apr 2, 2014

This comment was originally written by @tatumizer


Anders,
I can't find ANY difference in java for the case where file exists or doesn't.
Maybe it depends on OS or HW or something, but intuitively, it doesn't make any sense.
When we write to "existing" file, it gets kind-of deleted anyway - so new data in general will be stored in different locations on disk - though even that makes no difference whatsoever.
Are you sure you ran the same test? Maybe your file is 4 times bigger in one case?
Anyway, my timing is absolutely the same in java

@andersjohnsen

Interesting, this is on Linux. I'll try out on Windows, when I get a chance.

@DartBot
Author

DartBot commented Apr 2, 2014

This comment was originally written by @tatumizer


What affects speed of writing is: fragmentation of disk. And of course the strategy of block distribution implemented by OS.

@DartBot
Author

DartBot commented Apr 2, 2014

This comment was originally written by @tatumizer


Anders,
The reason I brought up async operations is that in dart, there's no parity between random access and stream files. You just can't use sync operations on stream files - no such thing.
And in all popular benchmarks, output is written into standard out. I don't know any way to write to standard out using random access files.

Mezoni: my apologies. You are a good guy. Just a bit rude. You have to learn, if not HW, then good manners.

@DartBot
Author

DartBot commented Apr 2, 2014

This comment was originally written by @tatumizer


Mezoni: if block size is not multiple of 65536 (e.g. 8192), then java works 1.5-2 times faster on my comp on bigger blocks. It depends on too many factors (fragmentation, OS, block size) - no way to compare "objectively".

@DartBot
Author

DartBot commented Apr 3, 2014

This comment was originally written by @tatumizer


Anders: the mystery about speed of writes on Windows can be resolved by this:
http://support.microsoft.com/kb/324805

It caches writes by default!
The speed of writes on Windows was a bit fishy to me to begin with. Physically, it can't do it as fast as benchmarks show. SCSI, of course, is faster, but still...
It's caching! Read is always cached by default, that's a matter of course. That write is cached, too, is not that obvious - it can lead to loss of data. Article above explains that.
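
If OS write caching is what makes the numbers look too good, one way to see its effect from Dart is to include an explicit flush in the timed region. The sketch below assumes `RandomAccessFile.flushSync()` is an acceptable way to ask the OS to push cached data out. Whether the device's own cache is also flushed depends on the OS and hardware. The function name `timeWriteWithFlush` and the output path are hypothetical.

```dart
import 'dart:io';
import 'dart:typed_data';

/// Hypothetical helper: times a synchronous write of [chunks] x [chunkSize]
/// bytes. When [flushToDisk] is true, the timed region also covers
/// flushSync(), so cached writes are counted.
int timeWriteWithFlush(
  String path, {
  required bool flushToDisk,
  int chunks = 4000,
  int chunkSize = 65536,
}) {
  final sw = Stopwatch()..start();
  final raf = File(path).openSync(mode: FileMode.write);
  try {
    for (var i = 0; i < chunks; i++) {
      raf.writeFromSync(Uint8List(chunkSize));
    }
    if (flushToDisk) {
      raf.flushSync();
    }
  } finally {
    raf.closeSync();
  }
  return sw.elapsedMilliseconds;
}

void main() {
  // Assumed output location; adjust for your machine.
  final path =
      '${Directory.systemTemp.path}${Platform.pathSeparator}foo_flush.bin';
  print('cached:  ${timeWriteWithFlush(path, flushToDisk: false)} ms');
  print('flushed: ${timeWriteWithFlush(path, flushToDisk: true)} ms');
}
```

A large gap between the two numbers would suggest that the unflushed timing mostly measures copying into the OS cache.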
 

@andersjohnsen

Ah, yes, that's quite common. I'm sure it happens on Linux as well. The extra cost probably comes from actual HD activity, where we start out by deleting the existing file.

@DartBot
Author

DartBot commented Apr 9, 2014

This comment was originally written by @tatumizer


Anders: turns out, the issue is more complicated. I just tested writing to the "nul" device on Windows. It shouldn't depend on any properties of the hardware, or on whether the file is new or old, because there's no file. Data is just discarded.
For the same amount of data (4000 × 64 KiB), Java completes in 73 ms, and Dart ... 1760 ms!!!
I'm using Dart SDK version 1.3.0-dev.7.12. I'm not sure your latest fixes are there, but the difference in timing should be explained somehow.
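
The exact test program is not included in this comment. A minimal sketch of such a null-device measurement, assuming it reuses the IOSink loop from the original report, might look like this. The helper names `nullDevicePath` and `timeSinkWrite` are hypothetical, and `/dev/null` is assumed as the equivalent on non-Windows systems.

```dart
import 'dart:io';
import 'dart:typed_data';

/// Path of the platform's null device: "nul" on Windows (as in the comment
/// above), "/dev/null" elsewhere (an assumption for other platforms).
String nullDevicePath() => Platform.isWindows ? 'nul' : '/dev/null';

/// Hypothetical helper: times [chunks] IOSink.add() calls of [chunkSize]
/// bytes each, including the time until the sink is fully closed.
Future<int> timeSinkWrite(
  String path, {
  int chunks = 4000,
  int chunkSize = 65536,
}) async {
  final sw = Stopwatch()..start();
  final out = File(path).openWrite();
  for (var i = 0; i < chunks; i++) {
    out.add(Uint8List(chunkSize));
  }
  // close() completes once all buffered data has been handed off.
  await out.close();
  return sw.elapsedMilliseconds;
}

Future<void> main() async {
  final path = nullDevicePath();
  print('IOSink to $path: ${await timeSinkWrite(path)} ms');
}
```

Because nothing reaches a disk here, the measured time is mostly the cost of allocating buffers and of the IOSink machinery itself.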

@andersjohnsen

tatumizer, can you clarify what you are comparing? Are you comparing IOSink with synchronous Java?

It's important that we fully understand what async writing means. Async writing does exactly the same as synchronous writing, except that it copies the data to another isolate (thread) and lets that isolate perform the action. Once done, it notifies the isolate that issued the write. It's very obvious that writing has a higher overhead when async, and that cannot change (though we are trying to minimize the overhead). However, what it allows is to not block the isolate issuing the write. This is very important for programs where we have many simultaneous operations, e.g. an HTTP server.

If the results you have (73 vs 1760) are gathered from the two programs shown in this issue, we are comparing apples and oranges.

@kevmoo added the area-core-library label on Jun 16, 2015
@kevmoo added the type-bug label and removed the triaged label on Mar 1, 2016

4 participants