New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
performance: transform #17659
Comments
Added Library-Convert label. |
Added Triaged label. |
Added Area-Library label. |
Proposed CL: https://codereview.chromium.org/208693008/ |
This comment was originally written by @tatumizer fixing LATIN1 only doesn't help much, because people mechanically specify UTF8 as encoding. |
About LATIN1, I don't agree:
As for the other encodings, they are already implemented using a quite optimized, and somewhat similar, strategy - though the number of features we support adds a bit of overhead. You are of cause more than welcome to propose patches, if you find ways to optimize the code! Cheers, Anders |
This comment was originally written by @tatumizer Look at benchmarks I included in the bug report: they are horrible! will they be the same after the fix for UTF8? These results show no trace of fast track for all ASCII chars. Fast track should kick in right away, until proven wrong. Benchmarks show more work is needed there. Up to you though. There'are sites that compare performance, people look into them, and dart will not have good showing there, and not because compiler is slow. |
This comment was originally written by @tatumizer Also, your remark about LATIN1 being default misses one point: people don't know about it. Go to any example on Internet (e.g. on stackoverflow) - my bet that 99% of then spefify UTF8 explicitly, no matter what language. People just copy these examples without thinking. Why create another performance pitfall, for no reason? |
I think there may be some misunderstandings here. I absolutely acknowledge that the issue is valid and that performance is very bad here. About Latin1; I'm in no way advocating the usage of Latin1. However, if you don't copy code, and write a simple HTTP server yourself, it's very likely that the encoding will be Latin1. And the user should not be punished by bad performance, by simply not considering encodings. Now, the proposed CL does not only improve Latin1, but also ASCII and UTF8, by a factor of 2x-3x in the case of pure ASCII characters, as far as I remember. This is not the only thing that can be improved here, and the issue should not be closed until the performance is as expected; very fast. |
This comment was originally written by @tatumizer by factor of 2x-3x? then it's ok. Not OK, but ok. |
There are no other obvious fixes for the current implementation. A believe a much more thoroughly rewrite is needed to further improve the performance. Removed the owner. |
This comment was originally written by @tatumizer That's fine, it works much faster now; in one case (ASCII) even as good as |
Sounds like the performance has reached an acceptable level. Added Fixed label. |
This comment was originally written by @tatumizer Yes, the overhead is reduced to minimum now. (tested on file containing ASCII only). |
This issue was originally filed by @tatumizer
Please create a big file (in my case, it's about 250MB) and run the following:
import 'dart:convert';
import 'dart:io';
main() {$chunks, time: $ {sw.elapsedMilliseconds}");
test(10);
}
test(n) {
if (n==0) return;
var sw = new Stopwatch()..start();
int chunks=0;
new File("c:/temp/reverse-complement-huge.txt").openRead()
.transform(ASCII.decoder)
.listen((d){chunks++;},
onDone: () {
print("chunks:
test(n-1);
});
}
I ran 10 iterations to be sure it's not an issue of warmup.
(In fact, there's not much difference between iterations)
Best results for different encodings (tested on relatively decent Intel processor 2.7GHz, Windows 7)
LATIN1: chunks: 3973, time: 6852
UTF=8 chunks: 3973, time: 3420
ASCII chunks: 3973, time: 3941
no transform at all: chunks: 3973, time: 274
LATIN1, which was supposed to be identical transform, takes 25 times longer than reading from disk.
The text was updated successfully, but these errors were encountered: