SnappyLoader takes minutes to initialize #39
Comments
Any suggestions for a utility method for comparing two streams? |
This might be a good start: |
Thanks. I will try NIO based implementation. |
The NIO implementation is not a very good one, as it requires moving data back and forth between JVM and native memory multiple times. It would be better to use a standard (heap) ByteBuffer rather than a DirectByteBuffer, as that avoids crossing the JVM/native boundary. |
I think it's the other way around - using direct ByteBuffers allows the OS to perform the I/O directly into the same memory space that is then accessed by the Java code, whereas with non-direct buffers the I/O is first done in some other native buffer and then copied into a Java array. http://stackoverflow.com/questions/5670862/bytebuffer-allocate-vs-bytebuffer-allocatedirect is the first link I found on this, and confirms it. You can also make the buffer a bit larger than 1K - probably not much difference in performance, but still most systems nowadays read from disk in at least 4K blocks anyway, so might as well pass it through like that. In any case, the performance difference between the two buffer types in this case is likely negligible, and both should be several orders of magnitude better than the current state of affairs. |
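For reference, a minimal sketch of the kind of NIO comparison being debated here, assuming both inputs are plain files readable through a `FileChannel` (which may not hold for a library packed inside a jar). The class `ChannelCompare`, the method `sameContent`, and the `direct` flag are hypothetical names for illustration, not part of snappy-java. The flag switches between `ByteBuffer.allocateDirect` and `ByteBuffer.allocate` so the two buffer types can be timed against each other:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of snappy-java.
final class ChannelCompare {
    private static final int BUFFER_SIZE = 4096; // 4K, as suggested in the comment above

    // Reads until the buffer is full or EOF, then flips it for reading.
    private static void fill(FileChannel ch, ByteBuffer buf) throws IOException {
        buf.clear();
        while (buf.hasRemaining() && ch.read(buf) != -1) {
            // keep reading; a single read() may return fewer bytes than requested
        }
        buf.flip();
    }

    // Returns true if both files have identical content.
    static boolean sameContent(Path p1, Path p2, boolean direct) throws IOException {
        ByteBuffer b1 = direct ? ByteBuffer.allocateDirect(BUFFER_SIZE) : ByteBuffer.allocate(BUFFER_SIZE);
        ByteBuffer b2 = direct ? ByteBuffer.allocateDirect(BUFFER_SIZE) : ByteBuffer.allocate(BUFFER_SIZE);
        try (FileChannel c1 = FileChannel.open(p1, StandardOpenOption.READ);
             FileChannel c2 = FileChannel.open(p2, StandardOpenOption.READ)) {
            if (c1.size() != c2.size()) {
                return false; // different lengths can never match
            }
            while (true) {
                fill(c1, b1);
                fill(c2, b2);
                // ByteBuffer.equals compares the remaining bytes of both buffers
                if (!b1.equals(b2)) {
                    return false;
                }
                if (b1.remaining() < BUFFER_SIZE) {
                    return true; // short (or empty) chunk means both files hit EOF
                }
            }
        }
    }
}
```

Calling `ChannelCompare.sameContent(a, b, true)` and `ChannelCompare.sameContent(a, b, false)` on the same pair of files gives a quick way to check whether the buffer type makes a measurable difference on a given system.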
That is only true if you are reading directly from a
|
Basically there are 2 really good uses for direct byte buffers and several
The edge uses are usually around addressing very large chunks of memory. If all of the data is examined within the jvm (and there is no later jni
|
I just created a snapshot version that simply compares two InputStreams. @amichair |
Sometimes SnappyLoader takes several minutes to initialize (in my case it happens when connecting via a remote debugger to a process which uses snappy via some transitive dependency).
Specifically, it's SnappyLoader.md5sum() that takes ages to complete. I suspect the problem is the call to digestInputStream.read(), i.e. reading the stream and updating the digest one byte at a time. It is far more efficient to work with a buffer (even a small 4K buffer will do) and read the stream in chunks.
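A minimal sketch of the buffered approach, assuming the goal is simply an MD5 hex string for a stream. The class name `BufferedMd5` and the method signature are hypothetical and do not reflect the actual SnappyLoader code; only `MessageDigest` and `InputStream` from the JDK are used:

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not the actual SnappyLoader implementation.
final class BufferedMd5 {
    // Computes the MD5 digest of a stream, reading 4K at a time
    // instead of one byte per read() call.
    static String md5sum(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            digest.update(buffer, 0, n); // feed only the bytes actually read
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Wrapping the source in a `BufferedInputStream` would also reduce the cost of single-byte reads, but updating the digest in chunks avoids the per-byte call overhead entirely.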
Or, at a higher level, there's actually no reason to use an md5 digest to compare two streams - it would be more efficient and straightforward to just compare the content of the streams directly for equality, with no digest or other calculations (but here too it should be done using a buffer, not reading them byte by byte and comparing). There are plenty of such stream equality utility methods to be found, so no need to write it from scratch either.
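To illustrate the direct comparison, here is a rough sketch of such a utility. `StreamCompare` and `contentEquals` are hypothetical names chosen for this example; existing libraries offer similar methods. The `fill` helper is needed because a single `read` may return fewer bytes than requested, so the two buffers would otherwise drift out of alignment:

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for comparing two streams without any digest.
final class StreamCompare {
    // Reads until the buffer is full or EOF; returns the number of bytes read.
    private static int fill(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break;
            }
            total += n;
        }
        return total;
    }

    // Returns true if both streams yield exactly the same bytes.
    static boolean contentEquals(InputStream a, InputStream b) throws IOException {
        byte[] bufA = new byte[8192];
        byte[] bufB = new byte[8192];
        while (true) {
            int nA = fill(a, bufA);
            int nB = fill(b, bufB);
            if (nA != nB) {
                return false; // one stream ended before the other
            }
            for (int i = 0; i < nA; i++) {
                if (bufA[i] != bufB[i]) {
                    return false;
                }
            }
            if (nA < bufA.length) {
                return true; // both streams reached EOF with no mismatch
            }
        }
    }
}
```

Unlike the digest approach, this returns as soon as the first mismatch is found, so differing inputs are usually rejected without reading either stream to the end.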