Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversion is slow for numbers with no more than 17 non-zero bits after the binary point. #28

Open
jayfoad opened this issue Feb 12, 2016 · 5 comments

Comments

@jayfoad
Copy link

jayfoad commented Feb 12, 2016

I've noticed a fairly large class of numbers for which double-conversion is slow to produce the shortest decimal representation. Here's a test case:

#include "double-conversion.h"
using namespace double_conversion;

int main()
{
    size_t i = 0;
    for (int j = 0; j < 5e6; j++)
    {
#ifdef FAST
        double d = (rand() % 1000000) / 100000.0; // 10^5
#else
        double d = (rand() % 1000000) / 131072.0; // 2^17
#endif
        char buf[20];
        bool s;
        int length, decpt;
        DoubleToStringConverter::DoubleToAscii(d, DoubleToStringConverter::SHORTEST, 0, buf, sizeof buf, &s, &length, &decpt);
        i += strlen(buf);
    }
    return i % 1000;
}

On my machine (Ubuntu 15.10, Haswell CPU) I get:

$ g++ -O3 *.cc -o foo -DFAST && time ./foo
real    0m0.446s
user    0m0.444s
sys 0m0.000s
$ g++ -O3 *.cc -o foo && time ./foo
real    0m2.402s
user    0m2.400s
sys 0m0.000s

So for random integers divided by 2^17 it's running 6x slower than for some other random inputs. Profiling shows that this is because FastDtoa() is usually (always?) failing, so it has to fall back to BignumDtoa(). Is this expected? Is there a way to make FastDtoa() cope with more cases?

Incidentally, I noticed this when benchmarking our software (which now uses double-conversion) against a previous version that used David Gay's dtoa(), using his mode 4 to get the shortest representation but limited to 10 digits. double-conversion doesn't seem to have an equivalent of mode 4, so we're forced to use SHORTEST mode, which makes double-conversion appear slower than dtoa() on random integers / 2^17.

@floitschG
Copy link
Contributor

Normally there should only be ~0.5% of all doubles that bail out to slow-mode, and it shouldn't be because of a 2^17 division.
The factor 6 seems strange, so maybe there is a bug in the implementation. I will try to have a look at it.

With respect to mode 4: I would like to avoid adding more API surface to the library, but if you are just interested in fast conversions (not necessarily caring for optimal results) I could send you pointers to different implementations. In particular, there exists an implementation that never falls back to bignum-mode, but isn't necessarily optimal (that is, could give you longer than necessary numbers).

Here are the sources: http://florian.loitsch.com/publications/bench.tar.gz
You would be interested in Grisu or Grisu2.

Grisu just prints 17 digits as fast as possible and is generally the fastest you can get.
Grisu2 shortens, but sometimes produces more digits than necessary (yet still providing a correct output).
(Grisu3 is the one that is used in this library here. You could try that one too, and see if it behaves differently).

@jayfoad
Copy link
Author

jayfoad commented Feb 15, 2016

Thanks for the links. I need correct conversions but I want them fast, so I may end up patching double-conversion to provide something like dtoa's mode 4.

I may look at speeding up BignumDtoa too, perhaps by using 64-bit limbs on 64-bit hosts.

@floitschG
Copy link
Contributor

Apparently I misunderstood your explanation of dtoa's mode 4. I thought it would at most emit 10 digits (which would obviously lead to bad output for the numbers that need more than 10).

Grisu and Grisu2 both produce correct numbers. They are just sometimes not the shortest. From my understanding, that's pretty close to mode 4. If you don't care for the length of the output, Grisu is the fastest. If you prefer nicer output (but can live with longer ones) Grisu2 is the next best choice.

I still need to find some time to look into the perf-issues for x/2^17 numbers.

@jayfoad
Copy link
Author

jayfoad commented Feb 17, 2016

Apparently I misunderstood your explanation of dtoa's mode 4. I thought it would at most emit 10 digits (which would obviously lead to bad output for the numbers that need more than 10).

You didn't misunderstand. That's correct.

When I said I need correct conversions, I meant that I need to get results that are identical to the results I used to get from dtoa's mode 4. Currently I am achieving this by calling DoubleToAscii but massaging the result if it's longer than 10 digits.

@floitsch
Copy link
Collaborator

I started to look at this issue. The problem is, that double d = (rand() % 1000000) / 131072.0; tends to produce numbers that have a decimal representation that ends with a 5.
Out of 100 randomly generated numbers here are the 24 numbers that needed to fall back to bignum operations:

2.207817077636718750000000000000
1.818351745605468750000000000000
3.941719055175781250000000000000
3.738609313964843750000000000000
3.967735290527343750000000000000
1.328025817871093750000000000000
3.920967102050781250000000000000
1.015235900878906250000000000000
1.335227966308593750000000000000
1.344520568847656250000000000000
2.879127502441406250000000000000
3.695838928222656250000000000000
1.845344543457031250000000000000
3.793952941894531250000000000000
3.211402893066406250000000000000
2.565971374511718750000000000000
0.965156555175781250000000000000
2.700004577636718750000000000000
0.767097473144531250000000000000
1.780448913574218750000000000000
2.624839782714843750000000000000
1.305290222167968750000000000000
3.834922790527343750000000000000

It's important to see that the last 5 digit is not necessary anymore, and that the rounded number is still enough to correctly print the number. For example, 2.20781707763671875 should be printed as 2.2078170776367188, but 2.2078170776367187 is at the same distance.
Grisu realizes that both outputs are valid, but assumes that its operations were imprecise and therefore can't chose between the two. (It should round to the even last digit).
If you don't care for the exact direction into which correct numbers are rounded, you can disable that check here (line 153):

if (rest < big_distance &&

Removing that 'if' will still produce valid output, but doesn't care if it choses the closest if several decimal numbers read back to the same double.

Note that Grisu2 would not have any problems with the number: it would just pick one of the two since it doesn't stop if the output is not optimal (but still correct).

I still need to figure out, if there is a good way to avoid the bailout.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants