fastfunlib


Fast multiprecision transcendental functions

A C library for fast multiprecision evaluation of transcendental functions, using fixed-point arithmetic directly on top of GMP/MPIR. The goal is to provide optimized base implementations of transcendental functions, with minimal overhead at low precision as well as asymptotic speed. It should be usable directly for computations that work well in fixed point (i.e. involving both inputs and outputs of order of magnitude close to unity), or as a base library for implementation of floating-point arithmetic with bells and whistles.

Current status

The current target is to test various algorithms, mainly for elementary functions. There is no code with a usable library interface yet. The svn trunk contains test implementations for exp (the same code may also compute cos or sin), log, and gamma.

Exp

The test program computes exp(0.37) and compares both accuracy and speed to MPFR-2.4.1. This implementation is uniformly faster than MPFR up to at least several hundred bits of precision on one AMD64 system. Accuracy is within 1-2 bits of the correctly rounded value (as computed by MPFR). Times (5th and 6th columns) are in nanoseconds.

fredrik@airy:~/src/fastfunlib/fastfunlib/exptest$ ./exptest prec acc J r mpfr this faster 53 52 1 2 5070 1240 4.089 66 65 1 6 5200 1390 3.741 82 81 1 8 5590 1460 3.829 102 101 1 6 5810 1670 3.479 127 126 2 8 7170 2280 3.145 158 157 2 7 8390 2500 3.356 197 196 1 11 8610 3030 2.842 246 246 1 7 9870 3900 2.531 307 307 2 12 12260 4440 2.761 383 381 3 9 12860 5220 2.464 478 477 2 11 16260 6580 2.471 597 596 2 14 19220 8039 2.391 746 746 3 13 23400 10500 2.229 932 931 3 8 27900 14100 1.979 1165 1164 3 14 35400 18700 1.893 1456 1455 3 14 48500 27000 1.796 1820 1819 3 12 67000 38000 1.763 2275 2274 4 11 93000 56500 1.646 2843 2842 4 17 130500 84000 1.554 3553 3553 5 14 190500 124000 1.536 4441 4440 6 14 278000 187000 1.487 5551 5550 6 17 400500 279500 1.433 6938 6937 6 19 599000 429500 1.395 8672 8670 7 18 891000 645000 1.381 10840 10839 7 25 1310000 973000 1.346 13550 13549 8 23 2024500 1472000 1.375 16937 16935 9 25 3121000 2299000 1.358 21171 21170 9 27 4647000 3446000 1.349 26463 26461 9 36 5335000 5209500 1.024

The J and r parameters are algorithmic tuning parameters. The test program performs an exhaustive search and reports the best combination. Eventually a program for pre-tuning will have to be written.

This is still a naive implementation and optimistically the speed at low precision could be improved by perhaps an additional factor two.

Log

Logarithm using Taylor series and argument reduction is twice as fast as MPFR up to a few hundred bits:

fredrik@airy:~/src/fastfunlib/fastfunlib/logtest$ ./logtest prec acc J r mpfr this faster 53 53 1 1 4760 1750 2.720 66 66 2 1 4800 1960 2.449 82 82 1 3 5370 2390 2.247 102 102 2 3 6630 2560 2.590 127 127 3 2 8150 3310 2.462 158 158 2 2 8160 3960 2.061 197 197 2 2 9530 4760 2.002 246 246 4 2 13430 6150 2.184 307 307 4 2 15180 7300 2.079 383 383 4 3 18120 9180 1.974 478 478 4 3 23040 11340 2.032 597 597 5 3 27760 14820 1.873 746 746 6 4 30900 18900 1.635 932 932 5 4 39800 25100 1.586 1165 1165 5 5 49400 34600 1.428 1456 1456 5 5 59500 46000 1.293 1820 1820 5 7 79500 65000 1.223 2275 2275 6 7 108000 92000 1.174 2843 2843 7 9 133000 131500 1.011 3553 3552 6 11 189500 188000 1.008 4441 4441 7 11 253500 278500 0.910 5551 5551 7 12 350500 409000 0.857

Logarithm using a 256KB (2^9 values x 4096 bits) precomputed lookup table buys another 2x speedup at low precision:

fredrik@airy:~/src/fastfunlib/fastfunlib/logtest2$ ./logtest2 prec acc J r mpfr this faster 53 53 1 0 4710 900 5.233 66 66 1 0 4770 930 5.129 82 82 1 0 5700 1040 5.481 102 102 1 0 6660 1200 5.550 127 127 1 0 7800 1430 5.455 158 158 2 0 8660 1750 4.949 197 197 1 0 9280 2170 4.276 246 246 2 0 13480 2670 5.049 307 307 2 0 15020 3120 4.814 383 383 3 0 18100 4080 4.436 478 478 3 0 23640 5120 4.617 597 597 3 0 28620 6880 4.160 746 746 4 0 31000 9400 3.298 932 932 4 0 39800 12200 3.262 1165 1165 4 0 50200 17600 2.852 1456 1456 4 0 59000 25500 2.314 1820 1820 4 0 80000 38500 2.078 2275 2275 6 0 109000 56500 1.929 2843 2843 6 0 134500 87500 1.537 3553 3553 6 0 191000 133000 1.436 4441 4441 7 11 254500 278500 0.914 5551 5550 7 12 352000 410500 0.857

Gamma function

The gammatest program implements the gamma function using Taylor series, designed for small arguments, e.g. x < 100 (for very large x, a second implementation based on Stirling's series should be used). This approach pays off greatly, giving 20-100x speedups over MPFR.

Times are for computing gamma(5.7) (Columns: prec, accurate, MPFR time, this time, speedup.)

fredrik@airy:~/src/fastfunlib/fastfunlib/gammatest$ ./gammatest 53 53 76350 3150 24.238 66 66 83210 3530 23.572 82 82 90030 3770 23.881 102 102 108540 4220 25.720 127 127 121530 5850 20.774 158 158 140270 6920 20.270 197 197 174500 9090 19.197 246 246 238250 12100 19.690 307 307 309100 15800 19.563 383 383 409400 19720 20.761 478 478 604360 25260 23.926 597 597 847160 33280 25.456 746 746 1292500 48000 26.927 932 932 1872700 66200 28.289 1165 1165 3168700 111200 28.496 1456 1456 5332000 171000 31.181 1820 1820 9475000 288000 32.899 2275 2275 16993000 474000 35.850 2843 2843 30980500 809000 38.295 3553 3553 56078000 1362000 41.173 4441 4441 109397000 2362500 46.306 5551 5551 219725000 3975500 55.270 6938 6938 416183500 6976500 59.655 8672 8672 835161500 11983500 69.693 10840 10840 1652142000 20552500 80.386 13550 13550 3450440500 34704000 99.425

Generating the coefficients for 1000-digit precision from scratch should take about a second (a small fraction of a second for lower precisions). Currently, the test code must read pre-generated coefficients from a file. The svn repository includes a data file generated with mpmath that works up to about 15000 bits.

There should be an interface for loading and saving coefficient data to files between sessions. The fastfunlib project could also provide a downloadable repository of expensive data, so that not every user who needs many extremely high-precision function evaluations has to generate the same data from scratch (which could take hours).

Other functions

Initially exp, cosh/sinh, cos/sin, log, atan, zeta constants/Bernoulli numbers, gamma and digamma functions should be added; more functions may be added later on.

Project Information

Labels:
math