A collection of high performance c-string transformations, frequently 2x faster than standard implementations (if they exist at all).
including
- base64, standard
- base64, web/url safe, with configurable alphabet
- base85 (good for http cookies)
- base16 (hex)
- base2 (ascii binary)
- url escaping
- javascript string escaping
- fast number to string conversion, 4-22x faster than sprintf!
- fast ascii upper/lower case conversion, 2-66x faster than stdlib! (yes, 66x faster)
And it's all wrapped up in a bow for you to use:
- BSD License -- do what you want with it.
- Standard clean ANSI C, will also compile as C99 and C++
- C++ wrappers for std::string
- Standard install: configure && make && make install
- Extensive unit tests provided with >98% coverage.
- Endian safe.
- Performance test framework -- don't take my word, run it your self
- Lots of in-code documentation
HEY
- This code won't compile on windows, due to lack of stdint. Help appreciated.
- Also this code won't work on Sparc chips (Solaris) due to alignment issues. not hard to fix, but I don't have access to sparc machines
- Works great on Mac and Linux.
News
27-Oct-2009: 3.8.0 released
- Various fixes for compiler warnings using strict flags. Checked with gcc 4.4
- Adding various "num to alpha" 64-bit integer functions.
03-Dec-2008:
07-Jan-2008: 3.7
- Fix some C++ functions to propertly use "std::string" instead of "string"
- Fix autotool problem with ranlib on unix systems
- Tested under latest gcc 4.3 snapshot, no warnings or errors with -Wall -Werror
- No logic changes
20-Nov-2007: Version 3.6
- Fix more issues with round-to-even. Unit tests updated.
- modp_dtoa should produce identical output to printf on unix systems now (microsoft uses lame rounding rules)
19-Nov-2007: Version 3.5
- Fix 'precision 0' and 'round-to-even' rules in modp_dtoa (double to ascii)
27-Jun-2007: Version 3.4
- Fix CRITICAL bug in url decode. Core dump if %XY comes in where X or Y has the high bit set
- Add more C++ methods to prevent unnecessary string temporaries
06-Jun-2007: Version 3.3
- 2x faster toupper/tolower!
- fix static buffer overflow that only occurs during compile time (not runtime)
- Added more C++ methods for url_decode
24-May-2007: Version 3.2 has
- 30% faster base64 encoding on intel chips.
- modp_b2 -- binary string encode/code (e.g. 1001011011...)
- modp_ascii -- 2-24x faster toupper, tolower string encoding
- much improved C unit test framework
- C++ const methods added
22-Apr-2007: Version 3.1 released.
- Fixes some odd 64-bit compilation problems
- Adds modp_numtoa, fast number to string (char*) conversion.
More News
If you use this library, be sure to join the group (listed in the right column) to be informed of all the latest goodies.
Why
Ok, so why? Besides providing a standard interface to common operations, they provide up to 3.5x performance improvement over standard sane versions, and up 50x improvement over numerous brain-dead versions floating around on the web. Your results may vary. It's highly processor dependent, but here's a sample on run an 2.2Ghz AMD system.
From the built-in performance test:
AMD 2.2Ghz, Linux 2.6.15, gcc -O3 (3.4.4)
Message size = 20
modpb64 apache improvement modpb85 modpurl modp_b16
Encode 0.04 0.10 2.50x 0.09 0.05 0.04
Decode 0.04 0.13 3.25x 0.07 0.06 0.05
Message size = 200
modpb64 apache improvement modpb85 modpurl modp_b16
Encode 0.43 0.99 2.30x 0.83 0.47 0.33
Decode 0.34 1.10 3.24x 0.60 0.46 0.41
Message size = 2000
modpb64 apache improvement modpb85 modpurl modp_b16
Encode 4.02 9.45 2.35x 7.98 4.66 3.09
Decode 3.18 10.77 3.39x 5.94 4.54 3.93
Tested 16-May-2006How It Works
The main trick with these functions is that they operate on 32-bits at a time and do crazy bit operations instead of regular algorithms that operate with 8-bits at a time. So for instance a standard implementation might do something like
A simple example is doing upper casing an ascii string. A standard implementation might look like
void toupper_copy1(char* dest, const char* str, int len)
{
int i;
for (i = 0; i < len; ++i) {
// toupper is defined in <ctype.h>
*dest++ = toupper(str[i]);
}
*dest = 0;
}The problem here is some clibs have crap versions of toupper. They have to look up the 'C Locale' and use a complicated table and bit shifts. A better version might be
void modp_toupper_copy(char* dest, const char* str, int len)
{
int i;
uint32_t eax, ebx;
const uint8_t* ustr = (const uint8_t*) str;
const int leftover = len % 4;
const int imax = len / 4;
const uint32_t* s = (const uint32_t*) str;
uint32_t* d = (uint32_t*) dest;
for (i = 0; i != imax; ++i) {
eax = s[i];
/*
* This is based on the algorithm by Paul Hsieh
* http://www.azillionmonkeys.com/qed/asmexample.html
*/
ebx = (0x7f7f7f7ful & eax) + 0x05050505ul;
ebx = (0x7f7f7f7ful & ebx) + 0x1a1a1a1aul;
ebx = ((ebx & ~eax) >> 2) & 0x20202020ul;
*d++ = eax - ebx;
}
i = imax*4;
dest = (char*) d;
switch (leftover) {
case 3: *dest++ = gsToUpperMap[ustr[i++]];
case 2: *dest++ = gsToUpperMap[ustr[i++]];
case 1: *dest++ = gsToUpperMap[ustr[i]];
case 0: *dest = '\0';
}
}Here:
- the string is treated as a array of uint32_t not chars
- The integers are manipulated, effectively doing toupper 4 at a time
- oddballs are are computed using a table, not stdlib calls, or by doing math
For base 2, 16, 65 and 85 even more tricks are used, such as using precomputed shift tables. Take a look at modp_b64_gen to see how this works.
It is probable that a lot of these tricks could be used in Java. I don't program in Java that much any more, but go nuts.
That said....
Unless your application is doing almost nothing except encoding and decoding, the odds are that these functions will not improve the total performance of your application. Even with slow versions of base64 it's already so fast it's probably not measurable in your application.
Also this is highly compiler version and cpu model dependent. While it screams on AMD processors and is 'better' on Intel processors, it may not do as well on your platform. Please test (and send me the results from ./speedtest)