|
Project Information
Members
Featured
Wiki pages
Links
|
A collection of high performance c-string transformations, frequently 2x faster than standard implementations (if they exist at all). including
And it's all wrapped up in a bow for you to use:
HEY
News30-Sep-2010: modp_numtoa.[c|h] changed to MIT
04-Apr-2010: 3.10.3 released
30-March-2010: On the interwebs! 19-Mar-2010: 3.10.0 release
12-Feb-2010: 3.9.0 release
27-Oct-2009: 3.8.0 released
03-Dec-2008: More News If you use this library, be sure to join the group (listed in the right column) to be informed of all the latest goodies. WhyOk, so why? Besides providing a standard interface to common operations, they provide up to 3.5x performance improvement over standard sane versions, and up 50x improvement over numerous brain-dead versions floating around on the web. Your results may vary. It's highly processor dependent, but here's a sample on run an 2.2Ghz AMD system. From the built-in performance test: AMD 2.2Ghz, Linux 2.6.15, gcc -O3 (3.4.4)
Message size = 20
modpb64 apache improvement modpb85 modpurl modp_b16
Encode 0.04 0.10 2.50x 0.09 0.05 0.04
Decode 0.04 0.13 3.25x 0.07 0.06 0.05
Message size = 200
modpb64 apache improvement modpb85 modpurl modp_b16
Encode 0.43 0.99 2.30x 0.83 0.47 0.33
Decode 0.34 1.10 3.24x 0.60 0.46 0.41
Message size = 2000
modpb64 apache improvement modpb85 modpurl modp_b16
Encode 4.02 9.45 2.35x 7.98 4.66 3.09
Decode 3.18 10.77 3.39x 5.94 4.54 3.93
Tested 16-May-2006How It WorksThe main trick with these functions is that they operate on 32-bits at a time and do crazy bit operations instead of regular algorithms that operate with 8-bits at a time. So for instance a standard implementation might do something like A simple example is doing upper casing an ascii string. A standard implementation might look like void toupper_copy1(char* dest, const char* str, int len)
{
int i;
for (i = 0; i < len; ++i) {
// toupper is defined in <ctype.h>
*dest++ = toupper(str[i]);
}
*dest = 0;
}The problem here is some clibs have crap versions of toupper. They have to look up the 'C Locale' and use a complicated table and bit shifts. A better version might be void modp_toupper_copy(char* dest, const char* str, int len)
{
int i;
uint32_t eax, ebx;
const uint8_t* ustr = (const uint8_t*) str;
const int leftover = len % 4;
const int imax = len / 4;
const uint32_t* s = (const uint32_t*) str;
uint32_t* d = (uint32_t*) dest;
for (i = 0; i != imax; ++i) {
eax = s[i];
/*
* This is based on the algorithm by Paul Hsieh
* http://www.azillionmonkeys.com/qed/asmexample.html
*/
ebx = (0x7f7f7f7ful & eax) + 0x05050505ul;
ebx = (0x7f7f7f7ful & ebx) + 0x1a1a1a1aul;
ebx = ((ebx & ~eax) >> 2) & 0x20202020ul;
*d++ = eax - ebx;
}
i = imax*4;
dest = (char*) d;
switch (leftover) {
case 3: *dest++ = gsToUpperMap[ustr[i++]];
case 2: *dest++ = gsToUpperMap[ustr[i++]];
case 1: *dest++ = gsToUpperMap[ustr[i]];
case 0: *dest = '\0';
}
}Here:
For base 2, 16, 65 and 85 even more tricks are used, such as using precomputed shift tables. Take a look at modp_b64_gen to see how this works. It is probable that a lot of these tricks could be used in Java. I don't program in Java that much any more, but go nuts. That said....Unless your application is doing almost nothing except encoding and decoding, the odds are that these functions will not improve the total performance of your application. Even with slow versions of base64 it's already so fast it's probably not measurable in your application. Also this is highly compiler version and cpu model dependent. While it screams on AMD processors and is 'better' on Intel processors, it may not do as well on your platform. Please test (and send me the results from ./speedtest) |