My favorites | Sign in
Project Logo
                
People details
Project owners:
  nickgsuperstar

A collection of high performance c-string transformations, frequently 2x faster than standard implementations (if they exist at all).

including

And it's all wrapped up in a bow for you to use:

News

03-Dec-2008:

07-Jan-2008: 3.7

20-Nov-2007: Version 3.6

19-Nov-2007: Version 3.5

27-Jun-2007: Version 3.4

06-Jun-2007: Version 3.3

24-May-2007: Version 3.2 has

22-Apr-2007: Version 3.1 released.

More News

If you use this library, be sure to join the group (listed in the right column) to be informed of all the latest goodies.

Why

Ok, so why? Besides providing a standard interface to common operations, they provide up to 3.5x performance improvement over standard sane versions, and up 50x improvement over numerous brain-dead versions floating around on the web. Your results may vary. It's highly processor dependent, but here's a sample on run an 2.2Ghz AMD system.

From the built-in performance test:

AMD 2.2Ghz, Linux 2.6.15, gcc -O3 (3.4.4)

Message size = 20
        modpb64 apache  improvement     modpb85 modpurl modp_b16
Encode    0.04    0.10    2.50x           0.09    0.05    0.04
Decode    0.04    0.13    3.25x           0.07    0.06    0.05

Message size = 200
        modpb64 apache  improvement     modpb85 modpurl modp_b16
Encode    0.43    0.99    2.30x           0.83    0.47    0.33
Decode    0.34    1.10    3.24x           0.60    0.46    0.41

Message size = 2000
        modpb64 apache  improvement     modpb85 modpurl modp_b16
Encode    4.02    9.45    2.35x           7.98    4.66    3.09
Decode    3.18   10.77    3.39x           5.94    4.54    3.93

Tested 16-May-2006

How It Works

The main trick with these functions is that they operate on 32-bits at a time and do crazy bit operations instead of regular algorithms that operate with 8-bits at a time. So for instance a standard implementation might do something like

A simple example is doing upper casing an ascii string. A standard implementation might look like

void toupper_copy1(char* dest, const char* str, int len)
{
    int i;
    for (i = 0; i < len; ++i) {
        // toupper is defined in <ctype.h>                                                                                              
        *dest++ = toupper(str[i]);
    }
    *dest = 0;
}

The problem here is some clibs have crap versions of toupper. They have to look up the 'C Locale' and use a complicated table and bit shifts. A better version might be

void modp_toupper_copy(char* dest, const char* str, int len)
{
    int i;
    uint32_t eax, ebx;
    const uint8_t* ustr = (const uint8_t*) str;
    const int leftover = len % 4;
    const int imax = len / 4;
    const uint32_t* s = (const uint32_t*) str;
    uint32_t* d = (uint32_t*) dest;
    for (i = 0; i != imax; ++i) {
        eax = s[i];
        /*
         * This is based on the algorithm by Paul Hsieh
         * http://www.azillionmonkeys.com/qed/asmexample.html
         */
        ebx = (0x7f7f7f7ful & eax) + 0x05050505ul;
        ebx = (0x7f7f7f7ful & ebx) + 0x1a1a1a1aul;
        ebx = ((ebx & ~eax) >> 2)  & 0x20202020ul;
        *d++ = eax - ebx;
    }

    i = imax*4;
    dest = (char*) d;
    switch (leftover) {
        case 3: *dest++ = gsToUpperMap[ustr[i++]];
        case 2: *dest++ = gsToUpperMap[ustr[i++]];
        case 1: *dest++ = gsToUpperMap[ustr[i]];
        case 0: *dest = '\0';
    }
}

Here:

For base 2, 16, 65 and 85 even more tricks are used, such as using precomputed shift tables. Take a look at modp_b64_gen to see how this works.

It is probable that a lot of these tricks could be used in Java. I don't program in Java that much any more, but go nuts.

That said....

Unless your application is doing almost nothing except encoding and decoding, the odds are that these functions will not improve the total performance of your application. Even with slow versions of base64 it's already so fast it's probably not measurable in your application.

Also this is highly compiler version and cpu model dependent. While it screams on AMD processors and is 'better' on Intel processors, it may not do as well on your platform. Please test (and send me the results from ./speedtest)









Hosted by Google Code