My favorites | Sign in
Project Home Downloads Wiki Issues Source
Search
for
NumToA  
Fast Number to String conversion
Featured
Updated Feb 4, 2010 by nickgsup...@gmail.com

modp_numtoa, fast number to string conversion

A common programming task is converting a binary number into ascii string (char buffer). modp_numtoa introduces safe and fast method of doing this.

The problem with sprintf

The standard way in "C" to convert a number to char buffer is to use sprintf (or snprintf).

Integer Types

A common use of sprintf for a integer is:

int i;
char buf[32];
sprintf(buf,"%d", i);

The problem here is when i isn't an int. if it changes, you'll be running to man page to learn that %hhd or "%lu" is the right format string. Except some types, like time_t are different sizes (either unsigned 32-bits or signed 64-bits), so you have to cast or use an ifdef.

The big problem is that using the wrong format string can cause a core dump. Some compilers do a better job than others in catching the errors.

Floating Point Types

Floating point types are even worse. For instance

double d = 123456.0123456789;
char buf[64];
sprintf(buf, "%f", i);

would seem reasonable. Until d is 1.1e200 and completely overruns your buffer. You can use snprintf but you have to be careful in checking the return code. And you can't totally limit the ouput size. Only the number of digits to the right of the decimal point, not to the left.

As an alternate, one can use the %g specifier which will switch between fixed and exponential format depending on the number of significant digits.

double d = 123456.0123456789;
char buf[64];
sprintf(buf, "%g", i);

This is better but this format doesn't appear to be widely used. Also the conversion to exponental format may occur in ways you don't want.

This biggest problem is that you don't want a one character change to possibly cause memory corruption.

modp_numtoa

The modp stringencoders library introduces:

  • modp_itoa -- convert int32 (or smaller signed integer) to char buffer
  • modp_uitoa -- convert uint32 (or smaller unsigned integer) to char buffer
  • modp_dtoa -- convert double to char buffer, with maximum size

modp_itoa and modp_uitoa are simple: pass in the value and a char buffer (16 bytes or larger).

modp_dtoa is a bit more complex. You also have to pass in precision, which corresponds the 'X' in "%.Xf" printf format specifier. However, if the number is greater than 2 billion, or "very small" the format will switch to exponential format. This insures the output will never be greater than 32 bytes (include ending null characters). For "regularly" sized values, the output is identical to printf.

performance

Yum!

$ ./speedtest_numtoa 
    type        sprintf snprf   numtoa  improvement
unsigned 8      200000  210000  10000   21.0x
unsigned 16     190000  190000  30000   6.3x
unsigned 32     160000  190000  30000   6.3x
signed 8        190000  210000  20000   10.5x
signed 16       180000  210000  30000   7.0x
signed 32       170000  190000  40000   4.8x
double          1670000 1590000 110000  14.5x

Your millage may vary.

Further Reading

This isn't the first or only way to solve the printf problem.

Comment by kornma...@gmail.com, Jan 23, 2012

According to Stuart: "Note: I am no longer working at Jodrell Bank so I have moved my page about itoa with GCC" (http://www.strudel.org.uk/itoa/)

Just FYI

Comment by remi.cha...@gmail.com, Mar 7, 2012

Might eb interested by this: http://itoa.sourceforge.net/

Comment by fasdfas...@gmail.com, Jul 21, 2013

callgrind says your code is 2x slower than modp_numtoa for 16-bit and 32-bit.

Comment by TaoKaYa...@gmail.com, Jul 28, 2013

agree... because it takes a big 64-bit division each time (which is a software emulation in x86) so it just optimized for 64 bit machines. for 32-bit machines need to change "long long" back to "long" to make it faster.

Comment by fasdfas...@gmail.com, Oct 29, 2013

strtk is faster than this.

Integer To String Test
[sprintf] Numbers:  60000000	Total:   434184031	Time:  8.2373sec	Rate:  7283910.0612nums/sec
[karma]   Numbers:  60000000	Total:   434184031	Time:  3.7606sec	Rate: 15954731.1096nums/sec
[strtk]   Numbers:  60000000	Total:   434184031	Time:  1.9449sec	Rate: 30849550.3421nums/sec
[so   ]   Numbers:  60000000	Total:   434184031	Time:  2.0932sec	Rate: 28664533.7069nums/sec
[timo ]   Numbers:  60000000	Total:   434184031	Time:  4.0893sec	Rate: 14672326.5982nums/sec
[voigt]   Numbers:  60000000	Total:   434184031	Time:  4.5193sec	Rate: 13276451.1161nums/sec
[hopman]  Numbers:  60000000	Total:   434184031	Time:  3.9061sec	Rate: 15360766.8095nums/sec
[modp u]  Numbers:  60000000	Total:   485627183	Time:  2.7258sec	Rate: 22012282.1197nums/sec
[modp]    Numbers:  60000000	Total:   434184031	Time:  2.8024sec	Rate: 21410403.1722nums/sec
Comment by partow, Nov 26, 2013

The following is a very comprehensive benchmark suite for Fast C++ Integer To String Conversion routines:

https://gist.github.com/anonymous/7670011

Comment by TaoKaYa...@gmail.com, Nov 26, 2013

what is format.h ?

Comment by TaoKaYa...@gmail.com, Nov 26, 2013

Here's the Optimized version for std::string

	CPU: Core i5 480M @ 2.67GHz

	Windows 7 32 bit, 
	Integer To String Test
	[jiaendu  ] Numbers: 200000000  Total:  1729691849      Time:  9.4816sec        Rate: 21093486.0985nums/sec
	[karma    ] Numbers: 200000000  Total:  1729691849      Time: 13.4273sec        Rate: 14895022.7457nums/sec
	[strtk    ] Numbers: 200000000  Total:  1718441849      Time:  8.0166sec        Rate: 24948211.9354nums/sec
	[so       ] Numbers: 200000000  Total:  1729691849      Time: 11.8468sec        Rate: 16882177.7103nums/sec
	[timo     ] Numbers: 200000000  Total:  1729691849      Time:  7.5537sec        Rate: 26477192.2468nums/sec
	[voigt    ] Numbers: 200000000  Total:  1729691849      Time: 12.7840sec        Rate: 15644500.4450nums/sec
	[hopman   ] Numbers: 200000000  Total:  1729691849      Time:  7.3518sec        Rate: 27204274.2580nums/sec
	[zverovich] Numbers: 200000000  Total:  1729691849      Time: 19.8899sec        Rate: 10055358.9469nums/sec
	Press any key to continue . . .

	Win7 64bit
	Integer To String Test
	[jiaendu  ] Numbers: 200000000  Total:  1729691849      Time:  5.8704sec        Rate: 34069173.7851nums/sec
	[karma    ] Numbers: 200000000  Total:  1729691849      Time: 11.9108sec        Rate: 16791465.2954nums/sec
	[strtk    ] Numbers: 200000000  Total:  1718441849      Time:  8.0503sec        Rate: 24843858.2724nums/sec
	[so       ] Numbers: 200000000  Total:  1729691849      Time:  9.9133sec        Rate: 20174967.4314nums/sec
	[timo     ] Numbers: 200000000  Total:  1729691849      Time:  7.1446sec        Rate: 27993234.7388nums/sec
	[voigt    ] Numbers: 200000000  Total:  1729691849      Time: 10.2293sec        Rate: 19551651.1035nums/sec
	[hopman   ] Numbers: 200000000  Total:  1729691849      Time:  6.1776sec        Rate: 32375184.5069nums/sec
	[zverovich] Numbers: 200000000  Total:  1729691849      Time: 12.6848sec        Rate: 15766919.0500nums/sec
	Press any key to continue . . .

	namespace jiaendu
	{
	#ifndef int32_t
	#define int32_t int
	#define uint32_t unsigned int
	#define int16_t short
	#define int64_t long long
	#define uint64_t unsigned long long
	#endif
	  inline int ufast_utoa10(unsigned int value, char* str)
	  {
	  #define JOIN(N) N "0", N "1", N "2", N "3", N "4", N "5", N "6", N "7", N "8", N "9"
	  #define JOIN2(N) JOIN(N "0"), JOIN(N "1"), JOIN(N "2"), JOIN(N "3"), JOIN(N "4"), \
					   JOIN(N "5"), JOIN(N "6"), JOIN(N "7"), JOIN(N "8"), JOIN(N "9")
	  #define JOIN3(N) JOIN2(N "0"), JOIN2(N "1"), JOIN2(N "2"), JOIN2(N "3"), JOIN2(N "4"), \
					   JOIN2(N "5"), JOIN2(N "6"), JOIN2(N "7"), JOIN2(N "8"), JOIN2(N "9")
	  #define JOIN4    JOIN3("0"), JOIN3("1"), JOIN3("2"), JOIN3("3"), JOIN3("4"), \
					   JOIN3("5"), JOIN3("6"), JOIN3("7"), JOIN3("8"), JOIN3("9")
	  #define JOIN5(N) JOIN(N), JOIN(N "1"), JOIN(N "2"), JOIN(N "3"), JOIN(N "4"), \
					   JOIN(N "5"), JOIN(N "6"), JOIN(N "7"), JOIN(N "8"), JOIN(N "9")
	  #define JOIN6    JOIN5(), JOIN2("1"), JOIN2("2"), JOIN2("3"), JOIN2("4"), \
					   JOIN2("5"), JOIN2("6"), JOIN2("7"), JOIN2("8"), JOIN2("9")
	  #define F(N)     ((N) >= 100 ? 3 : (N) >= 10 ? 2 : 1)
	  #define F10(N)   F(N),F(N+1),F(N+2),F(N+3),F(N+4),F(N+5),F(N+6),F(N+7),F(N+8),F(N+9)
	  #define F100(N)  F10(N),F10(N+10),F10(N+20),F10(N+30),F10(N+40),\
					   F10(N+50),F10(N+60),F10(N+70),F10(N+80),F10(N+90)
		static const short offsets[] = { F100(0), F100(100), F100(200), F100(300), F100(400),
										F100(500), F100(600), F100(700), F100(800), F100(900)};
		static const char table1[][4] = { JOIN("") }; 
		static const char table2[][4] = { JOIN2("") }; 
		static const char table3[][4] = { JOIN3("") };
		static const char table4[][8] = { JOIN4 }; 
		static const char table5[][4] = { JOIN6 };
	  #undef JOIN
	  #undef JOIN2
	  #undef JOIN3
	  #undef JOIN4
		char *wstr;
	#if (_WIN64 || __x86_64__ || __ppc64__)
		uint64_t remains[2];
	#else
		uint32_t remains[2];
	#endif
		unsigned int v2;
		if (value >= 100000000) {
	#if (_WIN64 || __x86_64__ || __ppc64__)
			remains[0] = (((uint64_t)value * (uint64_t)3518437209) >> 45);
			remains[1] = (((uint64_t)value * (uint64_t)2882303762) >> 58);
	#else
			remains[0] = value / 10000;
			remains[1] = value / 100000000;
	#endif
		  v2 = remains[1];
		  remains[1] = remains[0] - remains[1] * 10000;
		  remains[0] = value - remains[0] * 10000;
		  if (v2 >= 10) {
			*(int16_t *) str = *(int16_t *) table5[v2];
			str += 2;
			*(int32_t *) str = *(int32_t *) table4[remains[1]];
			str += 4;
			*(int32_t *) str = *(int32_t *) table4[remains[0]];
			return 10;
		  } else {
			*(char *) str = v2 + '0';
			str += 1;
			*(int32_t *) str = *(int32_t *) table4[remains[1]];
			str += 4;
			*(int32_t *) str = *(int32_t *) table4[remains[0]];
			return 9;
		  }
		}
		else if (value >= 10000) {
	#if (_WIN64 || __x86_64__ || __ppc64__)
		  v2 = (((uint64_t)value * (uint64_t)3518437209 ) >> 45);
	#else 
		  v2 = value / 10000;
	#endif
		  remains[0] = value - v2 * 10000;
		  if (v2 >= 1000) {
			*(int32_t *) str = *(int32_t *) table4[v2];
			str += 4;
			*(int32_t *) str = *(int32_t *) table4[remains[0]];
			return 8;
		  } else {
			wstr = str;
			*(int32_t *) wstr = *(int32_t *) table5[v2];
			wstr += offsets[v2];
			*(int32_t *) wstr = *(int32_t *) table4[remains[0]];
			wstr += 4;
			return (wstr - str);
		  }
		}
		else {
		  if (value >= 1000) {
			*(int32_t *) str = *(int32_t *) table4[value];
			return 4;
		  } else if (value >= 100) {
			*(int32_t *) str = *(int32_t *) table3[value];
			return 3;
		  } else if (value >= 10) {
			*(int16_t *) str = *(int16_t *) table2[value];
			return 2;
		  } else {
			*(char *) str = *(char *) table1[value];
			return 1;
		  }
		}
	  }

		  void ufast_itoa10(int value, std::string &s) {
			char buf[32];
			char *str = buf;
			int len;
			if (value < 0) { *(str++) = '-'; 
			  len= ufast_utoa10(-value, str) + 1; 
			}
			else len= ufast_utoa10(value, str);
			s = std::string(buf, len);
		  }
	}
Comment by TaoKaYa...@gmail.com, Nov 26, 2013

yet most of the time are spending on the constructor of std::string ....

Comment by partow, Nov 28, 2013

Update of the integer to string benchmark:

https://gist.github.com/anonymous/7700052


Sign in to add a comment
Powered by Google Project Hosting