My favorites | Sign in
Project Home Downloads Wiki Source
Project Information
Members
Featured
Downloads
Links

About

primesieve is a free software program and C++ library that generates prime numbers and prime k-tuplets (twin primes, prime triplets, ...) < 2^64 using a highly optimized implementation of the sieve of Eratosthenes.

Screenshot

primesieve 1.0 running on Windows 7

The screenshot shows the prime numbers generated within the interval [1E16, 1E16+500].

Algorithm and Complexity

primesieve uses the segmented sieve of Eratosthenes with wheel factorization, this algorithm has a complexity of O(n log log n) operations and uses O(sqrt(n)) space.

Segmentation is currently the best known practical improvement to the sieve of Eratosthenes. Instead of sieving the interval [2, n] at once one subdivides the sieve interval into a number of equal sized segments that are then sieved consecutively. Segmentation drops the memory requirement of the sieve of Eratosthenes from O(n) to O(sqrt(n)). The segment size is usually chosen to fit into the CPU's fast L1 or L2 cache memory which significantly speeds up sieving. A segmented version of the sieve of Eratosthenes was first published by Singleton in 1969 [1], Bays and Hudson in [3] describe the algorithm in more detail.

Wheel factorization is used to skip multiples of small primes. If a kth wheel is added to the sieve of Eratosthenes then only those multiples are crossed off that are coprime to the first k primes, i.e. multiples that are divisible by any of the first k primes are skipped. The 1st wheel considers only odd numbers, the 2nd wheel (modulo 6) skips multiples of 2 and 3, the 3rd wheel (modulo 30) skips multiples of 2, 3, 5 and so on. Pritchard has shown in [4] that the running time of the sieve of Eratosthenes can be reduced by a factor of log log n if the wheel size is sqrt(n) but for cache reasons the sieve of Eratosthenes usually performs best with a modulo 30 or 210 wheel. Sorenson explains wheels in [5].

Additionally primesieve uses Tomás Oliveira e Silva's cache-friendly list algorithm if needed. This algorithm is relatively new it has been devised by Tomás Oliveira e Silva in 2001 in order to speed up the segmented sieve of Eratosthenes for prime numbers past 32 bits. The idea is to store the sieving primes into lists of buckets with each list being associated with a segment. A list of sieving primes related to a specific segment contains only those primes that have multiple occurrence(s) in that segment. Whilst sieving a segment only the multiples of primes within the related list need to be crossed off and each prime is reassigned to the list responsible for its next multiple when processed. The benefit of this approach is that it is now possible to use segments (i.e. sieve arrays) smaller than sqrt(n) without deteriorating efficiency, this is important as only small segments that fit into the CPU's L1 or L2 cache provide fast memory access.

Implementation

primesieve is written in portable C++, its speed is mainly due to the segmentation of the sieve of Eratosthenes which prevents cache misses when crossing off multiples in the sieve array and the use of a bit array instead of the more widely used byte (boolean) array. These are the optimizations I use in my implementation:

  • Uses a bit array with 30 numbers per byte for sieving
  • Pre-sieves multiples of small primes ≤ 19
  • Starts crossing off multiples at the square
  • Uses a modolo 210 wheel that skips multiples of 2, 3, 5 and 7
  • Uses specialised algorithms for small, medium and big sieving primes
  • Uses Tomás Oliveira e Silva's cache-friendly list algorithm for big sieving primes [10]
  • Parallelized using OpenMP

Timing Results

Limit
Prime Count
Intel Core i7-920
4 x (2.66GHz, 32K L1 Data Cache)
AMD Phenom II X6 1055T
6 x (3.90GHz, 64K L1 Data Cache)
107
664,579
0.00 s
0.00 s
108
5,761,455
0.02 s
0.01 s
109
50,847,534
0.09 s
0.03 s
232
203,280,221
0.35 s
0.13 s
1010
455,052,511
0.84 s
0.32 s
1011
4,118,054,813
10.26 s
4.10 s
1012
37,607,912,018
126.98 s
55.96 s
1013
346,065,536,839
1545.59 s
734.78 s

The above timings are for simple prime counting. The sieve size was chosen to match the CPU's L1 data cache size (Intel: 32K, AMD: 64K), the Intel Core-i7 used 8 threads (HTT) and the AMD Phenom II used 6 threads. Tests were run on Linux Mint 64-bit and Windows XP 64-bit.

primesieve 2.0 cpu scaling

The above "CPU scaling" benchmark was run on a Cluster Compute instance of Amazon's Elastic Compute Cloud (EC2) web service. The system consisted of a 2x Intel Xeon X5570, quad-core “Nehalem” architecture with 23 GB of memory running a 64-bit CentOS Linux operating system. At each start offset an interval of size 10^11 was sieved using different thread settings.

Source Code

primesieve has been written to be reusable in other projects, the following code snippet shows how to use the primesieve C++ library for prime number generation.

#include <primesieve/soe/PrimeSieve.h>
#include <iostream>

unsigned int sum = 0;

// called back for each prime up to 1000
void sumPrimes(unsigned int prime)
{
  sum += prime;
}

int main()
{
  PrimeSieve ps;
  ps.generatePrimes(2, 1000, sumPrimes);
  std::cout << "Sum of the primes below 1000 = " << sum << std::endl;
  return 0;
}

Here are some more usage examples. To browse the latest primesieve source code online visit the 'Source' tab.

References

  1. Richard C. Singleton, "An efficient prime number generator", Communications of the ACM 12, 563-564, 1969.
  2. R. P. Brent, "The first occurence of large gaps between successive primes", Mathematics of Computation, 27:959-963, 1973.
  3. C. Bays and R. Hudson, "The segmented sieve of Eratosthenes and primes in arithmetic progressions to 10^12", BIT 17:121 127, 1977.
  4. Paul Pritchard, "Fast compact prime number sieves (among others)", Journal of Algorithms 4 (1983), 332-344.
  5. J. Sorenson, "An analysis of two prime number sieves", Computer Science Technical Report Vol. 1028, June 1991.
  6. J. Sorenson, "Trading Time for Space in Prime Number Sieves", Lecture Notes in Computer Science, Vol. 1423 (1998), 179-194.
  7. J. Sorenson and I. Parberry, "Two Fast Parallel Prime Number Sieves", Information and Computation, Vol. 114, No. 1, pp. 115-130, 1994.
  8. Achim Flammenkamp, "The Art of Prime Sieving", 1998.
  9. Jörg Richstein, "Segmentierung und Optimierung von Algorithmen zu Problemen aus der Zahlentheorie", Gießen, Univ., Diss., 1999.
  10. Tomás Oliveira e Silva, "Fast implementation of the segmented sieve of Eratosthenes", 2002.
Powered by Google Project Hosting