Further down is a simple C++ implementation of the segmented sieve of Eratosthenes that generates the primes below n using operations and space. It runs significantly faster than a traditional sieve of Eratosthenes implementation due to its more efficient CPU cache usage i.e. it uses the CPU's L1 data cache size as its sieve array size. This ensures that less than 1 percent of the memory accesses will be cache misses. This implementation generates the primes below 10^9 in just 0.9 seconds (single-threaded) on an Intel Core i7-4770 3.4GHz CPU from 2013.
Suppose we want to sieve the primes below 10^9. We chose the size of the sieve array (named segment_size) to be the same size as the CPU's L1 data cache size per core e.g. 32768 bytes. We first generate the sieving primes below sqrt(10^9) which are needed to cross-off multiples. Then we start crossing-off the multiples of the first prime 2 until we reach a multiple of 2 >= segment_size, if this happens we calculate the index of that multiple in the next segment using (multiple - segment_size) and store it in the next array. We then cross-off the multiples of the next sieving primes using the same procedure. Once we have crossed-off the multiples of all sieving primes in the first segment we iterate over the sieve array and print out (or count) the primes.
In order to sieve the next segment we reset the sieve array and we increment the lower offset by segment_size. Then we start crossing-off multiples again, for each sieving prime we retrieve the sieve index from the next array and we start crossing-off multiples from there on.
Download segmented_sieve.cpp, open a terminal and run: