Looking at your code, one major performance bottleneck will be the repeated use of !! to search a list from the beginning in linear time. Another is the repeated use of mod, which is the slowest integer operation on modern CPUs if it’s in the ISA at all. You also use unfoldr, which can be slower than scanl' or a tail-recursive-modulo-cons function that strictly evaluates its arguments (bang patterns or seq). The !! operator also works in linear rather than constant time, but since you only invoke it once (thanks, benrg), this would not be a large issue.
Intuitively, you’ll probably speed up this algorithm the most by using a mutable Data.Vector instead of a list. However, that’s not really idiomatic Haskell, so you might prefer lists for that reason.
To speed up this sieve using lists, here is one approach you might try: It’s more of a true sieve of Eratosthenes that finds all primes within a given range, using Int rather than Integer. It’s therefore less flexible than what you wrote.
- Define a tail-recursive helper function
sieve :: [Int] -> [Int]. - Call
sieve [2..n]. You could also optimize by callingsievewith only the odd numbers between 3 andn, and prepending 2 to the result. - If the argument to
sieveis[], there are no more numbers in range to sieve. Return[]. - Within each call to
sieve n (p:potentials),nis the limit of the range,pis definitely prime andpotentialsis a list of potential primes in range. Splitting the second argument into a head and tail is a constant-time operation.- If
p*p > n, you have already found all composites in range. All remaining numbers you were given are prime. Return(p:potentials). - Otherwise:
- Generate a list of all multiples of
pin range. Do this with a well-behaved list comprehension that uses only addition (much faster thanmod) and avoids creating the list as a data structure in memory. - Take the set difference of
potentialsand the multiples, optimizing based on the fact that both are ordered lists with no duplicates. This function should be either tail-recursive-modulo-cons, or tail-recursive with an accumulator. - Return
p:(sieve setDifference). This makes your function tail-recursive-modulo-cons.
- Generate a list of all multiples of
- If
You might convert this algorithm (and many other tail-recursive-modulo-cons algorithms) into a Data.List.scanl'. This will give you code equivalent to what an imperative language would compile a strictly-evaluated while loop that generates list from head to tail into. This is typically much faster than unfoldr.