7
\$\begingroup\$

Is this an optimal solution? I doubt it is. Can someone please critique my code?

def better_prime_finder(lim):
    primes = []
    for i in range(2, lim + 1, 1):
        primes.append(i)
    for n in range(lim):
        p = primes[0]
        try: p = primes[n]
        except IndexError: pass
        for i in range(2, lim):
            if i*p not in primes:
                continue
            primes.remove(i*p)
    return primes
\$\endgroup\$
3
  • \$\begingroup\$ Can you find 500,000 to 1,000,000 primes in a second? If not, it iwn't optimal. \$\endgroup\$ Commented Mar 19 at 17:15
  • \$\begingroup\$ This is not how prime number generators work. Generally, they generate a number consisting of random bits, and use a fast probabilistic primality test algorithm to decide if the number is a prime or not with a high degree of likeliness and that's good enough for most uses. \$\endgroup\$ Commented Mar 20 at 14:58
  • 2
    \$\begingroup\$ @AykhanHagverdili This program does not generate a random prime. Instead, it generates the list of all the primes below the limit. In other words, it's a sieve of Eratosthenes. \$\endgroup\$ Commented Mar 20 at 16:35

5 Answers 5

11
\$\begingroup\$

In addition to toolic's answer:

List comprehension

    primes = []
    for i in range(2, lim + 1, 1):
        primes.append(i)

could be written simpler (and faster) as:

    primes = [i for i in range(2, lim + 1)]

Early termination

Consider this loop:

        for i in range(2, lim):
            if i*p not in primes:
                continue
            primes.remove(i*p)

You attempting to eliminating lim multiples of p from a list which can only ever contain numbers up to lim. For instance, with lim = 10, you'll eventually try to remove 2*7, 3*7, 4*7, 5*7, 6*7, 7*7, 8*7, and 9*7. All of those values exceed 10, so none of them could ever be in the list!

Instead, you could stop when i*p exceeds lim. Instead of adding in an extra condition to test this, let's change the loop to stop at that point:

        for ip in range(2*p, lim + 1, p):
            if ip in primes:
                primes.remove(ip)

Now the loop runs from 2*p, and increments by p each time, stopping when it exceeds lim.

Early termination (part 2)

    for n in range(lim):
        p = primes[0]
        try:
            p = primes[n]
        except IndexError:
            pass
        for ip in range(2*p, lim + 1, p):
            if ip in primes:
                primes.remove(ip)

What is happening when you get an IndexError?

It means n is beyond the end of the primes[] list. What will happen in subsequent loops? n will be larger, so same thing. We can abandon further loopping at this point.

    try:
        for n in range(lim):
            p = primes[n]
            for ip in range(2*p, lim + 1, p):
                if ip in primes:
                    primes.remove(ip)
    except IndexError:
        pass

Alternately, you can change the to a while n < len(primes):, and increment n yourself.

Algorithmic Improvements

You don't need to check all multiples of p. Instead of starting at 2*p, you could start at p*p, since you'll have already removed the smaller multiples.

See also: the Sieve of Eratosthenes. You'll find plenty of implementations of it here on Code Review.

\$\endgroup\$
10
  • 3
    \$\begingroup\$ @FirstTimer: There's no shortcut, you need to think about it. If it's a well-known problem, you can find tricks on the Internet, but if it's a "custom" problem, you're on your own, so it's best to develop the habit of thinking. When thinking is hard, instrumenting can help. For example, had you instrumented (print!) the code to check whether remove was removing anything, or on which index IndexError was occurring, you may have noticed patterns which would have helped you identify the opportunities. Still takes thinking -- what to instrument? is there a pattern? \$\endgroup\$ Commented Mar 19 at 9:25
  • 3
    \$\begingroup\$ I'd prefer to write list(range(2, lim + 1)) rather than [i for i in range(2, lim + 1)], but I don't know if there's a performance difference? \$\endgroup\$ Commented Mar 19 at 11:31
  • 1
    \$\begingroup\$ I might be wrong, but aren't try/except blocks pretty slow? If so, putting one in each loop seems like a very expensive thing to do. Instead, why not put the for loop inside a try block. \$\endgroup\$ Commented Mar 21 at 1:01
  • 1
    \$\begingroup\$ @tomasz Yes, which is why (among other reasons) that happened in "Early termination (part 2)" \$\endgroup\$ Commented Mar 21 at 14:42
  • 1
    \$\begingroup\$ @AJNeufeld Yeah but I don't think that's what tomasz was talking about. (And if I replace except IndexError: pass with except IndexError: break, then both drop to ~18 ms). \$\endgroup\$ Commented Mar 23 at 9:24
7
\$\begingroup\$

Other answers have provided valuable feedback. Another area for improvement outside of the algorithmic opportunities is to have your function return a generator rather than a list.

I'm going to make minimal changes other than this.

def better_prime_finder(lim):
    primes = list(range(2, lim + 1))
    for n in range(lim):
        try: 
            p = primes[n]
        except IndexError: 
            return
        yield p
        for i in range(2, lim):
            if i*p not in primes:
                continue
            primes.remove(i*p)

I have also had the function return when the IndexError is raised, since at this point we do not need to continually check primes[0]. No useful work is being done by this.

By making these few small changes, we open up a whole world of flexibility. If, for instance, I wanted to compute primes up to 1,000, but I only care about the first 20 primes that are greater than 10, I can do that using itertools and then consume that generator to create a list.

>>> list(
...   itertools.islice(
...     itertools.dropwhile(
...       lambda x: x <= 10, 
...       better_prime_finder(1000)
...     ), 
...     20
...   )
... )
[11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89]

The beauty of this, though, is that you don't necessarily need to create a list when consuming the generator. You might just directly print each prime without the extra step of creating the list.

Now, because of the algorithmic issues, this still runs very slowly if we specify a limit of something like a million, but once it yields 89 it's not doing any further work.

We can apply this same generator approach even if we implement something dramatically more efficient than your current approach like the Sieve of Eratosthenes.

\$\endgroup\$
7
\$\begingroup\$

list.remove() can be a relatively expensive operation. It's much cheaper to set a list element's value, so most prime sieves work by creating an array of length lim (or a more compact "bitset") and initialising all its values to True. Then instead of removing composite numbers, we simply mark the corresponding entry as False.

At the end, we just return the numbers whose entry is still True - we could do that using a comprehension, or perhaps functionally using enumerate(), filter() and map().

Arguably better: we could just yield each prime as we reach it.

\$\endgroup\$
3
  • 1
    \$\begingroup\$ A compact data-structure like a bitset also avoids building a list of all numbers from 2 to lim+1. (The OP's list includes all the even numbers; baking in the fact that 2 is the only even prime can easily halve the memory consumption of this or a bitset). Even just an array of bools would avoid allocating a separate Python object for each number (greater than 100 or whatever the limit is for CPython to pre-allocate them). Especially for numbers bigger than 2^30 which require multiple "limbs" (bigint chunks) in CPython. \$\endgroup\$ Commented Mar 19 at 17:09
  • 1
    \$\begingroup\$ Yes, this is the key point. After using all suggestions from the accepted answer, it still takes 35 seconds on my system for 100000, but a fraction of a second using an array of booleans. You might as well build the list of primes as you go along, though, rather than at the end - you need to do stuff when you encounter a True anyway, and .append() is cheap. \$\endgroup\$ Commented Mar 21 at 16:01
  • 1
    \$\begingroup\$ @EspeciallyLime, good point about building the list as we go - even better, we could just yield them, making it the caller's choice whether we need them all in memory at once. We know we'll process them in ascending order. \$\endgroup\$ Commented Mar 21 at 16:30
5
\$\begingroup\$

I don't have much to contribute in terms of how optimal the code is, but I do have some minor coding style suggestions. The code style is pretty good already, but...

Layout

These lines:

    try: p = primes[n]
    except IndexError: pass

are more customarily split onto separate lines:

    try:
        p = primes[n]
    except IndexError:
        pass

Documentation

The PEP 8 style guide recommends adding docstrings for functions. The docstring should summarize the purpose of the function as well as describe it input type and return type. For example:

def better_prime_finder(lim):
    """
    Find prime numbers less than the input parameter.
    The input (lim) should be an integer greater than 1.
    Returns list of prime numbers.
    """

Naming

The name of the function would be better without the word "better", ironically:

def prime_finder(lim):

Simpler

The default range step is 1, so this line:

for i in range(2, lim + 1, 1):

is simpler as:

for i in range(2, lim + 1):

It seems strange that you set p to one value, then you immediately set it to a potentially different value on the next line:

p = primes[0]
try: p = primes[n]

Perhaps this code is more straightforward:

for n in range(lim):
    try:
        p = primes[n]
    except IndexError:
        p = primes[0]
    for i in range(2, lim):

Algorithm

As far as your algorithm goes, there are plenty of other solutions here if you search primes and python tags

\$\endgroup\$
2
  • \$\begingroup\$ Thanks you! Why do I need the docstring in this case if the name already summarises what to do (at least I think so) \$\endgroup\$ Commented Mar 19 at 0:19
  • 2
    \$\begingroup\$ @FirstTimer: You're welcome. It is not a matter of need; these are just common suggestions. You decide for yourself what works best for you. \$\endgroup\$ Commented Mar 19 at 0:52
4
\$\begingroup\$

You said "below the limit", but for example better_prime_finder(7) returns [2, 3, 5, 7], which is wrong since 7 isn't below 7.

And no need to re-implemented list(...) and list iterators.

def better_prime_finder(lim):
    primes = list(range(2, lim))
    for p in primes:
        for i in range(2, lim):
            if i*p not in primes:
                continue
            primes.remove(i*p)
    return primes

Attempt This Online!

\$\endgroup\$

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.