I am trying to do some multiprocessing in Python. I have a function doing some work and returning a list. I want to repeat that for several cases. Finally, I want to get the returned list of each parallel call and unify them (have only one list with all duplicates removed).

def get_version_list(env):
    list = []
    #do some intensive work
    return list

from multiprocessing import Pool

pool = Pool()

result1 = pool.apply_async(get_version_list, ['prod'])
result2 = pool.apply_async(get_version_list, ['uat'])
# etc., I have six environments to check.

alist = result1.get()
blist = result2.get()

This does not work. (I am not sure about the function call syntax, but I tried other things too, without success.) It gives me the error below, and repeats it many times, since my intensive work makes around 300 requests.post calls.

RuntimeError: Attempt to start a new process before the current process has finished its bootstrapping phase.

This probably means that you are on Windows and you have
forgotten to use the proper idiom in the main module:

    if __name__ == '__main__':
        freeze_support()
        ...

The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce a Windows executable.
  • What part of the error message was unclear? Commented Oct 4, 2016 at 20:11
  • I guess I didn't read it correctly... Maybe it's time for me to go home. Commented Oct 4, 2016 at 20:17
  • Sorry if my comment sounded harsh. It's important to understand how multiprocessing works on Windows. Each child process imports the main script, which typically also contains the code that starts the processes. That's why guarding that code with if __name__ == '__main__': is important (so the main portion isn't re-executed every time the script is imported). Commented Oct 4, 2016 at 20:30
  • I understand, I was just tired and didn't pay enough attention. It's a small test script with no structure, where I want to test my code before integrating it into a web server (done with web.py). Commented Oct 4, 2016 at 21:36

1 Answer

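You have to put the multiprocessing portion under the if __name__ == '__main__': guard, so that it runs only in the main process and not when a child process imports the script, like: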

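from multiprocessing import Pool


def get_version_list(env):
    versions = []  # named "versions" to avoid shadowing the built-in list
    print("ENV: " + env)
    return versions


if __name__ == '__main__':
    # Create the pool only under the guard, so child processes that
    # re-import this script do not try to start pools of their own.
    pool = Pool()

    result1 = pool.apply_async(get_version_list, ['prod'])
    result2 = pool.apply_async(get_version_list, ['uat'])
    # etc., I have six environments to check.

    alist = result1.get()
    blist = result2.get()

    # Release the worker processes once all results are collected.
    pool.close()
    pool.join()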
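
The question also asks how to combine the per-environment lists into one list without duplicates. A minimal sketch of one way to do that follows. It assumes the version entries are hashable and sortable (plain strings here). The body of get_version_list is a made-up placeholder, and merge_unique and collect_unique_versions are hypothetical helper names that do not appear in the original post. It uses pool.map instead of one apply_async call per environment, which is a substitution for brevity rather than a required change.

```python
from multiprocessing import Pool


def get_version_list(env):
    # Placeholder for the intensive work (assumption: the real function
    # returns a list of version strings for the given environment).
    return ["1.0", "1.1", env + "-only"]


def merge_unique(lists):
    # Union all lists into a set to drop duplicates, then sort the
    # result so the output order is deterministic.
    unique = set()
    for items in lists:
        unique.update(items)
    return sorted(unique)


def collect_unique_versions(envs):
    # pool.map blocks until every call has returned, and yields one
    # result list per environment, in the same order as envs.
    with Pool() as pool:
        per_env = pool.map(get_version_list, envs)
    return merge_unique(per_env)


if __name__ == '__main__':
    # Only 'prod' and 'uat' are named in the question; the other four
    # environments would be added to this list.
    envs = ['prod', 'uat']
    print(collect_unique_versions(envs))
```

If the original order of the entries matters, the set could be replaced by a loop that appends each unseen entry to a list instead of sorting at the end.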