
I have a simulation that is currently running, but the ETA is about 40 hours -- I'm trying to speed it up with multi-processing.

It iterates over 3 values of one variable (L) and over 99 values of a second variable (a). For each pair it runs a complex simulation and returns 9 different standard deviations. So, even though I haven't coded it that way yet, it is essentially a function that takes two inputs (L, a) and returns 9 values.

Here is the essence of the code I have:

STD_1 = []
STD_2 = []
# etc.

for L in range(0,6,2):
    for a in range(1,100):
        ### simulation code ###
        STD_1.append(value_1)
        STD_2.append(value_2)
        # etc.

Here is what I can modify it to:

master_list = []

def simulate(a,L):
    ### simulation code ###
    return (a, L, STD_1, STD_2)  # etc.

for L in range(0,6,2):
    for a in range(1,100): 
        master_list.append(simulate(a,L))

Since each simulation is independent, this seems like an ideal place to implement some sort of multi-threading/processing.

How exactly would I go about coding this?

EDIT: Also, will everything be returned to the master list in order, or could it possibly be out of order if multiple processes are working?

EDIT 2: This is my code -- but it doesn't run correctly. It asks if I want to kill the program right after I run it.

import multiprocessing

data = []

for L in range(0,6,2):
    for a in range(1,100):
        data.append((L,a))

print (data)

def simulation(arg):
    # unpack the tuple
    a = arg[1]
    L = arg[0]
    STD_1 = a**2
    STD_2 = a**3
    STD_3 = a**4
    # simulation code #
    return((STD_1,STD_2,STD_3))

print("1")

p = multiprocessing.Pool()

print ("2")

results = p.map(simulation, data)

EDIT 3: Also, what are the limitations of multiprocessing? I've heard that it doesn't work on OS X. Is this correct?

2 Comments
  • The way I usually do such a thing is to use Pool.apply_async right there in the loop where you're creating all the possible arguments. Then at the end of the loop, you put p.close(); p.join() and then wait ;-) (a sketch of this pattern follows these comments) Commented Feb 22, 2014 at 20:41
  • This is the solution you are looking for: enter link description here Commented Mar 1, 2021 at 13:36
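
A minimal sketch of the apply_async pattern from the first comment, assuming simulate takes the two loop variables from the question and returns a tuple (the loop ranges come from the question; the rest is illustrative):

import multiprocessing

def simulate(a, L):
    ### simulation code ###
    return (a, L)  # plus the standard deviations

if __name__ == '__main__':
    p = multiprocessing.Pool()
    async_results = []
    # Submit one task per (L, a) pair; apply_async returns immediately.
    for L in range(0, 6, 2):
        for a in range(1, 100):
            async_results.append(p.apply_async(simulate, (a, L)))
    p.close()   # no further tasks will be submitted
    p.join()    # wait for all workers to finish
    master_list = [r.get() for r in async_results]

Because each AsyncResult is collected in the order it was submitted, master_list ends up in the same order as the loops, regardless of which worker finished first.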

3 Answers

  • Wrap the data for each iteration up into a tuple.
  • Make a list data of those tuples
  • Write a function f to process one tuple and return one result
  • Create p = multiprocessing.Pool() object.
  • Call results = p.map(f, data)

This will run f on the items of data in separate processes, using as many worker processes as your machine has cores.

Edit1: Example:

from multiprocessing import Pool

data = [('bla', 1, 3, 7), ('spam', 12, 4, 8), ('eggs', 17, 1, 3)]

def f(t):
    name, a, b, c = t
    return (name, a + b + c)

p = Pool()
results = p.map(f, data)
print(results)

Edit2:

Multiprocessing should work fine on UNIX-like platforms such as OSX. Only platforms that lack os.fork (mainly MS Windows) need special attention. But even there it still works. See the multiprocessing documentation.
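
On platforms that spawn a fresh interpreter instead of forking (Windows, and later Python versions on OS X by default), the main requirement is that the Pool is only created under an import guard, so that the worker processes can re-import the module without starting pools of their own. A minimal sketch of that pattern, reusing the names from the example above:

from multiprocessing import Pool

def f(t):
    name, a, b, c = t
    return (name, a + b + c)

if __name__ == '__main__':
    # Only the parent process runs this block; spawned workers merely import the module.
    data = [('bla', 1, 3, 7), ('spam', 12, 4, 8), ('eggs', 17, 1, 3)]
    p = Pool()
    results = p.map(f, data)
    p.close()
    p.join()
    print(results)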


12 Comments

Thanks for your response. Could you put your suggestions into a code block that will run so that it is a little clearer to me? I'm still pretty new to Python, and having a simple example that pertains exactly to my situation would be quite helpful!
On my computer, this code hangs indefinitely. I am running Python 3.3.3 on OS X.
@user264087: There are some improvements to multiprocessing for OSX in progress, but they are scheduled for 3.4. Try running the code in a debugger to see where it hangs...
I just stuck print statements in to see which command was the problem. It is "p = Pool()"
@user264087 It seems the detection of the number of CPUs on OSX is somewhat broken. Try initializing the Pool with the actual number of CPUs, e.g. Pool(4).
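
A small sketch of that workaround, with the fixed fallback count of 4 as an example:

import multiprocessing

try:
    n_workers = multiprocessing.cpu_count()
except NotImplementedError:
    # cpu_count() can fail on some platforms; fall back to a fixed count.
    n_workers = 4
p = multiprocessing.Pool(processes=n_workers)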

Here is one way to run it in parallel threads:

import threading

L_a = []

for L in range(0,6,2):
    for a in range(1,100):
        L_a.append((L,a))
        # Add the rest of your objects here

def RunParallelThreads():
    # Create an index list
    indexes = range(0,len(L_a))
    # Create the output list
    output = [None for i in indexes]
    # Create all the parallel threads
    threads = [threading.Thread(target=simulate,args=(output,i)) for i in indexes]
    # Start all the parallel threads
    for thread in threads: thread.start()
    # Wait for all the parallel threads to complete
    for thread in threads: thread.join()
    # Return the output list
    return output

def simulate(output, index):
    (L, a) = L_a[index]
    output[index] = (a, L) # Add the rest of your objects here

master_list = RunParallelThreads()

4 Comments

In the reading I've done in the past couple days, people have indicated that threading is inferior to multiprocessing. I read somewhere that it actually takes longer with threading than with a serial process. Is this true?
Not sure. I do recall a friend telling me that multi-threading in Python is simulated by the Python interpreter rather than being executed as "real" parallel threads by the OS.
In any case, the code above runs. So you can simply add the rest of your objects, and compare the performance with that of your sequential code.
In CPython, due to the Global Interpreter Lock ("GIL"), only one thread at a time can be executing Python bytecode. This limits the usefulness of threads for computing-intensive tasks in CPython.

Use Pool().imap_unordered if ordering is not important. It returns an iterator that yields each result as soon as it is ready, in whatever order the workers finish, whereas map returns the results in the same order as the inputs.
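
A minimal sketch of that approach, using the (L, a) pairs from the question and a placeholder for the simulation:

import multiprocessing

def simulation(arg):
    L, a = arg
    # placeholder for the real simulation code
    return (L, a, a**2, a**3, a**4)

if __name__ == '__main__':
    data = [(L, a) for L in range(0, 6, 2) for a in range(1, 100)]
    p = multiprocessing.Pool()
    # Results are yielded as workers finish, not in input order.
    for result in p.imap_unordered(simulation, data):
        print(result)
    p.close()
    p.join()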

