
So, I have a 3-dimensional list. For example:

A=[[[1,2,3],[4,5,6],[7,8,9]],...,[[2,4,1],[1,4,6],[1,2,4]]]

I want to process each 2-dimensional list in A independently, but they all go through the same processing. If I do it sequentially, I do:

for i in range(len(A)):
    A[i]=process(A[i])

But it takes a very long time. Could you tell me how to parallelize this with data parallelism in Python?

  • Parallelization requires threading. That requires a lot of work. Since lists are mutable, you'll likely have to create a copy for each individual thread (if I'm smart at programming). Copying/slicing the list will likely take more time than processing it in a single thread. Commented Oct 27, 2016 at 1:43
  • List of options here... wiki.python.org/moin/ParallelProcessing Commented Oct 27, 2016 at 1:45
  • @Zizouz212 So there's no way that's more efficient than processing it sequentially? Commented Oct 27, 2016 at 1:56
  • Threading in Python is actually not parallel at all and would slow things down, because the Global Interpreter Lock only executes one Python instruction at a time regardless of the number of threads. It's only suitable for doing something while waiting for I/O on another thread (see the sketch after these comments). True multiprocessing would work, but sharing the data requires locking and would also quite likely slow things down if there's a lot of writing to the array. Commented Oct 27, 2016 at 1:56
  • @cricket_007 Thanks, I'll try it. Commented Oct 27, 2016 at 1:56
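
A minimal sketch of that GIL point (not from the original thread; busy_work is a made-up stand-in for an expensive process(), and the pool sizes are arbitrary): a thread-backed pool runs CPU-bound work at roughly sequential speed, while a process pool actually runs it in parallel.

import time
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool  # same API, backed by threads

def busy_work(n):
    # Pure-Python loop: it holds the GIL, so threads can't run it in parallel
    total = 0
    for i in xrange(n):
        total += i * i
    return total

if __name__ == '__main__':
    jobs = [10 ** 6] * 8

    start = time.time()
    ThreadPool(4).map(busy_work, jobs)
    print 'threads:   %.2fs' % (time.time() - start)  # roughly sequential speed

    start = time.time()
    Pool(4).map(busy_work, jobs)
    print 'processes: %.2fs' % (time.time() - start)  # close to a 4x speedup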

1 Answer


If you have multiple cores and processing each 2-dimensional list is an expensive operation, you could use Pool from multiprocessing. Here's a short example that squares the numbers in separate processes:

import multiprocessing as mp

A = [[[1,2,3],[4,5,6],[7,8,9]],[[2,4,1],[1,4,6],[1,2,4]]]

def square(l):
    # Square every number in one 2-dimensional list
    return [[x * x for x in sub] for sub in l]

if __name__ == '__main__':
    # One worker process per CPU core; map splits A across them
    pool = mp.Pool(processes=mp.cpu_count())
    res = pool.map(square, A)
    pool.close()
    pool.join()

    print res

Output:

[[[1, 4, 9], [16, 25, 36], [49, 64, 81]], [[4, 16, 1], [1, 16, 36], [1, 4, 16]]]

Pool.map behaves like the built-in map while splitting the iterable across worker processes. It also takes an optional third parameter called chunksize that defines how big the chunks submitted to the workers are.
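
For example (a sketch, not part of the original answer; the values for A and chunksize are made up, and the best chunksize depends on your data), with many small 2-dimensional lists a larger chunksize cuts inter-process communication overhead:

import multiprocessing as mp

A = [[[i, i + 1], [i + 2, i + 3]] for i in xrange(1000)]  # many small 2-D lists

def square(l):
    return [[x * x for x in sub] for sub in l]

if __name__ == '__main__':
    pool = mp.Pool(processes=mp.cpu_count())
    # chunksize=50: each task message carries 50 sublists instead of 1
    res = pool.map(square, A, 50)
    pool.close()
    pool.join()
    print len(res)  # 1000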


7 Comments

Thank you very much! I'll try it later because my program has been running sequentially for 30 minutes now. So I don't have to split A into parts myself; Python will do the splitting?
@user7077941 No, you don't have to split A. You might want to set the third parameter, chunksize, though, depending on your data.
Wow, such a simple way. Thank you. ^^
Hi, I'm confused. I tried using your code, but somehow an AttributeError appeared on with mp.Pool(processes=mp.cpu_count()) as pool. What should I do?
@user7077941 I didn't notice the python-2.7 tag, so I was running it with 3.5; I've edited the answer to work on Python 2.7.
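
For the record (a detail from the Python docs, not from this thread): Pool only became usable as a with-statement context manager in Python 3.3, which is why the with form raises AttributeError on 2.7. A sketch of both idioms:

import multiprocessing as mp

A = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

def square(l):
    return [[x * x for x in sub] for sub in l]

if __name__ == '__main__':
    # Python 2.7: manage the pool's lifetime by hand
    pool = mp.Pool(processes=mp.cpu_count())
    try:
        res = pool.map(square, A)
    finally:
        pool.close()
        pool.join()
    print res

    # Python 3.3+ only (the line that failed on 2.7 above):
    # with mp.Pool(processes=mp.cpu_count()) as pool:
    #     res = pool.map(square, A)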
