
Is it possible to partition a pandas dataframe to do multiprocessing?

Specifically, my DataFrames are so large that even a single transformation takes several minutes on one processor.

I know I could do this in Spark, but a lot of code has already been written, so I would prefer to stick with what I have and still get parallel processing.

  • Take a look at the Dask project. (Commented May 27, 2016 at 20:15)
  • What exactly are you trying to do? multiprocessing seems to work with pandas: stackoverflow.com/questions/26187759/… (Commented May 27, 2016 at 20:20)
  • Dask Examples (Commented May 27, 2016 at 20:23)
  • Hey Torrinos, it seems like those answers were specific to applying a function to a groupby object. I have a bunch of apply statements over the rows of a whole DataFrame. Instead of running the whole DataFrame on a single processor, I would like to parallelize it across multiple processors. (Commented May 29, 2016 at 2:50)
  • Hey Max, Dask seems promising, but is it connected to pandas in any way? If it's a subclass of the pandas DataFrame, then I can use it. Otherwise it's too risky: it would probably break a large portion of my code. (Commented May 29, 2016 at 13:47)

1 Answer


By slightly modifying https://stackoverflow.com/a/29281494/5351271, I got a solution that works over rows.

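import numpy as np
import pandas as pd
from multiprocessing import Pool, cpu_count

def applyParallel(dfGrouped, func):
    # Hand each group (a sub-DataFrame) to a worker process, then stitch the results back together
    with Pool(cpu_count()) as p:
        ret_list = p.map(func, [group for name, group in dfGrouped])
    return pd.concat(ret_list)

def row_foo(row):
    # Placeholder for the real per-row transformation
    return row.sum()

def apply_row_foo(input_df):
    # Ordinary row-wise apply, run on one chunk inside a worker
    return input_df.apply(row_foo, axis=1)

if __name__ == "__main__":  # required where workers are spawned (Windows, macOS)
    df = pd.DataFrame(np.random.rand(1000, 4))  # stand-in for the real data
    n_chunks = 10  # rows per chunk, so this makes len(df) / 10 groups
    grouped = df.groupby(df.index // n_chunks)
    result = applyParallel(grouped, apply_row_foo)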

If the index is not a plain row number (a default 0-based RangeIndex), group by np.arange(len(df)) // n_chunks instead.
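To make the difference concrete, here is a small sketch with hypothetical data: after filtering, an integer index can have gaps, and grouping by index value then produces many tiny groups, while grouping by position gives evenly sized chunks. As in the answer, n_chunks here means rows per group.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose integer index has gaps (e.g. left over from filtering)
df = pd.DataFrame({"x": range(6)}, index=[0, 10, 20, 30, 40, 50])
n_chunks = 3  # rows per group, as in the answer

# Grouping by index value: keys 0, 3, 6, 10, 13, 16 are all distinct, so every row is its own group
by_index_sizes = [len(g) for _, g in df.groupby(df.index // n_chunks)]

# Grouping by position: keys 0, 0, 0, 1, 1, 1 give two groups of three rows
by_position_sizes = [len(g) for _, g in df.groupby(np.arange(len(df)) // n_chunks)]

print(by_index_sizes)     # [1, 1, 1, 1, 1, 1]
print(by_position_sizes)  # [3, 3]
```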

Decidedly not elegant, but it worked for my use case.
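If you would rather choose the number of chunks directly (for example, one per CPU), a variant is to split by row position with np.array_split and skip the groupby. This is a sketch, not the answer's original approach: row_func, split_frame and parallel_apply are hypothetical names, and row_func stands in for your own transformation. Splitting np.arange(len(df)) instead of the DataFrame itself keeps it independent of the index type. The worker functions must live at module top level so they can be pickled, and the `__main__` guard is needed where the spawn start method is used.

```python
import numpy as np
import pandas as pd
from multiprocessing import Pool, cpu_count


def row_func(row):
    # Hypothetical per-row transformation; replace with your own top-level function
    return row["a"] * 2 + row["b"]


def apply_rows(chunk):
    # Ordinary row-wise apply on one partition, executed inside a worker
    return chunk.apply(row_func, axis=1)


def split_frame(df, n_chunks):
    # Split row positions into n_chunks nearly equal, contiguous blocks; empty blocks are dropped
    positions = np.array_split(np.arange(len(df)), n_chunks)
    return [df.iloc[pos] for pos in positions if len(pos) > 0]


def parallel_apply(df, n_chunks=None, processes=None):
    processes = processes or cpu_count()
    n_chunks = n_chunks or processes  # default: one chunk per worker
    with Pool(processes) as pool:
        # map() returns results in input order, so concat restores the original row order
        results = pool.map(apply_rows, split_frame(df, n_chunks))
    return pd.concat(results)


if __name__ == "__main__":
    n = 10_000
    df = pd.DataFrame(
        {"a": np.arange(n), "b": np.arange(n) % 7},
        index=[f"r{i}" for i in range(n)],  # non-integer index on purpose
    )
    out = parallel_apply(df)
    assert out.equals(df.apply(row_func, axis=1))
```

Each chunk is pickled to a worker and each result pickled back, so this only pays off when the per-row work is expensive relative to that copying.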
