I have a DataFrame that I want to slice into many DataFrames by adding rows one at a time until the sum of the Score column in the slice is greater than 50,000. Once that condition is met, I want a new slice to begin.

Here is an example of what this might look like:

1 Answer

Take the cumulative sum of Score, floor-divide it by 50,000, and shift the result up one row (since you want each group to be > 50,000 and not < 50,000).

import pandas as pd
import numpy as np

# Generating DataFrame with random data
df = pd.DataFrame(np.random.randint(1,60000,15))

# Creating new column that's a cumulative sum with each
# value floor divided by 50000
df['groups'] = df[0].cumsum() // 50000

# Values shifted up one and missing values filled with the maximum value
# so that values at the bottom are included in the last DataFrame slice
df['groups'] = df['groups'].shift(-1, fill_value=df['groups'].max())

Then, as per this answer, you can use pandas.DataFrame.groupby in a list comprehension to return a list of the split DataFrames.

# Grouping by a single column name keeps the group keys as scalars
df_list = [df_slice for _, df_slice in df.groupby('groups')]
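
Note that the cumulative sum above runs over the whole column, so any overshoot past 50,000 carries into the next group rather than the sum restarting from zero. If the running sum should reset for every new slice, as the question describes, a plain loop is one way to sketch it. This is only an illustration, not part of the original answer: the helper name split_by_running_sum is hypothetical, and it assumes the column is named Score and that the row which pushes the sum over the threshold stays in the slice it completes.

```python
import pandas as pd


def split_by_running_sum(df, column="Score", threshold=50_000):
    """Split df into consecutive slices, closing a slice once its own sum exceeds threshold.

    Hypothetical helper for illustration; column name and threshold are assumptions.
    """
    group_ids = []
    group, running = 0, 0
    for value in df[column]:
        # The current row always belongs to the slice that is still open.
        group_ids.append(group)
        running += value
        if running > threshold:
            # This row pushed the slice over the threshold: the next row starts a new slice.
            group += 1
            running = 0
    # Group on the computed labels, aligned to the original index.
    labels = pd.Series(group_ids, index=df.index)
    return [part for _, part in df.groupby(labels)]


# Small deterministic example instead of random data
df = pd.DataFrame({"Score": [30_000, 30_000, 40_000, 20_000, 10_000]})
slices = split_by_running_sum(df)
print([int(s["Score"].sum()) for s in slices])  # prints [60000, 60000, 10000]
```

The last slice can fall short of the threshold when the remaining rows run out, as the final 10,000 does here. The loop is slower than the vectorised cumsum for very large frames, so it is a trade of speed for an exact reset.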