
I want to break a DataFrame into blocks, each running from one True value in the flag column up to the next True value. For example, this:

data                   flag
MODS start 12/12/2020  True
Some data              False
Some data              False
MODS start 30/12/2020  True
Some data              False
Some data              False

should become:

data                   flag
MODS start 12/12/2020  True
Some data              False
Some data              False

data                   flag
MODS start 30/12/2020  True
Some data              False
Some data              False

Please help

  • What is the logic for splitting the rows? Or do you want to split every 3 rows into a new df? Commented Dec 10, 2020 at 18:05
  • The logic is: split the dataframe from one True value to the next True value. Commented Dec 10, 2020 at 18:38

3 Answers


You can use cumsum to create group labels, then query the DataFrame for each group:

import pandas as pd

df = pd.DataFrame({'data': ['MODS start 12/12/2020', 'Some data', 'Some data', 'MODS start 30/12/2020', 'Some data', 'Some data'],
                   'flag': [True, False, False, True, False, False]})

# Each True starts a new group, so the running total numbers the blocks 1, 2, ...
df['grp'] = df['flag'].cumsum()

print(df)

Output:

                    data   flag  grp
0  MODS start 12/12/2020   True    1
1              Some data  False    1
2              Some data  False    1
3  MODS start 30/12/2020   True    2
4              Some data  False    2
5              Some data  False    2

Then use:

df.query('grp == 1')

                    data   flag  grp
0  MODS start 12/12/2020   True    1
1              Some data  False    1
2              Some data  False    1

and

df.query('grp == 2')

                    data   flag  grp
3  MODS start 30/12/2020   True    2
4              Some data  False    2
5              Some data  False    2

1 Comment

Nice! It also easily provides a variable in case OP wants to use groupby.
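Following up on that comment, here is a minimal sketch of the groupby route. It groups by the cumulative sum directly, so no grp column is stored, and it produces one DataFrame per block. The names labels and blocks are illustrative only. Rows before the first True, if there were any, would get group label 0. This sketch assumes they should be dropped.

```python
import pandas as pd

df = pd.DataFrame({
    'data': ['MODS start 12/12/2020', 'Some data', 'Some data',
             'MODS start 30/12/2020', 'Some data', 'Some data'],
    'flag': [True, False, False, True, False, False],
})

# Each True starts a new group: cumsum turns the flags into labels 1, 1, 1, 2, 2, 2
labels = df['flag'].cumsum()

# One DataFrame per group; label 0 (rows before the first True, assumed unwanted) is skipped
blocks = [g for key, g in df.groupby(labels) if key != 0]

for block in blocks:
    print(block, end='\n\n')
```

Because the group label is computed on the fly, this works for any number of blocks without knowing the count in advance.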

You can use numpy.split:

import numpy as np

np.split(df, df.index[df.flag])[1:]

Here, I used [1:] because numpy.split also returns the piece before the first split index, which is empty in this case because the first row is flagged.
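To see why the [1:] is needed, the same split can be run on a plain NumPy array. The array rows is a stand-in for the six DataFrame rows, and split_points holds the positions of the True flags; both names are chosen only for illustration. Split points [0, 3] give three pieces, and the first one is empty because it covers everything before position 0.

```python
import numpy as np

rows = np.arange(6)        # illustrative stand-in for six DataFrame rows
split_points = [0, 3]      # positions of the True flags in the example

pieces = np.split(rows, split_points)
print([p.tolist() for p in pieces])   # [[], [0, 1, 2], [3, 4, 5]]

blocks = pieces[1:]                   # drop the empty leading piece
print([b.tolist() for b in blocks])   # [[0, 1, 2], [3, 4, 5]]
```

The leading piece is only empty when row 0 is flagged. If the data started with unflagged rows, [1:] would silently discard those rows.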


That said, you can also use a simple list comprehension:

idx = df.index[df.flag].tolist() + [df.shape[0]]
[df.iloc[idx[i]:idx[i+1]] for i in range(len(idx)-1)]

Output (both approaches):

                    data   flag
0  MODS start 12/12/2020   True
1              Some data  False
2              Some data  False 

                    data   flag
3  MODS start 30/12/2020   True
4              Some data  False
5              Some data  False 
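Both snippets above pass df.index[df.flag] (index labels) where positions are expected. That works here only because df has the default RangeIndex. Below is a sketch that does not depend on the index, assuming the flag column is boolean. It takes integer positions with numpy.flatnonzero and pairs consecutive boundaries with zip. It also avoids calling np.split on a DataFrame. As far as I know, with pandas 2.1 and later that call may emit a FutureWarning, because NumPy's split goes through DataFrame.swapaxes, which pandas deprecated. The helper name split_on_flag is hypothetical.

```python
import numpy as np
import pandas as pd

def split_on_flag(df, col='flag'):
    """Hypothetical helper: split df into blocks that each start at a row where df[col] is True.

    Rows before the first True (if any) are not included in any block.
    """
    # Integer positions of the True rows, independent of the index labels
    starts = np.flatnonzero(df[col].to_numpy()).tolist()
    # Each block ends where the next one starts; the last runs to the end of df
    ends = starts[1:] + [len(df)]
    return [df.iloc[s:e] for s, e in zip(starts, ends)]

df = pd.DataFrame(
    {'data': ['MODS start 12/12/2020', 'Some data', 'Some data',
              'MODS start 30/12/2020', 'Some data', 'Some data'],
     'flag': [True, False, False, True, False, False]},
    index=list('abcdef'),  # non-default index, to show the labels don't matter
)

for block in split_on_flag(df):
    print(block, end='\n\n')
```

If no row is flagged, zip yields nothing and the function returns an empty list instead of raising.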

2 Comments

One nice thing about your first answer is that it splits dynamically: we don't actually need to know how many groups there are, even with 50M rows of data, and there is no loop. Thank you
Numpy is awesome! Glad to help.

Get the index labels of the rows where flag is True:

true_idx = df.index[df['flag']]
n = len(true_idx)

Loop over true_idx and build a list of DataFrames, each running from one True index up to the next:

new_dfs_list = [df.iloc[true_idx[i]:true_idx[i + 1]] for i in range(n - 1)]

Append the last block, from the last True index to the end of df:

new_dfs_list.append(df.iloc[ true_idx[n-1]:, :])

Access any of the new DataFrames by position:

print(new_dfs_list[-1])
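Here is a quick sanity check that works with any of these approaches. It is a suggestion, not part of the original answer. When the first row is flagged, concatenating the blocks should give back the original DataFrame, and each block should start with its only True row. The list is rebuilt below using the same index-based slicing as this answer.

```python
import pandas as pd

df = pd.DataFrame({
    'data': ['MODS start 12/12/2020', 'Some data', 'Some data',
             'MODS start 30/12/2020', 'Some data', 'Some data'],
    'flag': [True, False, False, True, False, False],
})

# Same slicing as the answer above (assumes the default RangeIndex)
true_idx = df.index[df['flag']]
n = len(true_idx)
new_dfs_list = [df.iloc[true_idx[i]:true_idx[i + 1]] for i in range(n - 1)]
new_dfs_list.append(df.iloc[true_idx[n - 1]:])

# The blocks should tile the original frame with no gaps or overlaps
assert pd.concat(new_dfs_list).equals(df)
# Each block should start with its only True row
assert all(b['flag'].iloc[0] and b['flag'].sum() == 1 for b in new_dfs_list)
print('blocks OK:', [len(b) for b in new_dfs_list])
```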
