I have the following code:
import random

import pandas as pd

# Two lists of 30 random 0s and 1s
a = [random.randint(0, 1) for _ in range(30)]
b = [random.randint(0, 1) for _ in range(30)]
print(a)
print(b)

# One column per list
df = pd.DataFrame({'column1': a, 'column2': b})
print(df)
It creates a dataframe, stored in the variable 'df', with two columns (column1 and column2) filled with random 0s and 1s.
This is the output I got when I ran the program (if you run it yourself you won't get exactly the same result, because the values come from random.randint):
column1 column2
0 0 1
1 1 0
2 0 1
3 1 1
4 0 1
5 1 1
6 0 1
7 1 1
8 1 0
9 0 1
10 0 0
11 1 1
12 1 1
13 0 1
14 0 0
15 0 1
16 1 1
17 1 1
18 0 1
19 1 0
20 0 0
21 1 0
22 0 1
23 1 0
24 1 1
25 0 0
26 1 1
27 1 0
28 0 1
29 1 0
I would like to create a filter on column2 that keeps only the rows belonging to runs of three or more consecutive 1s. For the data above, the output would be something like this:
column1 column2
2 0 1
3 1 1
4 0 1
5 1 1
6 0 1
7 1 1

11 1 1
12 1 1
13 0 1

15 0 1
16 1 1
17 1 1
18 0 1
I have left a blank line between the clusters for visual clarity, but the real output would not contain any empty rows.
I would like to do it with a boolean mask, along these lines:
filter1 = (some boolean condition) &/| (maybe some other stuff)
final_df = df[filter1]
Thank you
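One possible way to build such a mask, as a minimal sketch rather than a definitive answer: label each run of equal values in column2, measure the length of each run, and keep the rows where the value is 1 and the run has at least three rows. The names run_id and run_length are introduced here for illustration only, and the dataframe is rebuilt from the sample output above so the result is reproducible.

```python
import pandas as pd

# Values copied from the sample output in the question
column1 = [0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0,
           0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1]
column2 = [1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0,
           1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0]
df = pd.DataFrame({'column1': column1, 'column2': column2})

# A new run starts wherever the value differs from the previous row;
# the cumulative sum of those change points gives each run its own label.
run_id = (df['column2'] != df['column2'].shift()).cumsum()

# Number of rows in the run that each row belongs to
run_length = df['column2'].groupby(run_id).transform('size')

# Keep rows that are 1 and sit in a run of three or more
filter1 = (df['column2'] == 1) & (run_length >= 3)
final_df = df[filter1]
print(final_df)
```

On the sample data this keeps the rows with index 2 to 7, 11 to 13, and 15 to 18, which matches the desired output. The threshold 3 in the mask can be changed to require longer or shorter runs.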