I need to create a function in pandas that takes a single dataframe as input and returns multiple dataframes as output, split by a specific condition (see the examples below). I am having a hard time figuring out how to do this and would appreciate some expert advice.
Example 1:
Input = dataframe with 100 columns
Outputs = dataframe1 with the first 10% of columns (columns 1 to 10), dataframe2 with the second 10% of columns (columns 11 to 20), and so on up to the last 10% of columns (columns 91 to 100).
Example 2:
Input = dataframe with 109 columns
Outputs = dataframe1 with the first 10% of columns, rounded (columns 1 to 11), dataframe2 with the second 10% of columns (columns 12 to 22), and so on up to the last group, which ends at column 109.
This is the logic I am trying to develop:
- compute 10% of the total number of columns in the original dataframe, rounded, as 'n'
- pick the first 'n' columns from the original dataframe
- add them to a new dataframe
- drop them from the original dataframe
- check whether the number of columns remaining in the original dataframe is greater than 'n'
- if YES -> repeat from the second step (pick the first 'n' columns)
- if NO -> put all the remaining columns into the last dataframe
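The steps above can be sketched roughly as follows. This is only a sketch under a few assumptions: the function name `split_by_column_percentage` is hypothetical, the loop is assumed to repeat while more than 'n' columns remain, and the leftover columns are assumed to form a final, possibly smaller, dataframe (which matches both examples). Note that Python's built-in `round` rounds exact halves to the nearest even integer, so 10.5 becomes 10, not 11.

```python
import pandas as pd


def split_by_column_percentage(df, percentage_split=10):
    """Split df column-wise into chunks of roughly percentage_split percent each.

    Hypothetical helper: the leftover columns become the final dataframe.
    """
    total_columns = df.shape[1]
    # Chunk size 'n'; max(1, ...) guards against a zero-width chunk.
    n = max(1, round(total_columns * percentage_split / 100))

    chunks = []
    remaining = df
    # Peel off the first n columns while more than n columns are left.
    while remaining.shape[1] > n:
        chunks.append(remaining.iloc[:, :n])
        remaining = remaining.iloc[:, n:]

    # Whatever is left (at most n columns) becomes the last dataframe.
    if remaining.shape[1] > 0:
        chunks.append(remaining)
    return chunks
```

With 109 columns this would give nine dataframes of 11 columns followed by one of 10 columns. Slicing with `iloc` leaves the original dataframe untouched, so nothing actually has to be dropped from it.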
I tried the following code, but it is wrong. In it I am trying to get the respective column numbers based on the percentage split; later I plan to use those numbers to split the dataframe with iloc.
def split_column_numbers(total_columns, percentage_split):
    list1 = []
    number = round((total_columns * (percentage_split/100)))
    list1.append([0,number])
    for i in range(number):
        last_num = list1[-1][-1]
        if (last_num < total_columns):
            if((total_columns-last_num) > number):
                list1.append([last_num+1, last_num+number])
            else:
                list1.append([last_num+1, total_columns])
    return list1

split_column_numbers(101, 10)
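For comparison, here is one possible way to compute the column boundaries, offered as a sketch rather than a definitive fix. It assumes half-open, 0-based `(start, stop)` pairs, which is the convention `iloc` slicing uses, so consecutive ranges share a boundary instead of skipping a column. The name `column_ranges` is hypothetical.

```python
def column_ranges(total_columns, percentage_split):
    """Return half-open, 0-based (start, stop) pairs covering all columns.

    Hypothetical helper: each pair spans about percentage_split percent of
    the columns; the last pair may be shorter.
    """
    # Chunk width; max(1, ...) avoids a zero step for very small inputs.
    step = max(1, round(total_columns * percentage_split / 100))
    return [
        (start, min(start + step, total_columns))
        for start in range(0, total_columns, step)
    ]


ranges = column_ranges(101, 10)
print(ranges[0], ranges[-1])  # (0, 10) (100, 101)
```

Each pair could then be applied to a dataframe `df` as `df.iloc[:, start:stop]`. Stepping with `range(0, total_columns, step)` removes the need to track the previous end value or guess the number of iterations.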
Could anyone tell me whether this logic is correct and how to achieve this?