I'm trying to add an empty row after each group of rows that share the same Salary value (rows with duplicated Salary values should stay together, with no empty row between them).

Current input:

    Name     Country  Department  Salary
0   John     USA      Finance     12000
1   John     Egypt    Finance     12000
2   Jack     France   Marketing   13000
3   Geroge   UK       Accounts    11000
4   Steven   India    Data        10000
5   Mohammed Jordan   IT          10000

Expected output:

    Name     Country  Department  Salary
0   John     USA      Finance     12000
1   John     Egypt    Finance     12000

2   Jack     France   Marketing   13000

3   Geroge   UK       Accounts    11000

4   Steven   India    Data        10000
5   Mohammed Jordan   IT          10000

What I have tried:

import pandas as pd

df = pd.DataFrame({'Name': {0: 'John',1: 'John',2: 'Jack',
                            3: 'Geroge',4: 'Steven',5: 'Mohammed'},
                   'Country': {0: 'USA',1: 'Egypt',2: 'France',
                               3: 'UK',4: 'India',5: 'Jordan'},
                   'Department': {0: 'Finance',1: 'Finance',2: 'Marketing',
                                  3: 'Accounts',4: 'Data',5: 'IT'},
                   'Salary': {0: 12000, 1: 12000, 2: 13000, 
                              3: 11000, 4: 10000, 5: 10000}})

df.index = range(0, 2*len(df), 2)
df2 = df.reindex(index=range(2*len(df)))

What I got (which is incorrect):

    Name      Country   Department  Salary
0   John      USA       Finance     12000.0
1   NaN       NaN       NaN         NaN
2   John      Egypt     Finance     12000.0
3   NaN       NaN       NaN         NaN
4   Jack      France    Marketing   13000.0
5   NaN       NaN       NaN         NaN
6   Geroge    UK        Accounts    11000.0
7   NaN       NaN       NaN         NaN
8   Steven    India     Data        10000.0
9   NaN       NaN       NaN         NaN
10  Mohammed  Jordan    IT          10000.0
11  NaN       NaN       NaN         NaN

I would appreciate it if someone could help me here.

  • Why are indexes 0 and 1 kept together but not 3 and 4? Commented Aug 11, 2021 at 6:28
  • Because the Salary values at indexes 3 and 4 are not the same. To clarify, I'm trying to add an empty row between rows whose Salary value is unique, and keep rows with duplicate Salary values together. Commented Aug 11, 2021 at 6:32

3 Answers

IIUC:

Try appending an empty DataFrame after each group while iterating over groupby():

I grouped by 'Department' here, but you can also group by 'Salary' or another column according to your needs.

l=[]
for x,y in df.groupby('Department',sort=False):
    l.append(y)
    l.append(pd.DataFrame([[float('NaN')]*len(y.columns)],columns=y.columns))

df=pd.concat(l,ignore_index=True).iloc[:-1]

Output of df:

    Name        Country     Department  Salary
0   John        USA         Finance     12000.0
1   John        Egypt       Finance     12000.0
2   NaN         NaN         NaN         NaN
3   Jack        France      Marketing   13000.0
4   NaN         NaN         NaN         NaN
5   Geroge      UK          Accounts    11000.0
6   NaN         NaN         NaN         NaN
7   Steven      India       Data        10000.0
8   NaN         NaN         NaN         NaN
9   Mohammed    Jordan      IT          10000.0
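
Since the question asks for the split to follow 'Salary', here is a minimal sketch of the same loop grouped by that column instead. The names pieces, nan_row and result are only illustrative. It assumes rows with equal salaries are already adjacent, as in the sample data, because groupby() collects all equal values into one group wherever they appear. Note that the NaN separator rows turn the Salary column into floats.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'John', 'Jack', 'Geroge', 'Steven', 'Mohammed'],
    'Country': ['USA', 'Egypt', 'France', 'UK', 'India', 'Jordan'],
    'Department': ['Finance', 'Finance', 'Marketing', 'Accounts', 'Data', 'IT'],
    'Salary': [12000, 12000, 13000, 11000, 10000, 10000],
})

pieces = []
# sort=False keeps the groups in their order of first appearance
for _, group in df.groupby('Salary', sort=False):
    pieces.append(group)
    # one all-NaN row with the same columns acts as the separator
    nan_row = pd.DataFrame([[float('nan')] * len(group.columns)],
                           columns=group.columns)
    pieces.append(nan_row)

# iloc[:-1] drops the separator that follows the last group
result = pd.concat(pieces, ignore_index=True).iloc[:-1]
print(result)
```

With this grouping, Steven and Mohammed (both 10000) end up in the same block, which matches the expected output in the question.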

This should work for you

import pandas as pd

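def f(x):
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
    # The extra frame is a single row of empty strings with an empty index label.
    return pd.concat([x, pd.DataFrame('', columns=x.columns, index=[''])])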


df = pd.DataFrame({'Name': {0: 'John',1: 'John',2: 'Jack',
                            3: 'Geroge',4: 'Steven',5: 'Mohammed'},
                   'Country': {0: 'USA',1: 'Egypt',2: 'France',
                               3: 'UK',4: 'India',5: 'Jordan'},
                   'Department': {0: 'Finance',1: 'Finance',2: 'Marketing',
                                  3: 'Accounts',4: 'Data',5: 'IT'},
                   'Salary': {0: 12000, 1: 12000, 2: 13000, 
                              3: 11000, 4: 10000, 5: 10000}})

df = df.groupby('Salary', sort=False, group_keys=False).apply(f)

print(df)

Output:

       Name Country Department Salary
0      John     USA    Finance  12000
1      John   Egypt    Finance  12000
                                     
2      Jack  France  Marketing  13000
                                     
3    Geroge      UK   Accounts  11000
                                     
4    Steven   India       Data  10000
5  Mohammed  Jordan         IT  10000

1 Comment

Note: x.append(foo) has been replaced by pd.concat([x, foo]) in recent versions of pandas.

I'm not sure this is the most elegant way to do it, but here is how I would approach the problem.

pandas.DataFrame.groupby groups the rows of a DataFrame by the unique values of a given column. Iterating over the result yields one (key, sub-DataFrame) pair per group. So you can first slice df by unique salary, then do whatever is needed on each slice, here appending an empty row.

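output_df_list = []
# sort=False keeps the groups in their order of first appearance
for key, groupby_df in df.groupby('Salary', sort=False):
    # DataFrame.append was removed in pandas 2.0, so concat a one-row all-NaN frame instead
    nan_row = pd.DataFrame([[float('nan')] * len(df.columns)], columns=df.columns)
    groupby_df_with_nan_tail = pd.concat([groupby_df, nan_row], ignore_index=True)
    output_df_list.append(groupby_df_with_nan_tail)

output_df = pd.concat(output_df_list)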

1 Comment

This worked as well :) However, it added an extra empty row at the bottom.
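
One way to avoid that trailing empty row is to put the separator only between groups rather than after each one. The sketch below is just one possible approach. The helper name add_blank_rows is made up for illustration, and it uses empty strings instead of NaN so the blank rows print as truly empty (at the cost of the Salary column becoming object dtype).

```python
import pandas as pd

def add_blank_rows(df, column):
    """Hypothetical helper: blank row between groups of `column`, none at the end."""
    # a single row of empty strings, with an empty index label
    blank = pd.DataFrame('', index=[''], columns=df.columns)
    pieces = []
    # sort=False keeps the groups in their order of first appearance
    for i, (_, group) in enumerate(df.groupby(column, sort=False)):
        if i > 0:
            # separator goes before every group except the first
            pieces.append(blank)
        pieces.append(group)
    return pd.concat(pieces)

df = pd.DataFrame({
    'Name': ['John', 'John', 'Jack', 'Geroge', 'Steven', 'Mohammed'],
    'Country': ['USA', 'Egypt', 'France', 'UK', 'India', 'Jordan'],
    'Department': ['Finance', 'Finance', 'Marketing', 'Accounts', 'Data', 'IT'],
    'Salary': [12000, 12000, 13000, 11000, 10000, 10000],
})

result = add_blank_rows(df, 'Salary')
print(result)
```

As with the loop above, this assumes equal salaries sit in adjacent rows; groupby() would otherwise pull non-adjacent equal values into the same block.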
