access index of dataframe row

Question

I want to drop the column id and create two dataframes: one of unique rows and the other containing the duplicated rows. My code is below. What I want is to add the id column back to each dataframe (join).

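import pandas as pd

d = {'id': ["i1", "i2", "i3", "i4", "i5"], 'x1': [13, 13, 61, 61, 61], 'x2': [10, 10, 13, 13, 13], 'x3': [12, 12, 2, 22, 2], 'x4': [24, 24, 9, 12, 9]}
df = pd.DataFrame(data=d)
del df['id']  # id is dropped here, so neither result below contains it
dfduplicated = df[df.duplicated()]  # repeated rows (every occurrence after the first)
dfUNIC = df.drop_duplicates(keep='first')  # first occurrence of each row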

Can you show the expected result?
– Ji Wei, Commented Mar 17, 2020 at 13:48

jezrael · Accepted Answer · 2020-03-17 13:48:57Z

3

Remove id with drop and test for duplicates with DataFrame.duplicated, then filter the original data by boolean indexing. The mask shares the original index, so the filtered rows keep their id column:

m = df.drop('id', axis=1).duplicated()
dfduplicated = df[m]
print(dfduplicated)
   id  x1  x2  x3  x4
1  i2  13  10  12  24
4  i5  61  13   2   9

Then use ~ to invert the mask:

dfUNIC = df[~m]
print(dfUNIC)
   id  x1  x2  x3  x4
0  i1  13  10  12  24
2  i3  61  13   2   9
3  i4  61  13  22  12
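
The two steps above can be put together as a self-contained sketch using the question's sample data. The helper name split_duplicates and its ignore parameter are hypothetical, introduced only for illustration; they are not part of pandas.

```python
import pandas as pd


def split_duplicates(df, ignore=("id",)):
    """Return (unique_rows, duplicated_rows), comparing rows on every
    column except those listed in `ignore`. Hypothetical helper."""
    # True for each row that repeats an earlier row; first occurrences are False.
    mask = df.drop(columns=list(ignore)).duplicated()
    # Index the ORIGINAL frame so the ignored columns are still present.
    return df[~mask], df[mask]


d = {'id': ["i1", "i2", "i3", "i4", "i5"],
     'x1': [13, 13, 61, 61, 61],
     'x2': [10, 10, 13, 13, 13],
     'x3': [12, 12, 2, 22, 2],
     'x4': [24, 24, 9, 12, 9]}
df = pd.DataFrame(data=d)

df_unique, df_duplicated = split_duplicates(df)
print(df_unique["id"].tolist())      # ['i1', 'i3', 'i4']
print(df_duplicated["id"].tolist())  # ['i2', 'i5']
```

Because the mask is computed on a copy without id but applied to df itself, no join is needed afterwards.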


BENY · Answer · 2020-03-17 13:56:18Z

0

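I would use cumcount. Group by every column except id; cumcount numbers the rows within each group starting from 0, so 0 marks the first occurrence: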

s=df.groupby(list(set(df)-{'id'})).cumcount()
df1=df[s==0].copy()
df2=df.drop(df1.index)
df1
Out[113]: 
   id  x1  x2  x3  x4
0  i1  13  10  12  24
2  i3  61  13   2   9
3  i4  61  13  22  12
df2
Out[114]: 
   id  x1  x2  x3  x4
1  i2  13  10  12  24
4  i5  61  13   2   9
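
To see why s==0 selects the first occurrences, it may help to look at the counter itself. This is a minimal sketch on the question's sample data; the names key_cols, first_seen and repeats are assumptions made for illustration.

```python
import pandas as pd

d = {'id': ["i1", "i2", "i3", "i4", "i5"],
     'x1': [13, 13, 61, 61, 61],
     'x2': [10, 10, 13, 13, 13],
     'x3': [12, 12, 2, 22, 2],
     'x4': [24, 24, 9, 12, 9]}
df = pd.DataFrame(data=d)

# Group on every column except id; sorted() only makes the key order deterministic.
key_cols = sorted(set(df.columns) - {"id"})

# cumcount numbers rows within each group (0, 1, 2, ...) and keeps df's index.
s = df.groupby(key_cols).cumcount()
print(s.tolist())  # [0, 1, 0, 0, 1]

first_seen = df[s == 0]  # first occurrence of each distinct row
repeats = df[s > 0]      # every later occurrence
print(repeats["id"].tolist())  # ['i2', 'i5']
```

Unlike a boolean duplicated mask, the counter also records how many times a row has already been seen, which can be useful if only the second or third repeat is of interest.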
