2

Say I have a pandas DataFrame as below:

   C30    C25    C20    C15    C10
1  AJA    EJE    IJI    OJO    UJU
2  AJA    EJE    IJI    OJO    UJU
3  AJA    EJE    IJI    OJO    UJU
4  ABA    EBE    IBI    OBO    UBU
5  ABA    EBE    IBI    OBO    UBU
6  ABA    EBE    IBI    OBO    UNU
7  BOB    BIB    BEB    BAB    BUB

I would like to validate all the rows that have equal values in C30 have also equal values in C25, C20, C15 & C10.

I have approximately 60,000 rows with a lot of variation in the column values, if elements in column C30 are equal then the corresponding column C25 elements should also be equal and so on.

For the example, validation should shows a mismatch because C10 value in row 6 is not equal to 4 & 5 rows. What's the most efficient way to do this?

3
  • Are you sure with answer? Commented Apr 9, 2021 at 13:24
  • yes, right answer is yours. I think I changed by mistake but I returned it to yours. Thanks for your help. Your proposal was exactly what I wanted. Commented Apr 9, 2021 at 13:35
  • Super, glad can help! Happy coding! Commented Apr 9, 2021 at 13:36

2 Answers 2

1

You can use DataFrameGroupBy.nunique and compare for not equal 1 by DataFrame.ne:

m = df.groupby('C30').nunique().ne(1)
print (m)
       C25    C20    C15    C10
C30                            
ABA  False  False  False   True
AJA  False  False  False  False
BOB  False  False  False  False

And then if need values if no matched use np.where:

i, c  = np.where(m)

L = list(zip(m.index[i], m.columns[c]))

if len(L) > 0:
    print ('mismatch')
    print (L)
mismatch
[('ABA', 'C10')]
    
Sign up to request clarification or add additional context in comments.

Comments

1

Use DataFrame.duplicated to find rows that are not unique.

df['is_dup'] = df.duplicated(keep=False)

Output

   C30  C25  C20  C15  C10  is_dup
1  AJA  EJE  IJI  OJO  UJU    True
2  AJA  EJE  IJI  OJO  UJU    True
3  AJA  EJE  IJI  OJO  UJU    True
4  ABA  EBE  IBI  OBO  UBU    True
5  ABA  EBE  IBI  OBO  UBU    True
6  ABA  EBE  IBI  OBO  UNU   False
7  BOB  BIB  BEB  BAB  BUB   False

2 Comments

I think answer is wrong, not tested only mismatch because C10 value in row 6 is not equal to 4 & 5 rows.
Isn't that what OP wanted @jezrael

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.