0

I have the below dataframe:

Col1 Col2 Col3 Col4 Col5  Col6  Col7  
1     A    T     1    AG   NBL   NH
2     A    T     1    NAG  BL    NH
3     A    M     2    NAG  NBL   HL 
4     NS   M     1    NAG  BL    NH
5     NS   T     1    NAG  NBL   HL 
6     A    M     2    NAG  NBL   HL

I want to create a new column based on this conditions:

IF Col5 = 'AG' AND Col2 = 'A' AND Col3 = 'M' AND Col4 = 1 Then *(In the NewColumn)*  'A1'

IF Col5 = 'AG' AND Col2 = 'A' AND Col3 = 'M' AND Col4 = 0 Then *(In the NewColumn)*  'A2'

IF Col6 = 'BL' AND Col2 = 'A' AND Col3 = 'M' AND Col4 = 1 Then *(In the NewColumn)*  'B1'

IF Col6 = 'BL' AND Col2 = 'A' AND Col3 = 'M' AND Col4 = 0 Then *(In the NewColumn)*  'B2'

IF Col7 = 'HL' AND Col2 = 'A' AND Col3 = 'M' AND Col4 = 1 Then *(In the NewColumn)*  'H1'

IF Col6 = 'HL' AND Col2 = 'A' AND Col3 = 'M' AND Col4 = 0 Then *(In the NewColumn)*  'H2'

I have many different conditions in the original df but I'm trying to create a way just to add this conditions. I'm working with phyton in jupyter

1
  • Have you tried to do a df.apply() based on your conditions ? Commented Nov 9, 2020 at 20:20

2 Answers 2

1

Since the logic here is pretty complicated, I would suggest putting your conditions inside a function, and using DataFrame.apply() to call that function for every row in your dataset. Here's how I would translate your example into Pandas:

import pandas as pd

df = pd.read_csv("test.csv")


def classify(row):
    Col2 = row["Col2"]
    Col3 = row["Col3"]
    Col4 = row["Col4"]
    Col5 = row["Col5"]
    Col6 = row["Col6"]
    Col7 = row["Col7"]
    if Col5 == 'AG' and Col2 == 'A' and Col3 == 'M' and Col4 == 1:
        return 'A1'

    if Col5 == 'AG' and Col2 == 'A' and Col3 == 'M' and Col4 == 0:
        return 'A2'

    if Col6 == 'BL' and Col2 == 'A' and Col3 == 'M' and Col4 == 1:
        return 'B1'

    if Col6 == 'BL' and Col2 == 'A' and Col3 == 'M' and Col4 == 0:
        return 'B2'

    if Col7 == 'HL' and Col2 == 'A' and Col3 == 'M' and Col4 == 1:
        return 'H1'

    if Col6 == 'HL' and Col2 == 'A' and Col3 == 'M' and Col4 == 0:
        return 'H2'

    # No match
    return None


df["NewCol"] = df.apply(classify, axis=1)
print(df)

Note: I tried this function on your dataset, and I get the following result, which is probably not what you want:

   Col1 Col2 Col3  Col4 Col5 Col6 Col7 NewCol
0     1    A    T     1   AG  NBL   NH   None
1     2    A    T     1  NAG   BL   NH   None
2     3    A    M     2  NAG  NBL   HL   None
3     4   NS    M     1  NAG   BL   NH   None
4     5   NS    T     1  NAG  NBL   HL   None
5     6    A    M     2  NAG  NBL   HL   None

Inspecting the individual rows, they seem to be correct - none of them follow any of your rules. I'd suggest double checking that your rules are correct.

Sign up to request clarification or add additional context in comments.

Comments

0

I find numpy.select() to be a good fit for this problem where you have lots of conditions and need to map it a value.

import numpy as np
import pandas as pd

def my_transform(row: pd.Series):
    choices = [all([row['Col5'].strip() == 'AG', row['Col4'] == 1]), 
               all([row['Col5'].strip() == 'AG', row['Col4'] == 0]), True]
    results = ['A1', 'A2', 'Default']
    return np.select(choices, results)

df['my_new_col'] = df.apply(my_transform, axis=1)

What select does is it checks for the index of the first truthy value and returns the value corresponding to that index from the second parameter passed to it (here results). Here I have simplified your first two rules but you could expand on them to fit your exact situation.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.