I have a dataframe with three variables (id, V1, V2). I would like to create another variable, V3, based on V1 and V2, that takes two values ("LISTED" and "UN_LISTED"). Here is my data frame:

print(df)
   id             V1       V2
0   1            NaN      NaN
1   2       QSINTSTK      NaN
2   3  1111GHGKJKIUH  122354H
3   4        FGHKLIH  123456K
4   5         FDUL12  237899M
5   6        VHKIOY3  784236A
6   7            NaN      NaN

Here are the conditions for creating V3:

If V1 and V2 are both null, then it is "UN_LISTED".

If V2 is not null, then it is "LISTED".

If V1 starts with "QS" or "1111", then it is "UN_LISTED"; otherwise it is "LISTED".

Here is my code:

def label_list(row):
    if row['V1'] == np.NaN and row['V2'] == np.NaN:
        return 'UN_LISTED'

    elif row['V2'] != np.NaN:
        return 'LISTED'

    elif row['V1'] == "^QS" or row['V1'] == "^(1){4}":
        return 'UN_LISTED'

    else:
        return "LISTED"

datatest.apply(lambda row : label_list(row), axis = 1)
datatest['V3'] = datatest.apply(lambda row : label_list(row), axis = 1)

But the result is wrong:

print(df)
   id             V1       V2     V3 
0   1            NaN      NaN  LISTED
1   2       QSINTSTK      NaN  LISTED
2   3  1111GHGKJKIUH  122354H  LISTED
3   4        FGHKLIH  123456K  LISTED
4   5         FDUL12  237899M  LISTED
5   6        VHKIOY3  784236A  LISTED
6   7            NaN      NaN  LISTED

Thanks for your help

1 Answer

A note about the following solution:

You have overlapping conditions in your requirements. For example, V2 can be non-null while V1 also starts with QS or 1111 (this happens in the row with id 3), so you need to pass the conditions to np.select in the order in which you want them prioritized.
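
To illustrate the point about ordering, here is a minimal sketch on a made-up one-row frame (the name toy and its values are hypothetical, not taken from the question). np.select returns, for each row, the choice belonging to the first condition that is True, so swapping the two conditions changes the label:

```python
import numpy as np
import pandas as pd

# Hypothetical toy row: V2 is set AND V1 starts with "1111", so both rules match.
toy = pd.DataFrame({"V1": ["1111ABC"], "V2": ["122354H"]})

v2_set = toy["V2"].notnull()
v1_prefixed = toy["V1"].str.contains(r"^QS|^1111", na=False)

# np.select uses the choice of the FIRST condition that is True for each row.
v2_first = np.select([v2_set, v1_prefixed], ["LISTED", "UN_LISTED"], "LISTED")
v1_first = np.select([v1_prefixed, v2_set], ["UN_LISTED", "LISTED"], "LISTED")

print(v2_first)  # ['LISTED']
print(v1_first)  # ['UN_LISTED']
```

Which order is right depends on which rule you want to win when both apply.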

Using np.select:

import numpy as np

c1 = df.V1.isnull() & df.V2.isnull()               # both missing
c2 = df.V2.notnull()                               # V2 present
c3 = df.V1.str.contains(r'^QS|^1111', na=False)    # V1 starts with QS or 1111

# The first matching condition wins; rows matching none get the default 'LISTED'.
df.assign(V3=np.select([c1, c2, c3], ['UN_LISTED', 'LISTED', 'UN_LISTED'], 'LISTED'))

Output:

   id             V1       V2         V3
0   1            NaN      NaN  UN_LISTED
1   2       QSINTSTK      NaN  UN_LISTED
2   3  1111GHGKJKIUH  122354H     LISTED
3   4        FGHKLIH  123456K     LISTED
4   5         FDUL12  237899M     LISTED
5   6        VHKIOY3  784236A     LISTED
6   7            NaN      NaN  UN_LISTED
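
As for why the function in the question labels every row LISTED: NaN never compares equal to anything, including itself, so row['V1'] == np.NaN is always False and row['V2'] != np.NaN is always True, which means the second branch fires for every row. In addition, == compares against the literal string "^QS" and does not apply a regular expression. If you prefer to keep a row-wise function, a minimal sketch could look like the following. It assumes the frame from the question (rebuilt by hand here), uses the question's UN_LISTED spelling, and keeps the same priority order as the np.select call:

```python
import numpy as np
import pandas as pd

# Frame rebuilt by hand from the question's printout.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7],
    "V1": [np.nan, "QSINTSTK", "1111GHGKJKIUH", "FGHKLIH", "FDUL12", "VHKIOY3", np.nan],
    "V2": [np.nan, np.nan, "122354H", "123456K", "237899M", "784236A", np.nan],
})

def label_list(row):
    # NaN != NaN, so missing values must be tested with pd.isna / pd.notna.
    if pd.isna(row["V1"]) and pd.isna(row["V2"]):
        return "UN_LISTED"
    if pd.notna(row["V2"]):
        return "LISTED"
    # Here V2 is missing and V1 is a string; startswith accepts a tuple of prefixes.
    if row["V1"].startswith(("QS", "1111")):
        return "UN_LISTED"
    return "LISTED"

df["V3"] = df.apply(label_list, axis=1)
print(df["V3"].tolist())
# ['UN_LISTED', 'UN_LISTED', 'LISTED', 'LISTED', 'LISTED', 'LISTED', 'UN_LISTED']
```

On large frames the np.select version is usually the faster of the two, since apply with axis=1 calls the function once per row.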
1 Comment

Thank you, I will have to choose my constraints
