I have a dataframe with three columns (id, V1, V2). I would like to create a fourth column, V3, derived from V1 and V2, that takes one of two values: "LISTED" or "UN_LISTED". Here is my dataframe:
print(df)
   id             V1       V2
0   1            NaN      NaN
1   2       QSINTSTK      NaN
2   3  1111GHGKJKIUH  122354H
3   4        FGHKLIH  123456K
4   5         FDUL12  237899M
5   6        VHKIOY3  784236A
6   7            NaN      NaN
Here are the conditions for creating V3:
If V1 and V2 are both null, then V3 is "UN_LISTED".
If V2 is not null, then V3 is "LISTED".
Otherwise, if V1 starts with "QS" or "1111", then V3 is "UN_LISTED"; if it does not, V3 is "LISTED".
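For reference, the three rules above, applied in that order, could be sketched in vectorized form roughly as follows. This is only a sketch under assumptions: the sample frame is rebuilt by hand from the printed output (treating the "Nan" in the last row as a missing value), the helper names `v1_missing`, `v2_missing` and `starts_unlisted` are made up for illustration, and `np.select` is used in place of a row-wise `apply`.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the printed frame (an assumption about the real data).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7],
    "V1": [np.nan, "QSINTSTK", "1111GHGKJKIUH", "FGHKLIH", "FDUL12", "VHKIOY3", np.nan],
    "V2": [np.nan, np.nan, "122354H", "123456K", "237899M", "784236A", np.nan],
})

# Boolean masks for missing values (illustrative names).
v1_missing = df["V1"].isna()
v2_missing = df["V2"].isna()

# True where V1 starts with "QS" or "1111"; na=False makes missing V1 count as False.
starts_unlisted = (
    df["V1"].str.startswith("QS", na=False)
    | df["V1"].str.startswith("1111", na=False)
)

# np.select takes the first condition that is True for each row,
# so the list order mirrors the order of the rules.
conditions = [
    v1_missing & v2_missing,  # rule 1: both null
    ~v2_missing,              # rule 2: V2 present
    starts_unlisted,          # rule 3: V1 prefix check
]
choices = ["UN_LISTED", "LISTED", "UN_LISTED"]

df["V3"] = np.select(conditions, choices, default="LISTED")
print(df)
```

Because the conditions are evaluated in order, a row such as id 3 (V1 starting with "1111" but V2 present) is labelled by rule 2 before rule 3 is considered.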
Here is my code:
def label_list(row):
    if row['V1'] == np.NaN and row['V2'] == np.NaN:
        return 'UN_LISTED'
    elif row['V2'] != np.NaN:
        return 'LISTED'
    elif row['V1'] == "^QS" or row['V1'] == "^(1){4}":
        return 'UN_LISTED'
    else:
        return "LISTED"
df.apply(lambda row: label_list(row), axis=1)
df['V3'] = df.apply(lambda row: label_list(row), axis=1)
But the result is wrong; every row is labelled "LISTED":
print(df)
   id             V1       V2      V3
0   1            NaN      NaN  LISTED
1   2       QSINTSTK      NaN  LISTED
2   3  1111GHGKJKIUH  122354H  LISTED
3   4        FGHKLIH  123456K  LISTED
4   5         FDUL12  237899M  LISTED
5   6        VHKIOY3  784236A  LISTED
6   7            NaN      NaN  LISTED
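A likely explanation for this output: NaN never compares equal to anything, including itself, so a test like `== np.NaN` is always False and `!= np.NaN` is always True. That makes the second branch fire for every row. In addition, `==` against "^QS" compares against that literal string and does not treat it as a regular expression. A minimal sketch of a row-wise version that uses `pd.isna` / `pd.notna` and `str.startswith` instead is shown here. It assumes V1 and V2 hold only strings or missing values, that the "Nan" in the last row is a missing value, and it uses the lowercase `np.nan` because the `np.NaN` alias was removed in NumPy 2.0.

```python
import numpy as np
import pandas as pd

# NaN is not equal to anything, not even itself.
print(np.nan == np.nan)  # False
print(np.nan != np.nan)  # True


def label_list(row):
    """Return the V3 label for one row, checking the rules in order."""
    v1, v2 = row["V1"], row["V2"]
    if pd.isna(v1) and pd.isna(v2):
        return "UN_LISTED"
    if pd.notna(v2):
        return "LISTED"
    # At this point V2 is missing and V1 is present (assumed to be a string).
    if str(v1).startswith(("QS", "1111")):
        return "UN_LISTED"
    return "LISTED"


# Sample data rebuilt from the printed frame (an assumption about the real data).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7],
    "V1": [np.nan, "QSINTSTK", "1111GHGKJKIUH", "FGHKLIH", "FDUL12", "VHKIOY3", np.nan],
    "V2": [np.nan, np.nan, "122354H", "123456K", "237899M", "784236A", np.nan],
})

df["V3"] = df.apply(label_list, axis=1)
print(df)
```

With the sample data, only the rows where V2 is missing (ids 1, 2 and 7) can end up as "UN_LISTED"; every row with a V2 value is caught by the second rule.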
Thanks for your help