I have a dataframe with 4 columns:

c1        c2        c3      GName
0.221445  0.300534  5.689   KDD
0.001000  0.969000  15.140  ACC
1.000000  0.094000  -0.245  QETF

And a second dataframe called file with one column:

GName
Abd
kkoew
KDD
pwqh
ACC
dsewf

I need to add a new column called label, based on checking the scores in c1, c2 and c3 and on GName.

If the majority of the 3 scores meet their conditions (2 out of 3, or all 3) and the value of GName exists in the dataframe file, then label = 1; otherwise label = 0.

The conditions are:
c1 should be > 0.95
c2 should be > 0.50
c3 should be > 15

The output will be like this:

c1        c2        c3      GName label
0.221445  0.300534  5.689   KDD   0  (because 0 out of 3 and KDD in file)
0.001000  0.969000  15.140  ACC   1  (because 2 out of 3 and ACC in file)
1.000000  0.94060  -0.245  QETF   0  (because 2 out of 3 but QETF not in file)

I'm struggling to combine these different conditions. Any help, please?

1 Answer

The way I would do it is this:

import pandas as pd

df = pd.DataFrame({'c1':[0.221445, 0.001000, 1.000000],
                   'c2':[0.300534, 0.969000, 0.094000],
                   'c3':[5.689, 15.140, -0.245],
                   'GName':['KDD', 'ACC', 'QETF']})
file = pd.DataFrame({'GName':['KDD', 'ACC']})

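# Count how many of the three score conditions each row satisfies (0 to 3)
conditions = (df['c1'] > 0.95).astype(int) + (df['c2'] > 0.5).astype(int) + (df['c3'] > 15).astype(int)
# Require a majority (at least 2 of 3) and a GName that appears in file
conditions = (conditions >= 2) & (df['GName'].isin(file['GName']))
# Default every row to 0, then set 1 where the combined condition holds
df['label'] = 0
df.loc[conditions, 'label'] = 1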

>>> df
         c1        c2      c3 GName  label
0  0.221445  0.300534   5.689   KDD      0
1  0.001000  0.969000  15.140   ACC      1
2  1.000000  0.094000  -0.245  QETF      0
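
If you expect the thresholds to change, the same idea can be wrapped in a small helper. This is only a sketch: the function name add_majority_label and its parameters (allowed_names, thresholds, min_votes) are my own invention, not part of pandas, and it assumes strict > comparisons as in the question. Here file holds the full list of names from the question.

```python
import pandas as pd


def add_majority_label(df, allowed_names, thresholds, min_votes=2):
    """Hypothetical helper: return a copy of df with a 0/1 'label' column."""
    # Each comparison yields a boolean Series; placing them side by side and
    # summing across columns counts how many conditions each row satisfies.
    votes = pd.concat(
        [df[col] > limit for col, limit in thresholds.items()], axis=1
    ).sum(axis=1)
    # True where the row's GName appears among the allowed names.
    in_file = df["GName"].isin(allowed_names)
    out = df.copy()  # leave the caller's dataframe untouched
    out["label"] = ((votes >= min_votes) & in_file).astype(int)
    return out


df = pd.DataFrame({"c1": [0.221445, 0.001000, 1.000000],
                   "c2": [0.300534, 0.969000, 0.094000],
                   "c3": [5.689, 15.140, -0.245],
                   "GName": ["KDD", "ACC", "QETF"]})
file = pd.DataFrame({"GName": ["Abd", "kkoew", "KDD", "pwqh", "ACC", "dsewf"]})

labeled = add_majority_label(df, file["GName"],
                             {"c1": 0.95, "c2": 0.50, "c3": 15})
print(labeled)
```

With the data above this gives the same labels as before (0, 1, 0). Returning a copy rather than modifying df in place is a design choice; assign the result back to df if you prefer the original behaviour.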

It would be nice if you could include code to generate your dataframe in your question, as well.
