I am trying to write a condition check for tagging technical terms. I use a dictionary as a lookup and do a fuzzy match against it. My dataframe looks something like this:

Word          Entity   Score   NER_Tag   technology   similarity
Stonetrust    CRR      0.90    MISC      xxx          90
Wilkes        CRR      0.80    ORG       xxx          60
linux         xxx      0.70    LOC       xxx          70
SILVER  INC   xxx      0.88    PER       xxx          80
PO BOX 988    xxx      0.99    MISC      xxx          70
LA 70520      xxx      0.67    PER       xxx          50
02/12/2019    xxx      0.23    MISC      xxx          100

I need to check the conditions below and create a new column with the final tags:

  1. if similarity score = 100 then final_tag = TECH
  2. if NER_Tag = MISC and similarity score >= 95 then final_tag = TECH

To do this I wrote the code below:

filter1 = df1['similarity'] == 100
filter2 = (df1['NER_Tag'] == 'MISC') & (df1['similarity'] >= 95)

df1['Final_NER']  = np.where(filter1, filter2, 'TECH', df1['NER_Tag'])

I am not getting the correct output; instead I get the error below:

TypeError: where() takes from 1 to 3 positional arguments but 4 were given

Is there a better way of writing this logic?

1 Answer

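You are close. np.where accepts only one condition followed by two values, so passing two filters raises the TypeError. Use numpy.select if you want to pass multiple conditions, each with its own value: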

df1['Final_NER']  = np.select([filter1, filter2], ['TECH', 'TECH'], default=df1['NER_Tag'])
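
To see how np.select behaves, here is a minimal sketch on a small made-up dataframe (the four rows are an assumption for illustration, not data from the question). Conditions are checked in order and the first one that matches decides the value; rows matching none get the default.

```python
import numpy as np
import pandas as pd

# Hypothetical rows chosen so that each rule fires at least once.
df1 = pd.DataFrame({
    "NER_Tag": ["MISC", "ORG", "MISC", "PER"],
    "similarity": [100, 100, 96, 80],
})

filter1 = df1["similarity"] == 100
filter2 = (df1["NER_Tag"] == "MISC") & (df1["similarity"] >= 95)

# One choice per condition; rows matching neither keep their NER_Tag.
df1["Final_NER"] = np.select(
    [filter1, filter2], ["TECH", "TECH"], default=df1["NER_Tag"]
)
print(df1["Final_NER"].tolist())  # ['TECH', 'TECH', 'TECH', 'PER']
```

np.select is mostly worth it when the conditions map to different values; with the same value for both, it gives the same result as combining the conditions.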

Or, more simply here, combine both conditions with | (bitwise OR):

df1['Final_NER']  = np.where(filter1 | filter2, 'TECH', df1['NER_Tag'])
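
As a runnable sketch, this is the combined-condition version applied to the sample rows from the question. Only the columns the rule needs are rebuilt, and the name is_tech is just an illustrative choice.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question (Entity, Score, technology omitted).
df1 = pd.DataFrame({
    "Word": ["Stonetrust", "Wilkes", "linux", "SILVER  INC",
             "PO BOX 988", "LA 70520", "02/12/2019"],
    "NER_Tag": ["MISC", "ORG", "LOC", "PER", "MISC", "PER", "MISC"],
    "similarity": [90, 60, 70, 80, 70, 50, 100],
})

# Rule 1 OR rule 2; each comparison is parenthesised because
# | and & bind more tightly than == and >=.
is_tech = (df1["similarity"] == 100) | (
    (df1["NER_Tag"] == "MISC") & (df1["similarity"] >= 95)
)

# Single condition, value if True, value if False.
df1["Final_NER"] = np.where(is_tech, "TECH", df1["NER_Tag"])
print(df1[["Word", "NER_Tag", "similarity", "Final_NER"]])
```

With this sample only the last row (similarity 100) becomes TECH; every other row keeps its original NER_Tag.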