My JSON looks like this:

json_obj = [{'extracted_value': {'other': 'Not found', 'sound': 'false', 'longterm': 'false', 'physician': 'false'}, 'page_num': '33', 'score': '0.75', 'number': 12223611, 'misc':'true'}]

df = pd.DataFrame(json_obj)[['extracted_value', 'page_num', 'score', 'number']]

I am extracting only the columns above. Now I want to ignore 'other': 'Not found' inside the extracted_value column while still extracting the remaining values as above.

1 Answer

You can apply a function to the extracted_value column with df['extracted_value'].apply(remove_other).

The complete code is:

import pandas as pd

json_obj = [{'extracted_value': {'other': 'Not found', 'sound': 'false', 'longterm': 'false', 'physician': 'false'}, 'page_num': '33', 'score': '0.75', 'number': 12223611, 'misc':'true'}]
df = pd.DataFrame(json_obj)[['extracted_value', 'page_num', 'number']]

def remove_other(my_dict):
    # Drop the 'other' key, and also any entry whose value is 'Not found'
    return {e: my_dict[e] for e in my_dict if e != 'other' and my_dict[e] != 'Not found'}

df['extracted_value'] = df['extracted_value'].apply(remove_other)

The result is:

   extracted_value                                    page_num  number
0  {'sound': 'false', 'longterm': 'false', 'physi...  33        12223611

Additional response:

  1. df['extracted_value'].apply(remove_other) passes each value in the column to the function as its argument. Adding print(my_dict) inside remove_other makes this easier to see.

  2. To drop the 'other' key whatever its value is, remove the value check from the and condition:

def remove_other(my_dict):
    # Remove the 'other' key regardless of its value
    return {e: my_dict[e] for e in my_dict if e != 'other'}
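
As a rough illustration of the two points above, here is a minimal sketch. The drop_keys helper and the second row (with the 'khjkmlkmk' value mentioned in the comments) are made up for this example and are not part of the original data.

```python
import pandas as pd

json_obj = [
    {"extracted_value": {"other": "Not found", "sound": "false"},
     "page_num": "33", "number": 12223611},
    # Hypothetical second row: 'other' holds an arbitrary value
    {"extracted_value": {"other": "khjkmlkmk", "sound": "true"},
     "page_num": "34", "number": 12223612},
]
df = pd.DataFrame(json_obj)[["extracted_value", "page_num", "number"]]

def drop_keys(my_dict, keys=("other",)):
    """Return a copy of my_dict without the listed keys, whatever their values."""
    return {k: v for k, v in my_dict.items() if k not in keys}

# Series.apply calls drop_keys once per cell, passing that cell's dict
df["extracted_value"] = df["extracted_value"].apply(drop_keys)

print(df["extracted_value"].tolist())
# [{'sound': 'false'}, {'sound': 'true'}]
```

Because the filter looks only at the key, it does not matter what text 'other' contains.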
    

I would suggest getting familiar with how JSON is structured. In this case you need to go to [0]['coord'][0], so the function will look like this:

# Section_Page_start and Section_End_Page
def get_start_and_end(var1):
    my_dict=var1[0]['coord'][0]
    return {ek:my_dict[ek] for ek in my_dict if ek in ['Section_Page_start','Section_End_Page']}
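
For what it is worth, here is a small usage sketch of that function. The sample value is hypothetical: its shape is only inferred from the [0]['coord'][0] access, and the page numbers and the extra 'x' key are invented.

```python
def get_start_and_end(var1):
    # Walk down to the first 'coord' entry of the first list element
    my_dict = var1[0]["coord"][0]
    wanted = ("Section_Page_start", "Section_End_Page")
    return {k: my_dict[k] for k in wanted if k in my_dict}

# Hypothetical cell value; the real data may carry more keys
sample = [{"coord": [{"Section_Page_start": 3, "Section_End_Page": 7, "x": 10}]}]

result = get_start_and_end(sample)
print(result)
# {'Section_Page_start': 3, 'Section_End_Page': 7}
```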

12 Comments

1. Where is my_dict? Is it json_obj? 2. 'other': 'Not found' may not always be the same; sometimes it is 'other': 'khjkmlkmk', and that is possible in many cases. I need to ignore the 'other' key completely, whatever its value.
my_dict looks like this: {other: null, socsec: null, longterm: null, physician: null}. But with the above code I am getting the error TypeError: string indices must be integers.
Please add some data samples from your dataframe. The dictionary you posted does not look right for a dictionary with string keys. For example, the keys need to be in quotes, like {'other': null, 'socsec': null, 'longterm': null, 'physician': null}
My dict keys are in double quotes: {"other": null, "socsec": null, "longterm": null, "physician": null}. I tried to remove the double quotes and apply remove_other, but I am unable to replace the double quotes with single quotes using str.replace(). My sample dicts: {"other": null, "socsec": true, "longterm": null, "physician": null} {"other": null, "socsec": true, "longterm": null, "physician": null} {"other": null, "socsec": null, "longterm": null, "physician": true}
I have replaced the double quotes with single quotes now: df_merge['pdh_value'] = df_merge['pdh_value'].str.replace('[\"]','\'') Now my_dict looks like this: {'other': null, 'socsec': null, 'longterm': null, 'physician': null} I am still getting the error below: TypeError: string indices must be integers
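
The TypeError in the comments above is what Python raises when remove_other receives a string instead of a dict: iterating a string yields characters, and my_dict[e] then indexes the string with a character. Assuming the pdh_value column really holds JSON text like the samples quoted above (an assumption, since the actual dataframe is not shown), a minimal sketch is to parse each cell with json.loads before filtering. The parse_and_remove_other name is made up for this example.

```python
import json
import pandas as pd

# Sample rows copied from the comments, stored as JSON text
# (an assumption about how the pdh_value column actually looks)
df_merge = pd.DataFrame({
    "pdh_value": [
        '{"other": null, "socsec": true, "longterm": null, "physician": null}',
        '{"other": null, "socsec": null, "longterm": null, "physician": true}',
    ]
})

def parse_and_remove_other(cell):
    # Parse JSON text into a dict; pass real dicts through unchanged
    my_dict = json.loads(cell) if isinstance(cell, str) else cell
    return {k: v for k, v in my_dict.items() if k != "other"}

df_merge["pdh_value"] = df_merge["pdh_value"].apply(parse_and_remove_other)

print(df_merge["pdh_value"].tolist())
# [{'socsec': True, 'longterm': None, 'physician': None},
#  {'socsec': None, 'longterm': None, 'physician': True}]
```

Note that JSON requires double quotes, so replacing them with single quotes would make json.loads fail; the strings are best left as they are.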