
I am working on a script that extracts details from images, and I loop over a dataframe that holds the image names. How can I add a new column to the dataframe so that each extracted name is stored in the row of its image?

for image in df['images']:
    concatenated_name                    = ''.join(name)
    df.loc[image, df['images']]['names'] = concatenated_name

Expected:

Index  images  names
0      img_01  TonyStark
1      img_02  Thanos
2      img_03  Thor

Got:

Index  images  names
0      img_01  Thor
1      img_02  Thor
2      img_03  Thor
3
  • what is the value of name? Commented May 3, 2019 at 2:27
  • integer or string Commented May 3, 2019 at 2:29
  • why do you need a loop, why can't you just use column operations Commented May 3, 2019 at 2:35

2 Answers

1

Use apply to call a function on each value of the images column and store the results in a new column:

def get_name(image):
    # Code for getting the name
    return name

df['names'] = df['images'].apply(get_name)
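As a minimal sketch of how this behaves, assuming a hypothetical get_name that just looks the name up in a dictionary (the real extraction code is not shown in the question, so NAMES is a stand-in):

```python
import pandas as pd

# Hypothetical lookup standing in for the real extraction logic
NAMES = {"img_01": "TonyStark", "img_02": "Thanos", "img_03": "Thor"}

def get_name(image):
    # Called once per value in the 'images' column; returns that image's name
    return NAMES.get(image, "")

df = pd.DataFrame({"images": ["img_01", "img_02", "img_03"]})
df["names"] = df["images"].apply(get_name)
print(df)
```

Because get_name returns a value for each image separately, every row gets its own name instead of the last name computed in a loop.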

Following your answer, which added some more details, it should be possible to shorten it to:

def get_details(filename):
    image = os.getcwd() + filename
    data = pytesseract.image_to_string(Image.open(image))
    .
    .
    . 
    data = ''.join(a) 
    return data

df['data'] = df['filenames'].apply(get_details)
# save df to csv / excel / other

3 Comments

Thanks Shaido! I tried this approach, but couldn't achieve the desired result. So, I kept researching and I believe I finally found a solution. Perhaps, I should have added more information about the issue I was facing. Thanks much!
@Loki: Nice that you could solve it :). I extended my answer using the information in yours, and that should work for you. Personally I think it looks a bit cleaner, and you don't need to recreate the dataframe.
@Loki: Just noticed the new code didn't actually return a value... that should be fixed.
0

After multiple trials, I think I have a viable solution to this question.

I used two functions for this exercise: function 1 loops over a dataframe of files and calls function 2, which extracts the text, performs validation, and records a value if the image has the expected field. First, I created an empty list, which is populated on each run of function 2. At the end, the user can use this list to create a dataframe.

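import os

import pandas as pd
import pytesseract
from PIL import Image

# dataframes to store data
df = pd.DataFrame(os.listdir(), columns=['filenames'])
# Keep only image files: the dots are escaped and the pattern is anchored to the end of the name
df = df[df['filenames'].str.contains(r"\.(?:png|jpg|jpeg)$")]
df['filenames'] = '\\' + df['filenames']  # Leading separator so that os.getcwd() + filename forms a path
df1 = []  # Empty list to record details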

# Function 1
def extract_details(df):
    for filename in df['filenames']:
        get_details(filename)

# Function 2
def get_details(filename):
    image = os.getcwd() + filename
    data = pytesseract.image_to_string(Image.open(image))
    .
    .
    . 
    data = ''.join(a) 
    print(filename, data)
    df1.append([filename, data])

extract_details(df)  # Run the loop so that df1 is populated before building the output

df_data = pd.DataFrame(df1, columns=['filenames', 'data'])  # Container for final output
df_data.to_csv('data_list.csv') # Write output to a csv file 
df_data.to_excel('data_list.xlsx') # Write output to an excel file      
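The accumulate-then-build pattern above can be sketched without OCR. This is only an illustration: fake_ocr is a hypothetical stand-in for the pytesseract call and the elided processing steps, and the filenames are made up.

```python
import pandas as pd

def fake_ocr(filename):
    # Hypothetical stand-in for pytesseract.image_to_string(Image.open(...))
    return filename.rsplit(".", 1)[0].upper()

def extract_details(filenames):
    rows = []  # One [filename, data] pair per image
    for filename in filenames:
        rows.append([filename, fake_ocr(filename)])
    return rows

rows = extract_details(["a.png", "b.jpg"])
df_data = pd.DataFrame(rows, columns=["filenames", "data"])
print(df_data)
```

Returning the list from the function, rather than appending to a module-level list, keeps each run independent, so calling it twice does not duplicate rows.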
