I have a dataframe in Pandas, let's call it df. It has the following columns:

  1. ID - an ID number
  2. Files - a list of filenames

For example:

ID         Files
1       [12, 15, 19] 
2       [15, 18, 103]

And so on. Each element of the list corresponds to a text file with the same name, so "12" corresponds to "12.txt".

What I want to do is create a third column called "Content" that takes the text from each file in the list, concatenates it all together, and puts the result in the column. I have been experimenting with for loops, but was wondering if there is a more efficient way to do it.

Thanks.

1 Answer

Use a custom function with Series.apply and read the files in pure Python (faster than reading them with pandas):

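import ast

def f(x):
    out = []
    path = 'files/'
    # if necessary, convert the string repr of a list to a real list
    if isinstance(x, str):
        x = ast.literal_eval(x)
    for file in x:
        # open e.g. files/12.txt; the handle is named fh so it does not shadow f
        with open('{}{}.txt'.format(path, file)) as fh:
            # strip newlines so the lines join with single spaces
            c = ' '.join(line.strip() for line in fh)
            out.append(c)
    # one string per row: the contents of all listed files, space-separated
    return ' '.join(out)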


df['content'] = df['Files'].apply(f)
print(df)
   ID          Files              content
0   1   [12, 15, 19]        I like pandas
1   2  [15, 18, 103]  like something else
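
Because the same file can appear in more than one row (15 is in both lists above), a variant that caches each file's text may avoid repeated reads. This is only a minimal sketch, assuming the files sit in a `files/` directory as above and that `Files` already holds real lists; `read_text_cached`, `concat_files`, and the `base_dir` parameter are hypothetical names, not part of pandas:

```python
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def read_text_cached(path):
    """Read one text file; repeated calls with the same path reuse the cached text."""
    return Path(path).read_text().strip()


def concat_files(names, base_dir="files"):
    """Join the contents of <base_dir>/<name>.txt for every name in one row's list."""
    base = Path(base_dir)  # assumed location of the .txt files
    return " ".join(read_text_cached(base / f"{name}.txt") for name in names)


# Usage, given a DataFrame df with a list-valued 'Files' column:
# df["content"] = df["Files"].apply(concat_files)
```

Whether the cache helps depends on how often filenames repeat across rows; if every file is listed only once, the plain function above does the same work.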