Python populating dataframe in pandas from text files

Question

I have a DataFrame in pandas, let's call it df. It has the following columns:

ID - an ID number
Files - a list of filenames

For example:

ID         Files
1       [12, 15, 19] 
2       [15, 18, 103]

And so on. Each element of the list corresponds to a text file with the same name, so "12" corresponds to "12.txt".

I want to create a third column called "Content" that takes the text from each file in the list, concatenates it all together, and puts the result in that column. I have been experimenting with for loops, but was wondering if there is a more efficient way to do it.

Thanks.

jezrael · Accepted Answer · 2019-08-17 09:01:44Z

Use a custom function with Series.apply and read the files in pure Python (faster than reading them with pandas):

import ast

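def f(x):
    out = []
    path = 'files/'
    # if the cell holds the string repr of a list, convert it to a real list
    # (ast.literal_eval raises ValueError when given an actual list)
    if isinstance(x, str):
        x = ast.literal_eval(x)
    for file in x:
        # each list element names a text file, e.g. 12 -> files/12.txt
        # the handle is called fh so it does not shadow the function name f
        with open('{}{}.txt'.format(path, file)) as fh:
            c = ' '.join(fh.readlines())
            out.append(c)
    return ' '.join(out)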


df['content'] = df['Files'].apply(f)
print(df)
   ID          Files              content
0   1   [12, 15, 19]        I like pandas
1   2  [15, 18, 103]  like something else
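
The same idea can be sketched with pathlib, which avoids building the path by string formatting. This is only an illustration, not part of the answer above: the helper name read_files, its directory parameter, and the sample file contents are assumptions made for the demo, and it strips surrounding whitespace from each file, which the answer's function does not do.

```python
import tempfile
from pathlib import Path


def read_files(names, directory):
    """Return the text of <directory>/<name>.txt for every name, joined by spaces."""
    texts = []
    for name in names:
        # the f-string formats ints (12) and strings ("12") the same way
        file_path = Path(directory) / f"{name}.txt"
        # strip() drops the trailing newline that most text files end with
        texts.append(file_path.read_text(encoding="utf-8").strip())
    return " ".join(texts)


# Demo with throwaway files; the contents are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    for name, text in {"12": "I", "15": "like", "19": "pandas"}.items():
        (Path(tmp) / f"{name}.txt").write_text(text + "\n", encoding="utf-8")
    result = read_files([12, 15, 19], tmp)

print(result)  # I like pandas
```

With the DataFrame from the question, this would be applied as df['Content'] = df['Files'].apply(read_files, directory='files'), since Series.apply forwards extra keyword arguments to the function.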
