I am struggling to load multiple .txt files into Python from my desktop. I am totally new to Python. My goal is to load multiple plain-text .txt files, all saved in the same directory. Thanks in advance for your help!
- Correction to the question: I wanted to import the data as a DataFrame. – aufd34, Oct 19, 2019 at 19:43
- Could you please elaborate on the use of the DataFrame? – Florian Bernard, Oct 19, 2019 at 19:43
- Hi Florian, I wanted to do topic modeling with gensim on each file. That is why I need to import the files as a DataFrame. – aufd34, Oct 19, 2019 at 19:46
3 Answers
You could do something like this.
from collections import defaultdict
from pathlib import Path
import pandas as pd

my_dir_path = "/path/to/folder"

# Collect one file name and one text entry per file in the folder
results = defaultdict(list)
for file in Path(my_dir_path).iterdir():
    with open(file, "r") as file_open:
        results["file_name"].append(file.name)
        results["text"].append(file_open.read())

df = pd.DataFrame(results)
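If the folder also holds files that are not .txt, a slightly stricter variant may help. This is only a minimal sketch: the helper name `read_txt_files` and the UTF-8 default are assumptions of mine, not something from the question, so adjust the encoding to match your files.

```python
from pathlib import Path


def read_txt_files(folder, encoding="utf-8"):
    """Return one {"file_name", "text"} record per .txt file in `folder`."""
    records = []
    # glob("*.txt") skips other file types; sorted() keeps the row order stable
    for path in sorted(Path(folder).glob("*.txt")):
        if path.is_file():
            records.append({
                "file_name": path.name,
                "text": path.read_text(encoding=encoding),
            })
    return records
```

Passing the returned list to `pd.DataFrame(...)` should give the same `file_name` and `text` columns as the code above.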
This might be unnecessarily long, but it creates another column for the file names if you need one:
import os
import csv
import pandas as pd

main_folder = 'path\\to\\some_folder'

def get_filename(path):
    # Return the base names of all regular files in `path`
    filenames = []
    files = [i.path for i in os.scandir(path) if i.is_file()]
    for filename in files:
        filename = os.path.basename(filename)
        filenames.append(filename)
    return filenames

files = get_filename(main_folder)

with open('some.csv', 'w', encoding='utf8', newline='') as csv_file:
    writer = csv.writer(csv_file)
    # Header row, so read_csv does not treat the first file as column names
    writer.writerow(['file_name', 'text'])
    for _file in files:
        file_name = _file
        with open(os.path.join(main_folder, _file), 'r') as f:
            text = f.read()
        writer.writerow([file_name, text])

df = pd.read_csv('some.csv')
# ...then whatever...
3 Comments
aufd34
Hi mulaixi, thanks so much for your reply. Does 'some_folder' refer to the path where my files are? Thanks
mulaixi
Yes, that is the path. I will edit now. But if @Florian Bernard's answer worked for you, it looks better and shorter.
aufd34
Thanks for your time. Yes, Florian's answer works fine for me. I will try yours as well.
I would do it like this.
import glob

# Concatenate every .txt file in the folder into result.txt, byte for byte
read_files = glob.glob('C:\\your_path_here\\*.txt')
with open('result.txt', 'wb') as outfile:
    for f in read_files:
        with open(f, 'rb') as infile:
            outfile.write(infile.read())
I have 5 text files that look like this:
FName,LName,Address
Jim,Bentz,34 Holloway La.
George,Hororitz,76 Ridge Dr.
Eric,Schimtz,11 Main St.
The final result looks like this:
FName,LName,Address
Jim,Bentz,34 Holloway La.
George,Hororitz,76 Ridge Dr.
Eric,Schimtz,11 Main St.
FName,LName,Address
Jim,Bentz,34 Holloway La.
George,Hororitz,76 Ridge Dr.
Eric,Schimtz,11 Main St.
FName,LName,Address
Jim,Bentz,34 Holloway La.
George,Hororitz,76 Ridge Dr.
Eric,Schimtz,11 Main St.
FName,LName,Address
Jim,Bentz,34 Holloway La.
George,Hororitz,76 Ridge Dr.
Eric,Schimtz,11 Main St.
FName,LName,Address
Jim,Bentz,34 Holloway La.
George,Hororitz,76 Ridge Dr.
Eric,Schimtz,11 Main St.
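Note that the merged file repeats the header row once per input file. If that is not wanted, something along these lines might work. It is a rough sketch that assumes every file starts with the same header line; the function name `merge_with_single_header` and the UTF-8 default are my own placeholders.

```python
import glob
import os


def merge_with_single_header(pattern, out_path, encoding="utf-8"):
    """Concatenate files matching `pattern`, keeping only the first header row."""
    out_abs = os.path.abspath(out_path)
    header_written = False
    with open(out_path, "w", encoding=encoding, newline="") as outfile:
        for name in sorted(glob.glob(pattern)):
            if os.path.abspath(name) == out_abs:
                continue  # do not read the output file back into itself
            with open(name, "r", encoding=encoding, newline="") as infile:
                lines = infile.readlines()
            if not lines:
                continue  # skip empty files
            # Assumption: the first line of every file is the same header row
            start = 1 if header_written else 0
            header_written = True
            for line in lines[start:]:
                # Add a newline if the last line of a file lacks one
                outfile.write(line if line.endswith("\n") else line + "\n")
```

For example, `merge_with_single_header('C:\\your_path_here\\*.txt', 'result.txt')` would write the header once, followed by the data rows of each file in sorted file-name order.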