I am struggling to load multiple .txt files into Python. I am totally new to Python. The files are plain text and are all saved in the same directory on my desktop. Thanks in advance for your help!

  • Correction to the question: I wanted to import the data as a data frame. Commented Oct 19, 2019 at 19:43
  • Could you please elaborate on the use of the dataframe? Commented Oct 19, 2019 at 19:43
  • Hi Florian, I wanted to do topic modeling using gensim on each file. That is why I need to import the data as a dataframe. Commented Oct 19, 2019 at 19:46
  • I was struggling to load multiple .txt files into Python that are on my desktop. I am totally new to Python. My goal is to load multiple .txt files, which are saved in the same directory. The .txt files are plain text. I wanted to import the data as a data frame. I wanted to do topic modeling using gensim on each file. That is why I need to import the data as a dataframe. Thanks in advance for your help! Commented Oct 19, 2019 at 19:49

3 Answers

You could do something like this.


from collections import defaultdict
from pathlib import Path
import pandas as pd

my_dir_path = "/path/to/folder"

results = defaultdict(list)
# Read only the .txt files; iterdir() would also return subfolders
for file in Path(my_dir_path).glob("*.txt"):
    with open(file, "r") as file_open:
        results["file_name"].append(file.name)
        results["text"].append(file_open.read())
df = pd.DataFrame(results)
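
As a variation on the loop above, here is a minimal sketch that wraps the reading step in a function so the folder, file pattern, and encoding can be changed in one place. The names `collect_texts` and `to_dataframe` are made up for this example, and the UTF-8 default is an assumption about your files, not something stated in the question.

```python
from pathlib import Path


def collect_texts(folder, pattern="*.txt", encoding="utf-8"):
    """Return one {"file_name", "text"} record per matching file in `folder`."""
    records = []
    # sorted() gives a stable, reproducible row order
    for path in sorted(Path(folder).glob(pattern)):
        if not path.is_file():
            continue  # skip directories that happen to match the pattern
        # errors="replace" swaps undecodable bytes for U+FFFD instead of raising
        text = path.read_text(encoding=encoding, errors="replace")
        records.append({"file_name": path.name, "text": text})
    return records


def to_dataframe(records):
    """Build a DataFrame with `file_name` and `text` columns (needs pandas)."""
    import pandas as pd  # imported here so collect_texts works without pandas

    return pd.DataFrame(records, columns=["file_name", "text"])


# Example (placeholder path, adjust to your own folder):
# df = to_dataframe(collect_texts("/path/to/folder"))
```

Each row of the resulting data frame is one file, so the `text` column can be passed on to whatever preprocessing you do before topic modeling.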


This might be unnecessarily long, but it creates another column for the file names, if you need one:

import os
import csv
import pandas as pd

main_folder = 'path\\to\\some_folder'

def get_filename(path):
    # Names (not full paths) of the regular files directly inside `path`
    return [entry.name for entry in os.scandir(path) if entry.is_file()]

files = get_filename(main_folder)

with open('some.csv', 'w', encoding='utf8', newline='') as csv_file:
    writer = csv.writer(csv_file)
    # Header row; without it read_csv would treat the first file as column names
    writer.writerow(['file_name', 'text'])
    for file_name in files:
        with open(os.path.join(main_folder, file_name), 'r') as f:
            writer.writerow([file_name, f.read()])

df = pd.read_csv('some.csv', encoding='utf8')

# ...then whatever...

3 Comments

Hi mulaixi, thanks so much for your reply. Does 'some_folder' refer to the path where my files are? Thanks
Yes, that is the path. I will edit now. But if @Florian Bernard's answer worked for you, that one looks better and shorter.
Thanks for your time. Yes, Florian's answer works fine for me. I will try yours as well.

I would do it like this.

import glob

read_files = glob.glob('C:\\your_path_here\\*.txt')

with open('result.txt', 'wb') as outfile:
    for f in read_files:
        with open(f, 'rb') as infile:
            outfile.write(infile.read())

I have 5 text files that look like this:

FName,LName,Address
Jim,Bentz,34 Holloway La.
George,Hororitz,76 Ridge Dr.
Eric,Schimtz,11 Main St.

The final result looks like this:

FName,LName,Address
Jim,Bentz,34 Holloway La.
George,Hororitz,76 Ridge Dr.
Eric,Schimtz,11 Main St.
FName,LName,Address
Jim,Bentz,34 Holloway La.
George,Hororitz,76 Ridge Dr.
Eric,Schimtz,11 Main St.
FName,LName,Address
Jim,Bentz,34 Holloway La.
George,Hororitz,76 Ridge Dr.
Eric,Schimtz,11 Main St.
FName,LName,Address
Jim,Bentz,34 Holloway La.
George,Hororitz,76 Ridge Dr.
Eric,Schimtz,11 Main St.
FName,LName,Address
Jim,Bentz,34 Holloway La.
George,Hororitz,76 Ridge Dr.
Eric,Schimtz,11 Main St.
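
As the result above shows, a plain byte-for-byte concatenation repeats the header line once per file. If you want the header only once, something like the following might work. It is a sketch that assumes every file starts with the same header line; `merge_with_single_header` is a name invented for this example.

```python
def merge_with_single_header(paths, out_path, encoding="utf-8"):
    """Concatenate text files that share a header line, keeping it only once."""
    header = None
    with open(out_path, "w", encoding=encoding) as outfile:
        for path in paths:
            with open(path, "r", encoding=encoding) as infile:
                lines = infile.read().splitlines()
            if not lines:
                continue  # nothing to copy from an empty file
            if header is None:
                # The first non-empty file defines the header
                header = lines[0]
                outfile.write(header + "\n")
            # Drop the first line only when it repeats the header
            rows = lines[1:] if lines[0] == header else lines
            for row in rows:
                outfile.write(row + "\n")
    return header


# Example with the same placeholder pattern as above:
# import glob
# merge_with_single_header(sorted(glob.glob('C:\\your_path_here\\*.txt')), 'result.txt')
```

Because every line is written back with its own newline, a file that lacks a trailing newline does not get glued to the first row of the next one.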
