
Using Python, I'm seeking to iteratively combine two sets of txt files to create a third set of txt files.

I have a directory of txt files in two categories:

  1. text_[number].txt (eg: text_0.txt, text_1.txt, text_2.txt....text_20.txt)
  2. comments_[number].txt (eg: comments_0.txt, comments_1.txt, comments_2.txt...comments_20.txt).

I'd like to iteratively combine the text_[number] files with the matching comments_[number] files into a new file category feedback_[number].txt. The script would combine text_0.txt and comments_0.txt into feedback_0.txt, and continue through each pair in the directory. The number of text and comments files will always match, but the total number of text and comment files is variable depending on preceding scripts.

I can combine two pairs using the code below with a list of file pairs:

filenames = ['text_0.txt', 'comments_0.txt']

with open("feedback_0.txt", "w") as outfile:
    for filename in filenames:
        with open(filename) as infile:
            contents = infile.read()
            outfile.write(contents)

However, I'm uncertain how to structure iteration for the rest of the files. I'm also curious how to generate lists from the contents of the file directory. Any advice or assistance on moving forward is greatly appreciated.

2 Comments

  • Are the file names going to be alternate, i.e. is it always text_n followed by comments_n? Commented Jul 30, 2021 at 15:47
  • Yes, that's correct. Commented Jul 30, 2021 at 15:50

5 Answers


It would be far simpler (and possibly faster) to just fork a cat process:

import subprocess


n = ... # number of files
for i in range(n):
    with open(f'feedback_{i}.txt', 'w') as f:
        subprocess.run(['cat', f'text_{i}.txt', f'comments_{i}.txt'], stdout=f)

Or, if you already have lists of the file names:

for text, comment, feedback in zip(text_files, comment_files, feedback_files):
    with open(feedback, 'w') as f:
        subprocess.run(['cat', text, comment], stdout=f)

Unless these are all extremely small files, the cost of reading and writing the bytes will outweigh the cost of forking a new process for each pair.
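For reference, the same pairwise concatenation can be done portably in pure Python with shutil.copyfileobj, which streams bytes in chunks instead of loading whole files into memory (a sketch; the sample file contents and pair count are made up for illustration):

```python
import shutil

# create a few sample input pairs (names follow the question; contents are made up)
n = 2
for i in range(n):
    with open(f'text_{i}.txt', 'w') as f:
        f.write(f'text {i}\n')
    with open(f'comments_{i}.txt', 'w') as f:
        f.write(f'comments {i}\n')

# portable equivalent of the cat call: stream each source into the output
for i in range(n):
    with open(f'feedback_{i}.txt', 'wb') as out:
        for name in (f'text_{i}.txt', f'comments_{i}.txt'):
            with open(name, 'rb') as src:
                shutil.copyfileobj(src, out)
```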


1 Comment

I like considering launching a native program like cat to this more efficiently, but I think it makes the solution unnecessarily non-portable.

Maybe not the most elegant but...

length = 10
txt = [f"text_{n}.txt" for n in range(length)]
com = [f"comments_{n}.txt" for n in range(length)]
feed = [f"feedback_{n}.txt" for n in range(length)]

for f, t, c in zip(feed, txt, com):
    with open(f, "w") as outfile:
        with open(t) as infile1:
            contents = infile1.read()
            outfile.write(contents)
        with open(c) as infile2:
            contents = infile2.read()
            outfile.write(contents)

5 Comments

Why generate all the names beforehand? That is completely unnecessary...
What if there are more than 10 files?
Of course, 10 was just a random number but you clearly choose how many you have. And generating the names beforehand ensures you only use the files you intend to in case the folder is a bit polluted.
Note that txt, com, and feed could all be generators, which would give you the readability without the cost of storing the names in memory unnecessarily.
I wouldn't open the files in text mode, though, as that means you'll have to decode and re-encode each line unnecessarily. Just read in the raw bytes and write them back (which is what cat does in my answer).
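The generator variant suggested in the comments above is a one-line change per list: swap the square brackets for parentheses, and the names are produced lazily instead of all being stored up front (a sketch, assuming the pair count is known):

```python
length = 3

# generator expressions: each name is produced on demand as zip consumes it
txt = (f"text_{n}.txt" for n in range(length))
com = (f"comments_{n}.txt" for n in range(length))
feed = (f"feedback_{n}.txt" for n in range(length))

# zip pulls one name from each generator per iteration
pairs = list(zip(feed, txt, com))
```

The loop body from the answer above works unchanged with these generators in place of the lists.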

There are many ways to achieve this, but I don't seem to see any solution that's both beginner-friendly and takes into account the structure of the files you described.

You can iterate through the files, and for every text_[num].txt, fetch the corresponding comments_[num].txt and write to feedback_[num].txt as shown below. There's no need to add any counters or make any other assumptions about the files that might not always be true:

import os

srcpath = 'path/to/files'

for f in os.listdir(srcpath):
    if f.startswith('text'):
        index = f[5:-4] # extract the [num] part

        # Build the paths to text, comment, feedback files
        txt_path = os.path.join(srcpath, f)
        cmnt_path = os.path.join(srcpath, f'comments_{index}.txt')
        fb_path = os.path.join(srcpath, f'feedback_{index}.txt')

        # write the output in byte mode, following chepner's advice
        with open(fb_path, 'wb') as outfile:
            with open(txt_path, 'rb') as infile1, open(cmnt_path, 'rb') as infile2:
                outfile.write(infile1.read())
                outfile.write(infile2.read())
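A variant of the same idea using pathlib, which some may find more readable (a sketch; the temporary directory and sample file contents are made up so it runs standalone):

```python
import tempfile
from pathlib import Path

srcpath = Path(tempfile.mkdtemp())  # isolated directory for the sketch

# sample inputs (contents are made up for illustration)
(srcpath / 'text_0.txt').write_text('T0\n')
(srcpath / 'comments_0.txt').write_text('C0\n')

# glob only the text_*.txt files and derive the other two names of each triple
for txt in srcpath.glob('text_*.txt'):
    index = txt.stem.split('_')[1]  # the [num] part, e.g. 'text_0' -> '0'
    cmnt = srcpath / f'comments_{index}.txt'
    fb = srcpath / f'feedback_{index}.txt'
    fb.write_bytes(txt.read_bytes() + cmnt.read_bytes())
```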



The simplest way would probably be to just iterate from 0 onwards, stopping at the first missing file. This works assuming that your files are numbered in increasing order with no gaps (e.g. you have 0, 1, 2 and not 0, 2).

import os
from itertools import count

for i in count(0):
    t = f'text_{i}.txt'
    c = f'comments_{i}.txt'

    if not os.path.isfile(t) or not os.path.isfile(c):
        break

    with open(f'feedback_{i}.txt', 'wb') as outfile:
        with open(t, 'rb') as tf, open(c, 'rb') as cf:
            outfile.write(tf.read())
            outfile.write(cf.read())



You can try this:

filenames = ['text_0.txt', 'comments_0.txt', 'text_1.txt', 'comments_1.txt',
             'text_2.txt', 'comments_2.txt', 'text_3.txt', 'comments_3.txt']
for i, j in enumerate(zip(filenames[::2], filenames[1::2])):
    with open(f'feedback_{i}.txt', 'w') as outfile:
        for k in j:
            with open(k, 'r') as infile:
                outfile.write(infile.read())

I have taken a hard-coded list here. Instead, you can build it from the directory, though note that os.listdir returns names in arbitrary order, so you would need to filter and sort the names into matching text/comments pairs first:

import os
filenames = os.listdir('path/to/folder')
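One way to turn an unordered directory listing into the alternating text/comments list this answer expects, sorting the indices numerically so text_10 doesn't land before text_2 (a sketch; the sample names are made up in place of a real os.listdir call):

```python
import re

# sample directory contents (in practice: names = os.listdir('path/to/folder'))
names = ['comments_2.txt', 'text_0.txt', 'text_2.txt', 'notes.md',
         'comments_0.txt', 'text_10.txt', 'comments_10.txt']

# pull the numeric index out of each text_ file and sort numerically
indices = sorted(
    int(m.group(1)) for n in names
    if (m := re.fullmatch(r'text_(\d+)\.txt', n))
)

# rebuild the alternating text/comments list the loop above expects
filenames = [name for i in indices
             for name in (f'text_{i}.txt', f'comments_{i}.txt')]
```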

