
Using Python, I'm seeking to iteratively combine two sets of txt files to create a third set of txt files.

I have a directory of txt files in two categories:

  1. text_[number].txt (eg: text_0.txt, text_1.txt, text_2.txt....text_20.txt)
  2. comments_[number].txt (eg: comments_0.txt, comments_1.txt, comments_2.txt...comments_20.txt).

I'd like to iteratively combine the text_[number] files with the matching comments_[number] files into a new file category feedback_[number].txt. The script would combine text_0.txt and comments_0.txt into feedback_0.txt, and continue through each pair in the directory. The number of text and comments files will always match, but the total number of text and comment files is variable depending on preceding scripts.

I can combine two pairs using the code below with a list of file pairs:

filenames = ['text_0.txt', 'comments_0.txt']

with open("feedback_0.txt", "w") as outfile:
    for filename in filenames:
        with open(filename) as infile:
            contents = infile.read()
            outfile.write(contents)

However, I'm uncertain how to structure iteration for the rest of the files. I'm also curious how to generate lists from the contents of the file directory. Any advice or assistance on moving forward is greatly appreciated.

2 Comments

  • Are the file names going to be alternate, i.e. is it always text_n followed by comments_n? Commented Jul 30, 2021 at 15:47
  • Yes, that's correct. Commented Jul 30, 2021 at 15:50

5 Answers


It would be far simpler (and possibly faster) to just fork a cat process:

import subprocess


n = ... # number of files
for i in range(n):
    with open(f'feedback_{i}.txt', 'w') as f:
        subprocess.run(['cat', f'text_{i}.txt', f'comments_{i}.txt'], stdout=f)

Or, if you already have lists of the file names:

for text, comment, feedback in zip(text_files, comment_files, feedback_files):
    with open(feedback, 'w') as f:
        subprocess.run(['cat', text, comment], stdout=f)

Unless these are all extremely small files, the cost of reading and writing the bytes will outweigh the cost of forking a new process for each pair.
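For reference, the same pairwise concatenation can be done portably in pure Python with shutil.copyfileobj, which streams bytes in chunks instead of loading whole files into memory (a sketch; the sample file contents and pair count are made up for illustration):

```python
import shutil

# create a few sample input pairs (names follow the question; contents are made up)
n = 2
for i in range(n):
    with open(f'text_{i}.txt', 'w') as f:
        f.write(f'text {i}\n')
    with open(f'comments_{i}.txt', 'w') as f:
        f.write(f'comments {i}\n')

# portable equivalent of the cat call: stream each source into the output
for i in range(n):
    with open(f'feedback_{i}.txt', 'wb') as out:
        for name in (f'text_{i}.txt', f'comments_{i}.txt'):
            with open(name, 'rb') as src:
                shutil.copyfileobj(src, out)
```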


1 Comment

I like considering launching a native program like cat to this more efficiently, but I think it makes the solution unnecessarily non-portable.

Maybe not the most elegant but...

length = 10
txt = [f"text_{n}.txt" for n in range(length)]
com = [f"comments_{n}.txt" for n in range(length)]
feed = [f"feedback_{n}.txt" for n in range(length)]

for f, t, c in zip(feed, txt, com):
    with open(f, "w") as outfile:
        with open(t) as infile1:
            contents = infile1.read()
            outfile.write(contents)
        with open(c) as infile2:
            contents = infile2.read()
            outfile.write(contents)

5 Comments

Why generate all the names beforehand? That is completely unnecessary...
What if there are more than 10 files?
Of course, 10 was just a random number but you clearly choose how many you have. And generating the names beforehand ensures you only use the files you intend to in case the folder is a bit polluted.
Note that txt, com, and feed could all be generators, which would give you the readability without the cost of storing the names in memory unnecessarily.
I wouldn't open the files in text mode, though, as that means you'll have to decode and re-encode each line unnecessarily. Just read in the raw bytes and write them back (which is what cat does in my answer).
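The generator variant suggested in the comments above is a one-line change per list: swap the square brackets for parentheses, and the names are produced lazily instead of all being stored up front (a sketch, assuming the pair count is known):

```python
length = 3

# generator expressions: each name is produced on demand as zip consumes it
txt = (f"text_{n}.txt" for n in range(length))
com = (f"comments_{n}.txt" for n in range(length))
feed = (f"feedback_{n}.txt" for n in range(length))

# zip pulls one name from each generator per iteration
pairs = list(zip(feed, txt, com))
```

The loop body from the answer above works unchanged with these generators in place of the lists.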

There are many ways to achieve this, but I don't seem to see any solution that's both beginner-friendly and takes into account the structure of the files you described.

You can iterate through the files, and for every text_[num].txt, fetch the corresponding comments_[num].txt and write to feedback_[num].txt as shown below. There's no need to add any counters or make any other assumptions about the files that might not always be true:

import os

srcpath = 'path/to/files'

for f in os.listdir(srcpath):
    if f.startswith('text'):
        index = f[5:-4] # extract the [num] part

        # Build the paths to text, comment, feedback files
        txt_path = os.path.join(srcpath, f)
        cmnt_path = os.path.join(srcpath, f'comments_{index}.txt')
        fb_path = os.path.join(srcpath, f'feedback_{index}.txt')

        # write the output in byte mode, following chepner's advice
        with open(fb_path, 'wb') as outfile:
            with open(txt_path, 'rb') as infile1, open(cmnt_path, 'rb') as infile2:
                outfile.write(infile1.read())
                outfile.write(infile2.read())
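A variant of the same idea using pathlib, which some may find more readable (a sketch; the temporary directory and sample file contents are made up so it runs standalone):

```python
import tempfile
from pathlib import Path

srcpath = Path(tempfile.mkdtemp())  # isolated directory for the sketch

# sample inputs (contents are made up for illustration)
(srcpath / 'text_0.txt').write_text('T0\n')
(srcpath / 'comments_0.txt').write_text('C0\n')

# glob only the text_*.txt files and derive the other two names of each triple
for txt in srcpath.glob('text_*.txt'):
    index = txt.stem.split('_')[1]  # the [num] part, e.g. 'text_0' -> '0'
    cmnt = srcpath / f'comments_{index}.txt'
    fb = srcpath / f'feedback_{index}.txt'
    fb.write_bytes(txt.read_bytes() + cmnt.read_bytes())
```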



The simplest way would probably be to just iterate from 0 onwards, stopping at the first missing file. This works assuming that your files are numbered in increasing order with no gaps (e.g. you have 0, 1, 2 and not 0, 2).

import os
from itertools import count

for i in count(0):
    t = f'text_{i}.txt'
    c = f'comments_{i}.txt'

    if not os.path.isfile(t) or not os.path.isfile(c):
        break

    with open(f'feedback_{i}.txt', 'wb') as outfile:
        with open(t, 'rb') as tf, open(c, 'rb') as cf:
            outfile.write(tf.read())
            outfile.write(cf.read())



You can try this:

filenames = ['text_0.txt', 'comments_0.txt', 'text_1.txt', 'comments_1.txt',
             'text_2.txt', 'comments_2.txt', 'text_3.txt', 'comments_3.txt']
for i, j in enumerate(zip(filenames[::2], filenames[1::2])):
    with open(f'feedback_{i}.txt', 'w') as outfile:
        for k in j:
            with open(k, 'r') as infile:
                outfile.write(infile.read())

I have taken a hard-coded list here. Instead, you can build it from the directory, though note that os.listdir returns names in arbitrary order, so you would need to filter and sort the names into matching text/comments pairs first:

import os
filenames = os.listdir('path/to/folder')
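One way to turn an unordered directory listing into the alternating text/comments list this answer expects, sorting the indices numerically so text_10 doesn't land before text_2 (a sketch; the sample names are made up in place of a real os.listdir call):

```python
import re

# sample directory contents (in practice: names = os.listdir('path/to/folder'))
names = ['comments_2.txt', 'text_0.txt', 'text_2.txt', 'notes.md',
         'comments_0.txt', 'text_10.txt', 'comments_10.txt']

# pull the numeric index out of each text_ file and sort numerically
indices = sorted(
    int(m.group(1)) for n in names
    if (m := re.fullmatch(r'text_(\d+)\.txt', n))
)

# rebuild the alternating text/comments list the loop above expects
filenames = [name for i in indices
             for name in (f'text_{i}.txt', f'comments_{i}.txt')]
```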

