I have 12 large CSV files with the same structure. I would like to combine them into a single CSV file without repeating the headers. I am currently using shutil as follows.

import shutil
import time

csv_files = ['file1.csv', 'file2.csv', 'file3.csv', 'file4.csv', 'file5.csv', 'file6.csv']
target_file_name = 'target.csv'

start_time = time.time()
# Copy the first file whole so that its header row is kept.
shutil.copy(csv_files[0], target_file_name)
with open(target_file_name, 'a') as out_file:
    for source_file in csv_files[1:]:
        with open(source_file, 'r') as in_file:
            in_file.readline()  # skip the repeated header row
            shutil.copyfileobj(in_file, out_file)
# The with blocks close both files, so no explicit close() calls are needed.
print("--- %s seconds ---" % (time.time() - start_time))

Edit

When I ran time cat file[1-4].csv > BigBoy in the terminal, I got the following output: 0.08s user 4.57s system 60% cpu 7.644 total. That is, the cat command used about 4.5 seconds of system time (7.6 seconds wall-clock), whereas the Python program took 17.46 seconds. I used 4 CSV files of 116 MB each.

I would like to know whether there are other methods in Python that handle this scenario more efficiently. You can download large CSV files from here.

  • Yes, efficiently. I edited my post. Thanks. Commented Apr 20, 2020 at 9:03
  • I tried the code snippet with 4 CSV files of 116 MB each. It took 17.46 seconds. I would like to know whether any other library or method handles file operations more efficiently. Commented Apr 20, 2020 at 9:05
  • Try the shell to see how fast your disks are: time cat file[1-4].csv > BigBoy Commented Apr 20, 2020 at 9:12
  • About 4.5 seconds: 0.08s user 4.57s system 60% cpu 7.644 total Commented Apr 20, 2020 at 9:22
  • I won't post it as an answer because you asked for a Python solution, but it seems the fastest way to get the job done is to start a subprocess and run the following, so as not to repeat the headers: awk '(FNR>1)||(NR==1)' file1.csv file2.csv file3.csv... Commented Apr 21, 2020 at 11:32

1 Answer

It is better to use csvstack from csvkit for this. csvkit also provides many other command-line tools for working with CSV files.

csvstack file1.csv file2.csv ...
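
If a pure-Python solution is still needed, one thing that may be worth trying is copying the files as raw bytes with a larger buffer, which avoids text decoding and newline translation. The following is only a sketch, not a benchmarked result: merge_csv and the 16 MB buffer_size are names and values made up for illustration, and it assumes that every file ends with a newline and that the header row contains no embedded line breaks.

```python
import shutil


def merge_csv(sources, target, buffer_size=16 * 1024 * 1024):
    """Concatenate CSV files into target, keeping only the first header.

    Assumes each source ends with a newline and that the header is a
    single physical line (no quoted line breaks). buffer_size is an
    arbitrary illustrative value, not a tuned one.
    """
    with open(target, "wb") as out_file:
        for index, source in enumerate(sources):
            with open(source, "rb") as in_file:
                if index > 0:
                    in_file.readline()  # skip the repeated header row
                # Copy the remaining bytes in large chunks, without decoding.
                shutil.copyfileobj(in_file, out_file, buffer_size)


if __name__ == "__main__":
    merge_csv(["file1.csv", "file2.csv", "file3.csv"], "target.csv")
```

Whether this gets close to the cat timing depends on the disk and the operating system, so it should be measured on the actual files.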