
I am trying to merge around 5000 CSV files into one CSV. The structure of the individual files is the same, so the code should be simple; however, I keep getting a "file not found" error.

here is the code:

import glob
import os

import pandas as pd

csv_paths = set(glob.glob("folder_containing_csvs/*.csv"))
full_csv_path = "folder_containing_csvs/full_df.csv"
csv_paths -= {full_csv_path}
for csv_path in csv_paths:
    print("csv_path", csv_path)
    df = pd.read_csv(csv_path, sep="\t")
    # sort columns so every file appends in the same order
    df[sorted(df.columns)].to_csv(full_csv_path, mode="a",
                                  header=not os.path.isfile(full_csv_path),
                                  sep="\t", index=False)
full_df = pd.read_csv(full_csv_path, sep="\t", encoding='utf-8')
full_df

The code produced the following error message:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-47-11ffadd03e3e> in <module>
----> 1 full_df = pd.read_csv(full_csv_path, sep="\t", encoding='utf-8')
      2 full_df

~/.local/lib/python3.6/site-packages/pandas/io/parsers.py in read_csv(filepath_or_buffer,
sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, 
engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, 
nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, 
infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, 
chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, 
escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, 
low_memory, memory_map, float_precision)
    686     )
    687 
--> 688     return _read(filepath_or_buffer, kwds)
    689 
    690 

~/.local/lib/python3.6/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    452 
    453     # Create the parser.
--> 454     parser = TextFileReader(fp_or_buf, **kwds)
    455 
    456     if chunksize or iterator:

~/.local/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
    946             self.options["has_index_names"] = kwds["has_index_names"]
    947 
--> 948         self._make_engine(self.engine)
    949 
    950     def close(self):

~/.local/lib/python3.6/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
   1178     def _make_engine(self, engine="c"):
   1179         if engine == "c":
--> 1180             self._engine = CParserWrapper(self.f, **self.options)
   1181         else:
   1182             if engine == "python":

~/.local/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1991         if kwds.get("compression") is None and encoding:
   1992             if isinstance(src, str):
-> 1993                 src = open(src, "rb")
   1994                 self.handles.append(src)
   1995 

FileNotFoundError: [Errno 2] No such file or directory: 'folder_containing_csvs/full_df.csv'
  • If they are csv files, why don't you just open('merge.csv','w').write(open('file1.csv').read() + open('file2.csv').read())? If there is a header, then remove the header first; see the sketch below.
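A minimal sketch of that idea, assuming every file shares an identical single header line (merged.csv and the glob pattern are placeholders, not names from the question):

import glob

csv_paths = sorted(glob.glob("folder_containing_csvs/*.csv"))

with open("merged.csv", "w") as out:
    for i, path in enumerate(csv_paths):
        with open(path) as f:
            lines = f.readlines()
        # keep the header line only from the first file
        out.writelines(lines if i == 0 else lines[1:])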

2 Answers


The paths returned by glob are relative to the current working directory (the directory the script is executed from), not to the script's own location.

If you have a file structure like this:

~/code/ |
       | merge.py
       | folder_containing_csvs/  |
                                  | file1.csv
                                  | file2.csv

merge.py must then be executed from the ~/code folder.

e.g.

~/code$ python merge.py

Doing something like

~/$ python ./code/merge.py

will result in

FileNotFoundError: [Errno 2] No such file or directory: 'folder_containing_csvs/full_df.csv'
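Alternatively, you can make the script independent of the working directory by resolving paths relative to the script file itself. A minimal sketch using pathlib, assuming the folder layout shown above:

from pathlib import Path

import pandas as pd

# resolve the data folder relative to this script, not the shell's CWD
base_dir = Path(__file__).resolve().parent
csv_dir = base_dir / "folder_containing_csvs"

full_csv_path = csv_dir / "full_df.csv"
csv_paths = set(csv_dir.glob("*.csv")) - {full_csv_path}

for csv_path in csv_paths:
    df = pd.read_csv(csv_path, sep="\t")
    df[sorted(df.columns)].to_csv(full_csv_path, mode="a",
                                  header=not full_csv_path.is_file(),
                                  sep="\t", index=False)

With this, ~/$ python ./code/merge.py and ~/code$ python merge.py behave the same.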


1 Comment

After moving the data to the /code folder, the code works perfectly. Thank you for explaining the necessary file structure for using glob.

Try this:

import os

import pandas as pd

loc_path = "/path/to/folder/of/csvs/"
files = [file for file in os.listdir(loc_path) if file.endswith('.csv')]

# now load them into a list of DataFrames
dfs = []
for file in files:
    dfs.append(pd.read_csv(os.path.join(loc_path, file), sep='\t'))

# concat the dfs list into a single DataFrame
df = pd.concat(dfs)
# then write it out with df.to_csv, as shown below
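A hedged example of that final write (full_df.csv is a hypothetical output name; if it is written inside loc_path, exclude it from files first so a re-run does not read it back in as input):

out_path = os.path.join(loc_path, 'full_df.csv')  # hypothetical output name
df.to_csv(out_path, sep='\t', index=False)  # index=False avoids an extra index column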

Just read the 5000 csv sheets part. How many rows are you expecting?

