How to merge several CSV files into one file

Question

I have several CSV files with different columns, and I need to merge them into one file. Here is my code:

import os, glob
import pandas as pd
path = ""
all_files = glob.glob(os.path.join(path, "*.csv"))
df_from_each_file = (pd.read_csv(f, sep=',') for f in all_files)
df_merged   = pd.concat(df_from_each_file, ignore_index=True, axis=1)
df_merged.to_csv( "merged.csv")

This code labels the columns with numbers instead of their names. What should I do to keep the column names in the merged file?

Thanks for your help.

Sarah Messer · Accepted Answer · 2021-11-02 13:38:16Z

This sounds like a direct implementation of one of the Pandas examples for concat(). Copying the relevant example from their documentation:

>>> df1 = pd.DataFrame([['a', 1], ['b', 2]],
...                    columns=['letter', 'number'])
>>> df1
  letter  number
0      a       1
1      b       2
>>> df3 = pd.DataFrame([['c', 3, 'cat'], ['d', 4, 'dog']],
...                    columns=['letter', 'number', 'animal'])
>>> df3
  letter  number animal
0      c       3    cat
1      d       4    dog
>>> pd.concat([df1, df3], sort=False)
  letter  number animal
0      a       1    NaN
1      b       2    NaN
0      c       3    cat
1      d       4    dog

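I usually like to call df.reset_index(drop=True) on the resulting DataFrame df as well, since duplicate values in the index can cause unexpected behavior. If you're about to do a join on one of the columns, it won't matter. You already have ignore_index=True in your sample code, which takes care of the row index when concatenating along the default axis=0. Note, though, that combined with axis=1 it renumbers the column labels instead of the rows, which is why your merged file shows numbers. Dropping axis=1 keeps the column names.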
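
Applied to the code in the question, a minimal sketch might look like the following. It assumes the goal is to stack the rows of every file and keep the union of their column names. The helper name merge_csv_files and the two demo files are hypothetical, made up only for illustration.

```python
import glob
import os
import tempfile

import pandas as pd


def merge_csv_files(folder, pattern="*.csv"):
    """Stack every CSV in `folder` row-wise, keeping the union of column names.

    Hypothetical helper for illustration; not part of pandas.
    """
    paths = sorted(glob.glob(os.path.join(folder, pattern)))
    frames = [pd.read_csv(p) for p in paths]
    # The default axis=0 aligns frames on column names. ignore_index=True then
    # only renumbers the rows, so the header names are preserved.
    return pd.concat(frames, ignore_index=True, sort=False)


# Demo with two small throwaway files (made-up data).
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"letter": ["a", "b"], "number": [1, 2]}).to_csv(
        os.path.join(tmp, "part1.csv"), index=False
    )
    pd.DataFrame({"letter": ["c"], "number": [3], "animal": ["cat"]}).to_csv(
        os.path.join(tmp, "part2.csv"), index=False
    )
    merged = merge_csv_files(tmp)

# index=False keeps the row numbers out of the output; with no path given,
# to_csv returns the CSV text instead of writing a file.
merged_csv_text = merged.to_csv(index=False)
print(merged_csv_text)
```

Columns missing from a given file are filled with NaN for that file's rows. To write to disk instead, pass a path such as "merged.csv" to to_csv, ideally outside the folder being globbed so a rerun does not pick up the previous output.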

You might want to add a suffix to the concatenated columns in case the same column names appear in more than one DataFrame.
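
One way to sketch the suffix idea, assuming the files really should be placed side by side (axis=1) rather than stacked, is to rename each frame's columns with DataFrame.add_suffix before concatenating. The labels "file1" and "file2" are hypothetical stand-ins, for example derived from the file names.

```python
import pandas as pd

left = pd.DataFrame({"letter": ["a", "b"], "number": [1, 2]})
right = pd.DataFrame({"letter": ["c", "d"], "number": [3, 4]})

# Tag each frame's columns with a label so identical names stay distinguishable.
labelled = [
    frame.add_suffix(f"_{label}")
    for label, frame in [("file1", left), ("file2", right)]
]

# No ignore_index here: with axis=1 it would replace the column names by numbers.
side_by_side = pd.concat(labelled, axis=1)
print(side_by_side.columns.tolist())
```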
