I have some CSV files with different columns, and I need to merge them into one file. Here is my code:

import os, glob
import pandas as pd
path = ""
all_files = glob.glob(os.path.join(path, "*.csv"))
df_from_each_file = (pd.read_csv(f, sep=',') for f in all_files)
df_merged   = pd.concat(df_from_each_file, ignore_index=True, axis=1)
df_merged.to_csv( "merged.csv")

This code labels the columns with numbers instead of their names. How can I keep the column names in the merged file?

Thanks for your help.

1 Answer

This sounds like a direct implementation of one of the Pandas examples for concat(). Copying the relevant example from their documentation:

>>> df1 = pd.DataFrame([['a', 1], ['b', 2]],
                   columns=['letter', 'number'])
>>> df1
  letter  number
0      a       1
1      b       2
>>> df3 = pd.DataFrame([['c', 3, 'cat'], ['d', 4, 'dog']],
                   columns=['letter', 'number', 'animal'])
>>> df3
  letter  number animal
0      c       3    cat
1      d       4    dog
>>> pd.concat([df1, df3], sort=False)
  letter  number animal
0      a       1    NaN
1      b       2    NaN
0      c       3    cat
1      d       4    dog

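I usually like to call df.reset_index(drop=True) on the resulting DataFrame df as well, since duplicate values in the index can cause unexpected behavior. If you're about to do a join on one of the columns, though, it won't matter. Your sample code already passes ignore_index=True, which does the same renumbering when rows are stacked. Note, however, that combined with axis=1 it renumbers the columns instead of the rows, which is why your merged file shows numbers in place of names. Drop axis=1 if you want the rows stacked as in the example above.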
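Applied to the files from the question, the idea could be sketched roughly as follows. This assumes the goal is to stack the rows of every CSV on top of each other (as in the documentation example), so that columns are matched by name and missing ones are filled with NaN. The function name merge_csv_files and its pattern parameter are made up for illustration.

```python
import glob
import os

import pandas as pd


def merge_csv_files(path, pattern="*.csv"):
    """Stack all matching CSV files row-wise, keeping column names.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # sorted() just makes the file order reproducible.
    all_files = sorted(glob.glob(os.path.join(path, pattern)))
    frames = [pd.read_csv(f) for f in all_files]
    # axis=0 (the default) aligns columns by name; a column missing from
    # one file is filled with NaN. ignore_index=True renumbers rows only.
    return pd.concat(frames, ignore_index=True, sort=False)
```

A call such as merge_csv_files("").to_csv("merged.csv", index=False) would then write the result without the extra index column. If merged.csv is written to the same folder that is being scanned, a second run would pick it up as input, so a separate output folder may be safer.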

1 Comment

You might want to add a suffix to the concatenated columns in case the column names are the same between one DataFrame and another.
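One way this suggestion might look, assuming the frames really are meant to sit side by side (axis=1): DataFrame.add_suffix renames the columns of each frame before concatenation. The "_1" and "_2" suffixes are arbitrary placeholders; something derived from the file name would work just as well.

```python
import pandas as pd

df1 = pd.DataFrame({"letter": ["a", "b"], "number": [1, 2]})
df2 = pd.DataFrame({"letter": ["c", "d"], "number": [3, 4]})

# Tag each frame's columns so identical names stay distinguishable.
# ignore_index is deliberately left out: with axis=1 it would replace
# the column names with 0, 1, 2, ...
side_by_side = pd.concat(
    [df1.add_suffix("_1"), df2.add_suffix("_2")],
    axis=1,
)
print(list(side_by_side.columns))
# ['letter_1', 'number_1', 'letter_2', 'number_2']
```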
