Import multiple excel files into python pandas and concatenate them into one dataframe [duplicate]

Question

I would like to read several Excel files from a directory into pandas and concatenate them into one big DataFrame, but I have not been able to figure it out. I need some help with the for loop and with building the concatenated DataFrame. Here is what I have so far:

import sys
import csv
import glob
import pandas as pd

# get data file names
path =r'C:\DRO\DCL_rawdata_files\excelfiles'
filenames = glob.glob(path + "/*.xlsx")

dfs = []

for df in dfs: 
    xl_file = pd.ExcelFile(filenames)
    df=xl_file.parse('Sheet1')
    dfs.concat(df, ignore_index=True)

Your code in the other question was just fine; just replace read_csv with read_excel.
– joris, Commented Jan 3, 2014 at 16:22
Your code here is not really correct (it was in the other question). You cannot loop over the empty list dfs you just created, so loop over the filenames, then call dfs.append(df) in the loop, and after that call pd.concat(dfs, ignore_index=True).
– joris, Commented Jan 3, 2014 at 16:27
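The fix described in these comments could look roughly like the sketch below. The function name combine_excel_files and the empty-match check are illustrative additions, not part of the question; the 'Sheet1' default is taken from the question's code.

```python
import glob

import pandas as pd


def combine_excel_files(pattern, sheet_name="Sheet1"):
    """Read every workbook matching `pattern` and stack the rows."""
    # Loop over the file names, not over the (still empty) list of frames.
    filenames = sorted(glob.glob(pattern))
    if not filenames:
        raise FileNotFoundError(f"no files match {pattern!r}")

    dfs = []
    for filename in filenames:
        df = pd.read_excel(filename, sheet_name=sheet_name)
        dfs.append(df)  # this is list.append, which collects the frames

    # Concatenate once, after the loop, and renumber the rows 0..n-1.
    return pd.concat(dfs, ignore_index=True)


# Example call, using the directory from the question:
# big_df = combine_excel_files(r"C:\DRO\DCL_rawdata_files\excelfiles\*.xlsx")
```

Note that pd.concat is a pandas function that takes the list of frames; a plain Python list has no concat method, which is why dfs.concat(...) in the question fails.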

ericmjl · Accepted Answer · 2014-01-03 16:33:41Z

As mentioned in the comments, one error you are making is that you are looping over an empty list.

Here is how I would do it, using an example of having 5 identical Excel files that are appended one after another.

(1) Imports:

import os
import pandas as pd

(2) List files:

path = os.getcwd()
files = os.listdir(path)
files

Output:

['.DS_Store',
 '.ipynb_checkpoints',
 '.localized',
 'Screen Shot 2013-12-28 at 7.15.45 PM.png',
 'test1 2.xls',
 'test1 3.xls',
 'test1 4.xls',
 'test1 5.xls',
 'test1.xls',
 'Untitled0.ipynb',
 'Werewolf Modelling',
 '~$Random Numbers.xlsx']

(3) Pick out 'xls' files:

files_xls = [f for f in files if f[-3:] == 'xls']
files_xls

Output:

['test1 2.xls', 'test1 3.xls', 'test1 4.xls', 'test1 5.xls', 'test1.xls']

(4) Initialize empty dataframe:

df = pd.DataFrame()

(5) Loop over list of files to append to empty dataframe:

for f in files_xls:
    data = pd.read_excel(f, 'Sheet1')
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    df = pd.concat([df, data])

(6) Enjoy your new dataframe. :-)

df

Output:

  Result  Sample
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10
0      a       1
1      b       2
2      c       3
3      d       4
4      e       5
5      f       6
6      g       7
7      h       8
8      i       9
9      j      10

This is certainly OK, but I think the approach in the almost identical question stackoverflow.com/questions/20906474/… to append to a list and then pd.concat(the_list) is cleaner.
Thank you. I could actually understand this. But why the f[-3:] in the statement files_xls = [f for f in files if f[-3:] == 'xls']?
Glad to be of help! I was where you were about 6 months ago learning Pandas, so I'm glad to be of any help. f[-3:] is me parsing each string. The files list is essentially a list of strings. Therefore, in the list comprehension, I am asking for files (i.e. strings) whose extensions, i.e. the last 3 characters, are "xls".
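As a side note to the explanation above, here is a small sketch of the same filter written with os.path.splitext instead of slicing the last three characters. The helper name pick_excel and the entries 'report.XLS' and 'notes_xls' are made up for illustration; the other names come from the listing in the answer.

```python
import os

files = ['.DS_Store', 'test1 2.xls', 'test1.xls', 'Untitled0.ipynb',
         'Werewolf Modelling', '~$Random Numbers.xlsx',
         'report.XLS', 'notes_xls']  # last two entries are hypothetical


def pick_excel(names, extensions=('.xls',)):
    """Keep names whose extension is in `extensions`, ignoring case."""
    selected = []
    for name in names:
        ext = os.path.splitext(name)[1].lower()
        # '~$' prefixes mark the temporary lock files Excel creates
        if ext in extensions and not name.startswith('~$'):
            selected.append(name)
    return selected


print(pick_excel(files))
# ['test1 2.xls', 'test1.xls', 'report.XLS']
print([f for f in files if f[-3:] == 'xls'])
# ['test1 2.xls', 'test1.xls', 'notes_xls']
```

The slice version only compares characters, so it accepts a name like 'notes_xls' that has no extension at all and misses an upper-case '.XLS'.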
I am late to this, but I had a small doubt in this case. What if there were multiple sheets in these excel files? How to bring those in as well?
@ManasJani: you can check the docs for pd.read_excel. There is a sheet_name argument (spelled sheetname in older pandas versions) that can be used.
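For the multi-sheet case raised in the comments, one possible sketch: passing sheet_name=None makes pd.read_excel return a dict that maps each sheet name to a DataFrame, and pd.concat accepts such a dict directly. The function name read_all_sheets and the 'sheet' column are assumptions for illustration.

```python
import pandas as pd


def read_all_sheets(path):
    """Stack every sheet of one workbook, recording the sheet name."""
    # sheet_name=None returns {sheet name: DataFrame} for all sheets
    sheets = pd.read_excel(path, sheet_name=None)
    # The dict keys become the outer index level, named "sheet" here
    combined = pd.concat(sheets, names=["sheet", "row"])
    # Turn the sheet name into a regular column and renumber the rows
    return combined.reset_index(level="sheet").reset_index(drop=True)


# Example call with a hypothetical file name:
# df = read_all_sheets("test1.xls")
```

This assumes all sheets share the same columns; sheets with different headers would still be stacked, with NaN filling the gaps.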

zoump · Accepted Answer · 2024-05-28 15:48:52Z

12

There is an even neater way to do that.

# import libraries
import pandas as pd
from glob import glob

# get the absolute paths of all Excel files 
all_excel_files = glob("/path/to/Excel/files/*.xlsx")

# read all Excel files at once
df = pd.concat(pd.read_excel(excel_file) for excel_file in all_excel_files)

edited May 28, 2024 at 15:48

answered Jul 27, 2022 at 7:55

zoump


serghei · Accepted Answer · 2023-05-08 01:18:01Z

7

You can use list comprehension inside concat:

import os
import pandas as pd

path = '/path/to/directory/'
filenames = [file for file in os.listdir(path) if file.endswith('.xlsx')]

df = pd.concat([pd.read_excel(path + file) for file in filenames], ignore_index=True)

With ignore_index = True the index of df will be labeled 0, …, n - 1.
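To make the effect of ignore_index concrete, here is a tiny sketch with two in-memory frames standing in for two files (the data is made up):

```python
import pandas as pd

a = pd.DataFrame({"Sample": [1, 2]})
b = pd.DataFrame({"Sample": [3, 4]})

kept = pd.concat([a, b])                           # each frame keeps its own labels
renumbered = pd.concat([a, b], ignore_index=True)  # fresh 0..n-1 labels

print(list(kept.index))        # [0, 1, 0, 1]
print(list(renumbered.index))  # [0, 1, 2, 3]
```

Without ignore_index the row labels repeat for every file, which is what the 0..9 blocks in the accepted answer's output show.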

answered Jul 2, 2022 at 16:36 by rachwa

edited May 8, 2023 at 1:18 by serghei

john blue · Accepted Answer · 2018-02-10 04:25:32Z

6

This works with Python 2.x.

Run it from the directory that contains the Excel files.

See http://pbpython.com/excel-file-combine.html

import glob
import pandas as pd

all_data = pd.DataFrame()
for f in glob.glob("*.xlsx"):
    df = pd.read_excel(f)
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    all_data = pd.concat([all_data, df], ignore_index=True)

# now save the data frame
# (ExcelWriter.save() was removed in pandas 2.0; the with-block saves on exit)
with pd.ExcelWriter('output.xlsx') as writer:
    all_data.to_excel(writer, sheet_name='sheet1')

answered Feb 10, 2018 at 4:25

john blue


Abhilash Ramteke · Accepted Answer · 2021-04-11 14:55:00Z

1

This can be done in this way:

import pandas as pd
import glob

all_data = pd.DataFrame()
for f in glob.glob("/path/to/directory/*.xlsx"):
    df = pd.read_excel(f)
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    all_data = pd.concat([all_data, df], ignore_index=True)

all_data.to_csv("new_combined_file.csv")

answered Apr 11, 2021 at 14:55

Abhilash Ramteke


Dan · Accepted Answer · 2022-03-23 11:17:54Z

1

#shortcut

import pandas as pd 
from glob import glob

dfs=[]
for f in glob("data/*.xlsx"):
    dfs.append(pd.read_excel(f))
df=pd.concat(dfs, ignore_index=True)

answered Mar 23, 2022 at 11:17

Dan


NUTAKKI PRADEEP CHAKRAVARTHI · Accepted Answer · 2019-08-24 16:58:16Z

import os
import pandas as pd

os.chdir('...')

# read the first file to get the reference column names
fdf = pd.read_excel("first_file.xlsx", sheet_name="sheet_name")

# add a counter column to tell the different files' data apart
fdf["counter"] = 1
nm = list(fdf)
c = 2

# read the first 1000 files
for i in os.listdir():
    print(c)
    if c < 1001:
        # skip the first file, which is already loaded above
        if i.endswith(".xlsx") and i != "first_file.xlsx":
            df = pd.read_excel(i, sheet_name="sheet_name")
            df["counter"] = c
            if list(df) == nm:
                # DataFrame.append was removed in pandas 2.0; use pd.concat
                fdf = pd.concat([fdf, df])
                c += 1
            else:
                print("headers name not match")
        else:
            print("not xlsx")

fdf = fdf.reset_index(drop=True)

#relax
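The counter column and the header check in this answer could also be expressed with the keys that pd.concat attaches when it is given a dict. This is only a sketch of that alternative, not the answer author's method; the function name concat_with_source and the 'source' column are hypothetical, and the frames are passed in already loaded so the idea can be tried without any files.

```python
import pandas as pd


def concat_with_source(frames_by_name):
    """Stack frames whose headers match the first one, tagging each row's source."""
    names = list(frames_by_name)
    expected = list(frames_by_name[names[0]].columns)

    kept = {}
    for name in names:
        df = frames_by_name[name]
        if list(df.columns) == expected:
            kept[name] = df
        else:
            print(f"skipping {name}: headers do not match")

    # Dict keys become an outer index level, which is then turned into a column
    combined = pd.concat(kept, names=["source", "row"])
    return combined.reset_index(level="source").reset_index(drop=True)
```

With real files, the dict could be built as {f: pd.read_excel(f) for f in filenames}, so each row carries the file it came from instead of a numeric counter.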

aruna kumar · Accepted Answer · 2020-05-26 18:52:16Z

import os
import pandas as pd

files = os.listdir('./Salesfolder')
all_months_data = pd.DataFrame()
for file in files:
    # the files are Excel workbooks, so read them with read_excel
    df = pd.read_excel("./Salesfolder/" + file)
    all_months_data = pd.concat([all_months_data, df])
all_months_data.to_csv("all_data.csv", index=False)

This reads all the .xls files in a folder (Salesfolder in my case; use your own local path). The loop reads each file and concatenates it onto an initially empty DataFrame. I also export the combined data for all months to a single CSV file.
