I have a folder that contains multiple Excel files:
column B.xlsx
column A.xlsx
column C.xlsx
...
(These aren't the actual file names; the real ones are more specific than this.)
Each Excel file contains the data for a single column of a larger DataFrame I want to create. The files are formatted like so:
column A.xlsx:
Date | ID | Mass | Units
1/21 | A  | 5.10 | g
2/21 | B  | 5.12 | g
3/21 | C  | 5.11 | g
column B.xlsx:
Date | ID | Mass | Units
1/21 | A  | 6.10 | g
2/21 | B  | 6.12 | g
3/21 | C  | 6.11 | g
The large dataframe I'd like to create would look like this:
ID | Column A | Column B | Column C | ...
A  | 5.10     | 6.10     | ...
B  | 5.12     | 6.12     | ...
C  | 5.11     | 6.11     | ...
It's important that the data is assigned to the correct columns, but the only indication of which column the data belongs to is the file name.
I wrote this code, which does the job, but there has to be a better way:
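import glob
import pandas as pd

files = glob.glob(r"C:\my\directory/*.xlsx")
bigDF = pd.DataFrame(columns=["ID", "A", "B", "C"])
# Take the ID column from the first file; this assumes every file lists the same IDs in the same order
temp = pd.read_excel(files[0])
bigDF["ID"] = temp["ID"]
for f in files:
    temp = pd.read_excel(f)
    # Pick the target column by looking for a letter anywhere in the file path
    if "A" in f:
        bigDF["A"] = temp["Mass"]
    elif "B" in f:
        bigDF["B"] = temp["Mass"]
    elif "C" in f:
        bigDF["C"] = temp["Mass"]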
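For comparison, here is a rough sketch of the kind of generalization I have in mind, not a tested solution. It assumes the column label can simply be the file name without its extension, that every file has "ID" and "Mass" columns, and that IDs are unique within each file. The function names (column_name_from_path, combine_mass_columns, load_folder) are made up for illustration.

```python
from pathlib import Path

import pandas as pd


def column_name_from_path(path):
    """Hypothetical helper: use the file name without its extension as the label."""
    return Path(path).stem


def combine_mass_columns(frames):
    """Combine a {label: DataFrame} mapping into one wide frame keyed by ID.

    Assumes each input frame has "ID" and "Mass" columns and unique IDs.
    """
    series = {
        label: df.set_index("ID")["Mass"]
        for label, df in frames.items()
    }
    # axis=1 lines the rows up by ID value rather than by row position;
    # the dict keys become the column names.
    wide = pd.concat(series, axis=1)
    return wide.rename_axis("ID").reset_index()


def load_folder(folder):
    """Read every .xlsx file in `folder` and combine the Mass columns."""
    frames = {
        column_name_from_path(p): pd.read_excel(p)
        for p in sorted(Path(folder).glob("*.xlsx"))
    }
    return combine_mass_columns(frames)
```

With this shape, load_folder(r"C:\my\directory") would replace the if/elif chain, and adding a new file would not require touching the code. Because the rows are matched on ID, a file whose rows are in a different order should still land in the right place.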