I have multiple text (`.txt`) files saved in a folder. I'm trying to combine them all into a single dataframe. So far I have been able to combine them, but not in the manner I'd like.

The text files (named `yob####.txt`, where `####` is a year) have information that looks like this:
```
Jennifer,F,58376
Amanda,F,35818
Jessica,F,33923
Melissa,F,31634
Sarah,F,25755
Heather,F,19975
Nicole,F,19917
Amy,F,19834
Elizabeth,F,19529
Michelle,F,19122
Kimberly,F,18499
Angela,F,17970
```
I'm trying to open each file, append the year to the end of each row, and move on to the next file:
```python
def main():
    # file_paths returns a list of file paths,
    # i.e. ["C:\Images\file.txt", "C:\Images\file2.txt", ...]
    files = file_paths(FILE_FOLDER)
    df = []
    for file in files:
        year = file.split("\\")[-1][3:7]
        df.append(pd.read_table(file) + "," + year)
    big_df = pd.concat(df, ignore_index=True, axis=1)
    big_df.to_csv("Combined.csv", header=False, index=False)
```
This almost works...except it puts each file's data in its own column: the first file goes in one column, the next file in a second column, the next in a third, and so on.

The expected output is the same data, except that when it opens the `1881` file, it should append that data after the end of the `1880` data. Then the `1882` data goes after the `1881` data, etc. etc.
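For what it's worth, here is a minimal sketch of one way to get that row-wise layout (assuming the same `yob####.txt` naming and comma-separated columns; `combine_name_files` and the column names are hypothetical): read each file with `header=None` so pandas doesn't treat the first record as a header, attach the year as a real column, and concatenate along `axis=0`.

```python
import glob
import os

import pandas as pd


def combine_name_files(folder):
    """Stack every yob####.txt file in `folder` into one DataFrame,
    with the file's year added as a fourth column."""
    frames = []
    for path in sorted(glob.glob(os.path.join(folder, "yob*.txt"))):
        year = os.path.basename(path)[3:7]  # "yob1880.txt" -> "1880"
        frame = pd.read_csv(
            path,
            header=None,                    # the files have no header row
            names=["name", "sex", "count"], # hypothetical column labels
        )
        frame["year"] = year                # same year for every row of this file
        frames.append(frame)
    # axis=0 stacks the frames vertically; ignore_index renumbers the rows
    return pd.concat(frames, ignore_index=True, axis=0)


# combine_name_files(FILE_FOLDER).to_csv("Combined.csv", header=False, index=False)
```

Adding the year as a column (rather than string-concatenating `","+year` onto the frame) keeps `count` numeric and lets `to_csv` handle the formatting.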
Update: I tried

```python
big_df = pd.concat(df, ignore_index=True, axis=0)
```

instead. `axis=0` pushes the runtime up to about 38.9s, shoots the file size from 38MB to 293MB, and the result has lots of "empty columns" (screenshot here). I was also advised to read each file with `pd.read_table(file, header=None)` and still concatenate with `axis=0`.
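If I understand the symptom correctly, the empty columns are a header problem rather than a concat problem: `pd.read_table` with the default `header` consumes the first record of each file as the column label, so every file ends up with different column names, and `pd.concat(..., axis=0)` aligns on names, filling every mismatch with NaN. A small sketch of the effect (with made-up values standing in for what each file's first line becomes):

```python
import pandas as pd

# Simulate two files whose first record was consumed as a header:
a = pd.DataFrame({"Jennifer,F,58376": ["Amanda,F,35818"]})  # "header" from file 1
b = pd.DataFrame({"Mary,F,6919": ["Anna,F,2698"]})          # "header" from file 2

stacked = pd.concat([a, b], ignore_index=True, axis=0)
# The frames share no column names, so each one contributes NaN cells to the
# other's column -- those NaNs are the "empty columns" that bloat the CSV.
print(stacked.shape)                # (2, 2): two columns instead of one
print(stacked.isna().sum().sum())   # 2 empty cells
```

Reading with `header=None` gives every file identical integer column labels (`0, 1, 2`), so the same `axis=0` concat lines the rows up cleanly with no NaN padding.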