
I could only find topics about reading multiple txt files into one single dataframe. But I want to store each of them as a separate dataframe (df1, df2, ...) and later concatenate them into one dataframe. Is there a fast way to do this? Better yet, what is the fastest way? That's one big point for me. The file names should not be used; they have the format (year.month.day.hour.minute.second), with no txt at the end of the files to find them by. Thank you in advance. Right now I am just reading everything into one dataframe:

import glob

import numpy as np
import pandas as pd

all_data = pd.DataFrame()  # accumulator for every file's rows
for f in glob.glob("path_in_dir"):
    df = pd.read_table(f, delim_whitespace=True,
                       names=('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'),
                       dtype={'A': np.float32, 'B': np.float32, 'C': np.float32,
                              'D': np.float32, 'E': np.float32, 'F': np.float32,
                              'G': np.float32, 'H': np.float32})

    all_data = all_data.append(df, ignore_index=True)
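
Side note on the naming: since the files carry no extension, a plain wildcard pattern is enough for glob to pick them up. A minimal sketch, with the directory as a placeholder:

files = sorted(glob.glob("path_in_dir/*.*.*.*.*.*"))  # names like 2017.12.31.23.59.59
# sorted() keeps zero-padded timestamp names in chronological order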
2 Comments
  • What did you try? Give us some code. Commented Jan 2, 2018 at 15:32
  • @ArashHatami please check my edits. Commented Jan 2, 2018 at 15:39

3 Answers


Reconsider this approach: I want to store them each as a different dataframe (df1, df2, ...) and later concatenate them. Instead, save each similar dataframe in a larger container such as a list or dictionary. This avoids flooding your global environment with many (potentially hundreds of) separate objects.

Below you have only two objects to maintain: 1) df_dict, with keys being df1, df2, ... and 2) all_data, where all dataframe elements are stacked together.

import glob

import numpy as np
import pandas as pd

df_dict = {}

for i, f in enumerate(glob.glob("path_in_dir")):
    df_dict['df'+str(i+1)] = pd.read_table(f, delim_whitespace=True,
                               names=('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'),
                               dtype={'A': np.float32, 'B': np.float32, 'C': np.float32,
                                      'D': np.float32, 'E': np.float32, 'F': np.float32,
                                      'G': np.float32, 'H': np.float32})
# MASTER COMPILED DATAFRAME
all_data = pd.concat(df_dict.values(), ignore_index=True)

# FIRST THREE DATAFRAMES
df_dict['df1']
df_dict['df2']
df_dict['df3']
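
As a follow-up, pd.concat also accepts the mapping itself; the dict keys then become the outer level of the row index, so every row stays traceable to the frame it came from. A small sketch of that behavior:

# dict keys ('df1', 'df2', ...) become the outer index level
tagged = pd.concat(df_dict)
print(tagged.loc['df1'].head())   # rows contributed by the first file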

1 Comment

Your solution fits my problem well. Thank you, really!

You could try something like:

import pandas as pd

df = pd.read_csv(r'your_file.txt', sep='\t')
df2 = pd.read_csv(r'your_second_file.txt', sep='\t')
df3 = pd.read_csv(r'your_third_file.txt', sep='\t')

master = pd.concat([df, df2, df3])
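
If there are more than a handful of files, the same idea scales with a list comprehension; a sketch assuming tab-separated files collected from a single folder (the pattern below is a placeholder):

import glob

import pandas as pd

frames = [pd.read_csv(f, sep='\t') for f in glob.glob(r'your_folder/*')]
master = pd.concat(frames, ignore_index=True)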



I didn't use your exact data structure; instead I created a few dummy files to demonstrate the use case.

import glob

import pandas as pd

datasets = []
for f in glob.glob("<Path to folder>"):
    df = pd.read_csv(f, sep=',',
                     names=('Col1', 'Col2', 'Col3', 'Col4'),
                     dtype={'Col1': str, 'Col2': int, 'Col3': float, 'Col4': str})
    datasets.append(df)
all_data = pd.concat(datasets, ignore_index=True)
print(all_data.head())

You can adapt this code to make it work for your data.
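
For instance, adapted to the whitespace-separated, eight-float-column files described in the question (a sketch; the path is a placeholder):

import glob

import numpy as np
import pandas as pd

cols = list('ABCDEFGH')
datasets = []
for f in glob.glob('path_in_dir/*'):   # extensionless, timestamp-named files
    # one dataframe per file, all eight columns read as float32
    datasets.append(pd.read_table(f, delim_whitespace=True, names=cols,
                                  dtype={c: np.float32 for c in cols}))
all_data = pd.concat(datasets, ignore_index=True)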

Thanks

1 Comment

@Arsh didn't ask about the source information (i.e., which file each dataframe came from), which is why I used a list of DataFrames instead of a dictionary.
