
I could only find topics about reading multiple txt files into one single dataframe. But I want to store each of them as a separate dataframe (df1, df2, ...) and later concatenate them into one dataframe. Is there a fast way to do this? Better yet, what is the fastest way? That's one big point for me. The file names should not be used; they have the format (year.month.day.hour.minute.second), with no txt at the end of the files to find them by. Thank you in advance. Right now I am just reading everything into one dataframe:

import glob

import numpy as np
import pandas as pd

all_data = pd.DataFrame()  # accumulator for every file's rows
for f in glob.glob("path_in_dir"):
    df = pd.read_table(f, delim_whitespace=True,
                       names=('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'),
                       dtype={'A': np.float32, 'B': np.float32, 'C': np.float32,
                              'D': np.float32, 'E': np.float32, 'F': np.float32,
                              'G': np.float32, 'H': np.float32})

    all_data = all_data.append(df, ignore_index=True)
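
Side note on the naming: since the files carry no extension, a plain wildcard pattern is enough for glob to pick them up. A minimal sketch, with the directory as a placeholder:

files = sorted(glob.glob("path_in_dir/*.*.*.*.*.*"))  # names like 2017.12.31.23.59.59
# sorted() keeps zero-padded timestamp names in chronological order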
2 Comments
  • What did you try? Give us some code. Commented Jan 2, 2018 at 15:32
  • @ArashHatami please check my edits. Commented Jan 2, 2018 at 15:39

3 Answers


Reconsider this approach: I want to store them each as a different dataframe (df1, df2, ...) and later concatenate them. Instead, save each similar dataframe in a larger container such as a list or dictionary. This avoids flooding your global environment with many (potentially hundreds of) separate objects.

Below you have only two objects to maintain: 1) df_dict, with keys being df1, df2, ... and 2) all_data, where all dataframe elements are stacked together.

import glob

import numpy as np
import pandas as pd

df_dict = {}

for i, f in enumerate(glob.glob("path_in_dir")):
    df_dict['df'+str(i+1)] = pd.read_table(f, delim_whitespace=True,
                               names=('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'),
                               dtype={'A': np.float32, 'B': np.float32, 'C': np.float32,
                                      'D': np.float32, 'E': np.float32, 'F': np.float32,
                                      'G': np.float32, 'H': np.float32})
# MASTER COMPILED DATAFRAME
all_data = pd.concat(df_dict.values(), ignore_index=True)

# FIRST THREE DATAFRAMES
df_dict['df1']
df_dict['df2']
df_dict['df3']
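
As a follow-up, pd.concat also accepts the mapping itself; the dict keys then become the outer level of the row index, so every row stays traceable to the frame it came from. A small sketch of that behavior:

# dict keys ('df1', 'df2', ...) become the outer index level
tagged = pd.concat(df_dict)
print(tagged.loc['df1'].head())   # rows contributed by the first file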

1 Comment

Your solution fits my problem well. Thank you, really!

You could try something like:

import pandas as pd

df = pd.read_csv(r'your_file.txt', sep='\t')
df2 = pd.read_csv(r'your_second_file.txt', sep='\t')
df3 = pd.read_csv(r'your_third_file.txt', sep='\t')

master = pd.concat([df, df2, df3])
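
If there are more than a handful of files, the same idea scales with a list comprehension; a sketch assuming tab-separated files collected from a single folder (the pattern below is a placeholder):

import glob

import pandas as pd

frames = [pd.read_csv(f, sep='\t') for f in glob.glob(r'your_folder/*')]
master = pd.concat(frames, ignore_index=True)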



I didn't use your exact data structure; instead I created a few dummy files to demonstrate the use case.

import glob

import pandas as pd

datasets = []
for f in glob.glob("<Path to folder>"):
    df = pd.read_csv(f, sep=',',
                     names=('Col1', 'Col2', 'Col3', 'Col4'),
                     dtype={'Col1': str, 'Col2': int, 'Col3': float, 'Col4': str})
    datasets.append(df)
all_data = pd.concat(datasets, ignore_index=True)
print(all_data.head())

You can adapt this code to make it work for your data.
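
For instance, adapted to the whitespace-separated, eight-float-column files described in the question (a sketch; the path is a placeholder):

import glob

import numpy as np
import pandas as pd

cols = list('ABCDEFGH')
datasets = []
for f in glob.glob('path_in_dir/*'):   # extensionless, timestamp-named files
    # one dataframe per file, all eight columns read as float32
    datasets.append(pd.read_table(f, delim_whitespace=True, names=cols,
                                  dtype={c: np.float32 for c in cols}))
all_data = pd.concat(datasets, ignore_index=True)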

Thanks

1 Comment

@Arsh didn't ask about the source information (i.e., which file each dataframe came from), which is why I used a list of DataFrames instead of a dictionary.
