My program first splits a big dataset into 100 clusters, then runs a model on each cluster using multiprocessing. My goal is to end up with one big CSV file that concatenates the output data from all 100 fitted models.
For now, I just create 100 CSV files, then loop over the folder containing them and copy them one by one, line by line, into one big file.
My question: is there a smarter way to get this big output file without exporting 100 files? I use pandas and scikit-learn for data processing, and multiprocessing for parallelization.


cat *partial*.csv > unified.csv

See the pickle library if you want to easily save arrays, models, etc.
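If you would rather skip the intermediate files entirely, one option is to have each worker return its results as a DataFrame and write a single CSV at the end. Below is a minimal sketch of that idea, not your actual pipeline: the column names (`cluster`, `x`, `y`), the `LinearRegression` model, and the helpers `fit_cluster` and `run_all` are all placeholders I made up for illustration. It assumes the combined output fits in memory. If each partial file has a header row, this also avoids the repeated header lines that `cat` would copy into the unified file.

```python
import multiprocessing as mp

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def fit_cluster(item):
    """Fit a (placeholder) model on one cluster and return its output as a DataFrame."""
    cluster_id, frame = item
    model = LinearRegression().fit(frame[["x"]], frame["y"])
    return pd.DataFrame({
        "cluster": cluster_id,
        "x": frame["x"].to_numpy(),
        "prediction": model.predict(frame[["x"]]),
    })


def run_all(data, processes=4):
    """Fit every cluster in a worker pool and return one concatenated DataFrame."""
    # One (cluster_id, sub-DataFrame) pair per cluster.
    groups = list(data.groupby("cluster"))
    with mp.Pool(processes=processes) as pool:
        # Each worker returns a DataFrame; nothing is written to disk here.
        parts = pool.map(fit_cluster, groups)
    return pd.concat(parts, ignore_index=True)


if __name__ == "__main__":
    # Synthetic stand-in for the already-clustered dataset.
    rng = np.random.default_rng(0)
    data = pd.DataFrame({
        "cluster": rng.integers(0, 5, size=200),
        "x": rng.normal(size=200),
    })
    data["y"] = 2.0 * data["x"] + data["cluster"]

    # Single write: header once, then all rows.
    run_all(data).to_csv("unified.csv", index=False)
```

`pool.map` returns results in the same order as the input groups, so the rows in the final file are grouped by cluster. The results are pickled to travel between processes, so very large per-cluster outputs add some transfer overhead compared with writing files directly.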