
I have a similar problem while trying to solve a very large sliding block puzzle. Currently I have to merge about 100 sorted files, each containing about 60 million positions and occupying 15 gigabytes. The files are individually sorted without duplicates, but different files can have the same record.

I wrote a utility in C++ which basically opens all the files and reads one record at a time from each. At each step it finds the alphabetically earliest one (using a Shell sort) and writes that record. It then reads the next record from that file and from any other files that had the same record. It ran for 5 hours on a new Mac laptop to get the answer.

Memory requirements are not large and each file is read only once. It runs far faster than a comm-based solution, which is limited to two files at a time and requires reading the same files multiple times.
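For illustration, here is a minimal sketch of that kind of multi-way merge with duplicate elimination. It is not the actual utility: it assumes one record per line, takes the input files as command-line arguments, and uses a plain linear scan to find the minimum instead of the Shell-sort bookkeeping described above.

```cpp
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

// Merge any number of individually sorted files into one sorted,
// duplicate-free stream on stdout. One record per line is assumed.
int main(int argc, char* argv[]) {
    std::vector<std::ifstream> in;
    std::vector<std::string> cur;   // current record from each file
    std::vector<bool> alive;        // whether the file still has records

    for (int i = 1; i < argc; ++i) {
        in.emplace_back(argv[i]);
        std::string line;
        bool ok = static_cast<bool>(std::getline(in.back(), line));
        cur.push_back(ok ? line : std::string());
        alive.push_back(ok);
    }

    while (true) {
        // Find the alphabetically earliest current record across all files.
        int best = -1;
        for (size_t i = 0; i < cur.size(); ++i)
            if (alive[i] && (best < 0 || cur[i] < cur[best]))
                best = static_cast<int>(i);
        if (best < 0) break;        // every file is exhausted

        const std::string rec = cur[best];
        std::cout << rec << '\n';   // write the record exactly once

        // Advance every file whose current record equals the one just
        // written, so duplicates that appear in several files are dropped.
        for (size_t i = 0; i < cur.size(); ++i)
            if (alive[i] && cur[i] == rec)
                alive[i] = static_cast<bool>(std::getline(in[i], cur[i]));
    }
    return 0;
}
```

The linear scan keeps the sketch short; with hundreds of files, replacing it with a heap (or the Shell-sort approach the utility uses) cuts the per-record cost of finding the minimum.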

The program has been compiled and run on two computers: the Mac laptop where it was originally developed, and an M1 Mac. The largest job run so far had 676 files, each with about 60 million records (about 1.5 GB per file), and it ran in a little over 10 hours.

Source code: bruceamos/comb
