I am working on a web front end plus its front-end services.
I receive good-sized CSV files (around 10k lines each). My service processes them and condenses them into one larger CSV file (up to 300k lines).
This larger file is then turned into an HTML/PDF report after some extrapolation.
My questions are:
Taking 17,000 files and turning them into one takes forever (18 hours the last time I tried it). The current process is to take a line from a CSV, parse it to see whether it already exists in my master array, and either create a new entry or add the data to the existing entry. Is there a better way to do this? Since every line triggers a scan of the master array, the later lines take far longer to process than the first ones, so the total cost grows roughly quadratically rather than linearly. Would keying the merged data on a dict (hash lookup instead of an array scan) be the right direction? A minimal sketch of what I mean is below; `id` and `value` are placeholder column names, my real files have more columns.
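```python
import csv
from pathlib import Path

# Sketch of keying the merged data on a dict instead of scanning the master
# array for every incoming row. "id" and "value" are placeholder column names.
def merge_files(input_dir: str) -> dict[str, float]:
    merged: dict[str, float] = {}  # row key -> running total
    for path in sorted(Path(input_dir).glob("*.csv")):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                key = row["id"]
                # O(1) dict lookup replaces the per-line scan of the array
                merged[key] = merged.get(key, 0.0) + float(row["value"])
    return merged

def write_merged(merged: dict[str, float], out_path: str) -> None:
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "value"])
        writer.writerows(merged.items())
```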
Once this large file is created, parsing it also takes quite a while. Should I move away from writing CSV output and switch to JSON for faster data massaging, or even a lightweight database?
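If a lightweight database turns out to be the right call, this is roughly what I had in mind with SQLite; a minimal sketch, assuming the same placeholder `id`/`value` columns as above:

```python
import csv
import sqlite3

# Sketch of the lightweight-DB route: load the condensed rows into SQLite
# once, then do the report extrapolation with queries instead of re-parsing
# a 300k-line CSV each time. Table and column names are placeholders.
def load_into_sqlite(csv_path: str, db_path: str = "report.db") -> None:
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS merged (id TEXT PRIMARY KEY, value REAL)")
    with open(csv_path, newline="") as f:
        rows = ((r["id"], float(r["value"])) for r in csv.DictReader(f))
        con.executemany("INSERT INTO merged (id, value) VALUES (?, ?)", rows)
    con.commit()
    con.close()

# The report step could then pull whatever slice it needs, e.g.:
#   con.execute("SELECT id, value FROM merged WHERE value > ?", (threshold,))
```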
(For what it's worth, I can `cat` and `uniq` the raw files together in seconds, so reading the data itself doesn't seem to be the slow part.)