22 events
when | what | by | license | comment
Sep 24, 2018 at 11:04 answer added Mark N Hopgood timeline score: -1
Aug 31, 2018 at 13:03 comment added Basile Starynkevitch Without much more explanation, your question smells like an XY problem. So what is the actual problem you want to solve? If you don't want to explain it here, feel free to send me an email at [email protected] mentioning the URL of this question and giving a lot more details.
Aug 31, 2018 at 11:30 comment added Basile Starynkevitch @KristiJorgji: still waiting for you to join my chat, but I may leave it in a few minutes. Without a lot more details (file size, number of files handled each day, operating system, file system, kind of applications, ... and the general context and motivation) your question is too broad and unclear, and I am really surprised by your performance needs.
Aug 31, 2018 at 10:54 comment added Basile Starynkevitch Please also explain why you need to optimize file splitting that much. Why can't you use existing programs for that task? What is the typical size in bytes of your files? And on what kind of computer (RAM size), operating system, and file system?
Aug 31, 2018 at 10:47 comment added Basile Starynkevitch Let us continue this discussion in chat.
Aug 31, 2018 at 9:25 comment added Kristi Jorgji Yes. But can we please focus on the question at hand? I am looking for ways to run the tests in the most objective and reproducible way, with the highest confidence in the results. Based on your experience, maybe you can provide an answer. Thanks in advance.
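One common approach to the reproducibility concern raised here (a sketch, not the asker's actual setup; the `benchmark` helper name is hypothetical): time the operation over several runs after a warm-up, and report both the minimum and the median rather than a single measurement.

```python
import statistics
import time

def benchmark(func, *args, runs=5, warmup=1):
    """Time func(*args) over several runs; report min and median.

    The minimum is the least noisy estimate of the code's own cost;
    the median reflects typical behavior including OS-level noise
    (page-cache state, scheduling, etc.).
    """
    for _ in range(warmup):              # warm up caches before measuring
        func(*args)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()      # monotonic, high-resolution clock
        func(*args)
        timings.append(time.perf_counter() - start)
    return {"min": min(timings), "median": statistics.median(timings)}

# Usage: benchmark a trivial function
result = benchmark(lambda: sum(range(100_000)), runs=5)
print(sorted(result))  # ['median', 'min']
```

Reporting min and median across repeated runs gives a reproducible figure even when individual runs vary; comparing two implementations by their minima is usually fairer than comparing single runs.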
Aug 31, 2018 at 9:21 comment added Basile Starynkevitch Cross-platform code is inconsistent with optimizations. Each platform needs to be tuned differently
Aug 31, 2018 at 9:18 comment added Kristi Jorgji Yes, exactly; as described in the algorithm above, I check the bytes. I read full lines until a size limit is reached, then emit them to a file, e.g. gameplays_0.csv, and so on. The input argument is the number of bytes per file, and the condition is that each file must contain only full rows. Using Linux is a great idea, but the project is supposed to be cross-platform. I can run the tests on Linux though; thanks for the tip.
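The splitting algorithm described in this comment (read full lines, flush a chunk whenever the byte limit would be exceeded) can be sketched in Python. The function name `split_by_lines` is hypothetical; only the `gameplays_*.csv` naming scheme comes from the comment:

```python
def split_by_lines(src, max_bytes, prefix="gameplays"):
    """Split src into chunks of at most max_bytes bytes,
    never breaking a line across two output files.
    A single line longer than max_bytes gets its own chunk.
    """
    outputs = []
    chunk, size, index = [], 0, 0
    with open(src, "rb") as f:          # binary mode: count real bytes
        for line in f:
            if chunk and size + len(line) > max_bytes:
                name = f"{prefix}_{index}.csv"
                with open(name, "wb") as out:
                    out.writelines(chunk)   # flush the full chunk
                outputs.append(name)
                index += 1
                chunk, size = [], 0
            chunk.append(line)
            size += len(line)
    if chunk:                           # flush the final partial chunk
        name = f"{prefix}_{index}.csv"
        with open(name, "wb") as out:
            out.writelines(chunk)
        outputs.append(name)
    return outputs
```

Reading in binary mode matters here: the limit is expressed in bytes, and text-mode newline translation would make the byte count platform-dependent, which defeats the cross-platform requirement.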
Aug 31, 2018 at 9:18 comment added Basile Starynkevitch Lastly, I believe that for your problem using Linux is probably more efficient (since Linux has better file systems and a better page cache than Windows is rumored to have).
Aug 31, 2018 at 9:17 comment added Basile Starynkevitch But what really matters (assuming lines have reasonable sizes, e.g. thousands of bytes each) is still the byte size of the file. The fact that it is a CSV one is not really important.
Aug 31, 2018 at 9:16 history edited Kristi Jorgji CC BY-SA 4.0
deleted 26 characters in body
Aug 31, 2018 at 9:15 comment added Kristi Jorgji I know, but for this particular project I am splitting CSV and other text files on a per-line basis. Large text files are split into smaller ones, small enough to be opened by editors. Bytes are counted until a file reaches 50 MB, regardless of the number of lines, and then it is emitted. I also updated the question to remove the number of bytes or lines, as it is irrelevant to the question; thanks for the insight.
Aug 31, 2018 at 9:03 comment added Basile Starynkevitch BTW, a file size is in bytes (actually in gigabytes for large files), not in rows.
Aug 31, 2018 at 8:55 answer added Basile Starynkevitch timeline score: 3
Aug 30, 2018 at 17:02 comment added enderland We have solved this problem by running on a dedicated AWS box that can be spun up to run the tests.
Aug 30, 2018 at 14:45 review Close votes
Sep 7, 2018 at 3:05
Aug 30, 2018 at 14:40 history edited Kristi Jorgji CC BY-SA 4.0
added 287 characters in body
Aug 30, 2018 at 14:29 comment added Kristi Jorgji Thank you for the tip. Can you suggest one of these RAM optimizers that could be integrated into a programmatic pipeline? I need to call it from code, optionally in a loop, to clean the RAM before each consecutive run.
Aug 30, 2018 at 14:28 history edited Kristi Jorgji CC BY-SA 4.0
deleted 3 characters in body
Aug 30, 2018 at 14:26 comment added Neil There are RAM optimizers which do little more than free up memory not currently in use. You could probably get away with closing the program, running the RAM optimizer, then relaunching.
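A programmatic, cross-platform "RAM optimizer" API doesn't really exist, but on Linux the page cache — the main source of run-to-run variance in I/O benchmarks — can be dropped explicitly between runs. A minimal sketch (the function name is hypothetical; writing to `/proc/sys/vm/drop_caches` requires root and is Linux-specific):

```python
import os

def drop_page_cache():
    """Ask the Linux kernel to drop its page cache, dentries and inodes,
    so the next benchmark run reads from disk instead of from cache.
    Returns False if the write is not permitted (non-root, non-Linux,
    or a read-only /proc inside a container)."""
    try:
        os.sync()  # flush dirty pages first so they become reclaimable
        with open("/proc/sys/vm/drop_caches", "w") as f:
            f.write("3\n")  # 1 = page cache, 2 = dentries/inodes, 3 = both
        return True
    except OSError:
        return False
```

Calling this between runs makes cold-cache timings reproducible on Linux; on other platforms a reboot, or closing and relaunching the program as suggested above, is the practical fallback.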
Aug 30, 2018 at 14:25 history edited Kristi Jorgji CC BY-SA 4.0
deleted 3 characters in body
Aug 30, 2018 at 14:19 history asked Kristi Jorgji CC BY-SA 4.0