I am writing software that splits big files (at least 1 billion rows) into smaller files, and I have coded several solutions.
I am measuring the execution time of each solution (threads, goroutines, MPI, etc.) and want to compare them objectively.
If I run the same solution more than once, the execution time drops, and I understand this happens because some of the data gets cached in the memory hierarchy (the OS page cache in RAM, the CPU caches, etc.).
I want to make the tests as objective and reproducible as possible by removing these influences, so that each test runs with a clean slate.
If I restart the PC and measure again, the RAM no longer holds the previous data and the results are quite consistent. Is there any way to achieve this without having to restart the PC?
What is the best way to run this kind of test? I want to do something like:
- Run a.exe and measure its execution time.
- Clear everything this run cached (RAM/page cache, CPU caches, etc.).
- Repeat the test N times for a.exe.
- Do the same for b.exe.
Then I can calculate the average speed of a and of b, and compare them. A rough sketch of the pipeline I have in mind is below.
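This is a minimal sketch, assuming a Linux host: from what I've read, writing `3` to `/proc/sys/vm/drop_caches` (as root, after a `sync`) should drop the page cache, dentries, and inodes between runs. The binary paths `./a.exe` and `./b.exe` and the repetition count `n` are placeholders:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"time"
)

const n = 10 // number of repetitions per solution (placeholder)

// dropCaches flushes dirty pages to disk, then asks the kernel to
// drop the page cache, dentries, and inodes. Requires root.
func dropCaches() error {
	if err := exec.Command("sync").Run(); err != nil {
		return err
	}
	return os.WriteFile("/proc/sys/vm/drop_caches", []byte("3\n"), 0o200)
}

// timeRun performs one cold run of the given binary and returns
// its wall-clock duration.
func timeRun(path string) (time.Duration, error) {
	if err := dropCaches(); err != nil {
		return 0, err
	}
	start := time.Now()
	if err := exec.Command(path).Run(); err != nil {
		return 0, err
	}
	return time.Since(start), nil
}

func main() {
	for _, bin := range []string{"./a.exe", "./b.exe"} {
		var total time.Duration
		for i := 0; i < n; i++ {
			d, err := timeRun(bin)
			if err != nil {
				fmt.Fprintf(os.Stderr, "%s run %d: %v\n", bin, i, err)
				os.Exit(1)
			}
			fmt.Printf("%s run %d: %v\n", bin, i, d)
			total += d
		}
		fmt.Printf("%s average over %d runs: %v\n", bin, n, total/n)
	}
}
```

Is something along these lines a sound approach, or is there a better-established tool for cold-cache benchmarking?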
I have been researching a lot and could not find any helpful resource, so any pointers are welcome. Ideally, I'd like a programmable way to achieve this, so that any additional tools can be integrated into the benchmark pipeline.
What I have tried so far:
- Restarting the PC, just to confirm that caching was the issue.
- Running the software inside a fresh Docker container each time (rough sketch below). This worked reasonably well but was very slow.
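For reference, the Docker variant looked roughly like this; the image name `bench-image` and the mounted data directory are placeholders:

```go
package main

import (
	"fmt"
	"log"
	"os/exec"
	"time"
)

func main() {
	start := time.Now()
	// --rm throws the container away after the run, so nothing is
	// reused at the container level between measurements.
	cmd := exec.Command("docker", "run", "--rm",
		"-v", "/data:/data",
		"bench-image", "/data/a.exe")
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
	fmt.Println("elapsed:", time.Since(start))
}
```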
Thanks in advance!