
If I run the same solution more than once, the execution time will be less, and I understand that this happens because some of the data gets cached in the memory hierarchy (RAM, CPU registers, etc.).

So you should run the same "solution" several times (e.g. run exactly the same thing five times) and benchmark each run. The next question is which timing is the most relevant. You could choose the worst one (probably the first run), take an average of all of them, ignore the best and worst runs and only consider the rest, etc...

In your case, I believe you want to consider the average time. In practice, it is very likely that some of the data will already be "there" (e.g. in the page cache) when you actually run your program.
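As a minimal sketch (not your actual split program), here is how you could time the same workload several times in C and report each run plus the average; run_workload() is just a hypothetical placeholder for whatever "solution" you are measuring:

```c
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

/* Hypothetical placeholder: replace with the code being benchmarked,
 * e.g. reading and splitting the huge file. */
static void run_workload(void)
{
}

int main(void)
{
    enum { RUNS = 5 };
    double total = 0.0;

    for (int i = 0; i < RUNS; i++) {
        struct timespec start, end;
        clock_gettime(CLOCK_MONOTONIC, &start);
        run_workload();
        clock_gettime(CLOCK_MONOTONIC, &end);

        /* Elapsed wall-clock time for this run, in seconds. */
        double elapsed = (end.tv_sec - start.tv_sec)
                       + (end.tv_nsec - start.tv_nsec) / 1e9;
        printf("run %d: %.3f s\n", i + 1, elapsed);
        total += elapsed;
    }
    printf("average: %.3f s\n", total / RUNS);
    return 0;
}
```

Typically the first run is noticeably slower than the later ones, which is exactly the cold-versus-warm cache effect you observed.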

I don't think that measuring a "cold" state is realistic. When you split a huge file, it has likely been generated (or downloaded, or otherwise obtained) a few seconds or minutes earlier (why would you wait several hours before splitting it?), so you really care more about a "warm" state, and in practice the file is likely to be (partly) in your page cache.

Details are obviously computer, operating system, and file system specific.

BTW, on Linux, you might be interested in system calls like posix_fadvise(2) and/or readahead(2). When used properly, they can improve overall performance.
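For example, a minimal sketch of calling posix_fadvise(2) before sequentially reading a large file could look like the following (the file name huge.txt is just an example):

```c
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("huge.txt", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Advise on the whole file: offset 0 with length 0 means "to the end".
     * POSIX_FADV_SEQUENTIAL tells the kernel to expect sequential access
     * (it may increase readahead); POSIX_FADV_WILLNEED asks it to start
     * populating the page cache.  posix_fadvise returns an error number
     * on failure instead of setting errno. */
    int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    if (rc != 0)
        fprintf(stderr, "posix_fadvise: %s\n", strerror(rc));
    rc = posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);
    if (rc != 0)
        fprintf(stderr, "posix_fadvise: %s\n", strerror(rc));

    /* ... read and split the file as usual ... */

    close(fd);
    return 0;
}
```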