
I need to implement execution time measurement functionality. I have thought about two possibilities.

  • First - regular time() calls: record the time when each execution step starts and the time when it completes. The Unix time shell command works this way (a minimal sketch is at the end of this question).

  • The second method is sampling. Each execution step sets some sort of flag before execution begins (for example, creates an object in the stack frame) and destroys it when it completes. A timer periodically scans all flags and builds an execution time profile: if some execution step takes more time than the others, it will be scanned more often. Many profilers work like this.

I need to add some profiling functionality to my server application. Which method is better, and why? I think the second method is less accurate, and the first method adds a dependency on the profiling library to my code.
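
To make the first method concrete, here is a minimal sketch of the kind of RAII helper I have in mind (ScopeTimer is a placeholder name, not an existing class):

    #include <chrono>
    #include <cstdio>

    // Sketch of the first approach: an automatic object records the start
    // time on construction and prints the elapsed time on destruction.
    struct ScopeTimer {
        const char* name;
        std::chrono::steady_clock::time_point start;

        explicit ScopeTimer(const char* n)
            : name(n), start(std::chrono::steady_clock::now()) {}

        ~ScopeTimer() {
            auto end = std::chrono::steady_clock::now();
            auto us = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
            std::printf("%s: %lld us\n", name, static_cast<long long>(us));
        }
    };

    void execution_step() {
        ScopeTimer t("execution_step");  // measured from here to end of scope
        // ... work ...
    }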


4 Answers


The second method is essentially stack sampling. You can try to do it yourself, by means of some kind of entry-exit event capture, or, better, use a utility that actually reads the stack. The latter has the advantage that you get line-of-code resolution, rather than just method-level.

There's something that a lot of people don't get about this, which is that precision of timing measurement is far less important than precision of problem identification.

It is important to take samples even during I/O or other blocking, so you are not blind to needless I/O. If you are worried that competition with other processes could inflate the time, don't be, because what really matters is not absolute time measurements, but percentages. For example, if one line of code is on the stack 50% of the wall-clock time, and thus responsible for that much, getting rid of it would double the speed of the app, regardless of whatever else is going on.

Profiling is about more than just getting samples. Often people are pretty casual about what they do with them, but that's where the money is. First, inclusive time is the fraction of time a method or line of code is on the stack. Forget "self" time - it's included in inclusive time. Forget invocation counting - its relation to inclusive percent is, at best, very indirect. If you are summarizing, the best way to do it is to have a "butterfly view" whose focus is on a single line of code. To its left and right are the lines of code appearing immediately above it and below it on the stack samples. Next to each line of code is a percent - the percent of stack samples containing that line of code. (And don't worry about recursion. It's simply not an issue.)
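
As a rough sketch of how inclusive percentages fall out of raw samples (the sample representation and names below are assumptions for illustration, not any particular profiler's format): count, for each code location, the samples whose stack contains it at least once, and divide by the total number of samples.

    #include <cstdio>
    #include <map>
    #include <set>
    #include <string>
    #include <vector>

    // Each sample is the list of code locations on the stack at one instant.
    using Sample = std::vector<std::string>;

    // Inclusive percent: the fraction of samples containing each location.
    // Deduplicating within a sample (the std::set) is why recursion is a
    // non-issue: a frame appearing five times in one sample counts once.
    std::map<std::string, double> inclusivePercent(const std::vector<Sample>& samples) {
        std::map<std::string, double> pct;
        if (samples.empty()) return pct;
        std::map<std::string, int> hits;
        for (const Sample& s : samples) {
            std::set<std::string> unique(s.begin(), s.end());
            for (const std::string& loc : unique)
                ++hits[loc];
        }
        for (const auto& [loc, n] : hits)
            pct[loc] = 100.0 * n / samples.size();
        return pct;
    }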

Even better than any kind of summary is to let the user see a small random selection of the stack samples themselves. That way, the user can see the full reason why each moment of time was being spent. Any avoidable activity appearing on more than one sample is a chance for a serious speedup, guaranteed. People often think "Well, that could just be a fluke, not a real bottleneck". Not so. Fixing it will pay off, maybe a little, maybe a lot, but on average, significantly. People should not be ruled by risk-aversion.

More on all that.


When Boost is an option, you can use the Boost.Timer library.
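
For example, a minimal sketch using Boost.Timer v2 (note that it is a separately compiled library, so you will need to link against it):

    #include <boost/timer/timer.hpp>

    void expensive_step() {
        // auto_cpu_timer prints wall, user, and system time when it
        // goes out of scope.
        boost::timer::auto_cpu_timer t;
        // ... work to be measured ...
    }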


Make sure that you really know what you're looking for in the profiler you're writing. Whenever you collect the total execution time of a certain piece of code, it will include time spent in all of its children, and it may be hard to find the real bottleneck in your system, as the most top-level function will always bubble up as the most expensive one - for instance, main().

What I would suggest is to hook into every function's prologue and epilogue (if your application is a CLR application, you can use ICorProfilerInfo::SetEnterLeaveFunctionHooks to do that; you can also use macros at the beginning of every method, or any other mechanism that injects your code at the beginning and end of every function) and collect your times in the form of a tree for each thread that you're profiling.

The algorithm for this would look something like this:

  • For each thread that you're monitoring, create a stack-like data structure.
  • Whenever you're notified that a function has begun execution, push something that identifies the function onto that thread's stack.

    • If that function is not the only function on the stack, then the previous function that has not yet returned is the one that called your function.
    • Keep track of those caller-callee relationships in your favorite data structure.
  • Whenever a method returns, its identifier will always be on top of its thread's stack. Its total execution time is equal to the current time minus the time when its identifier was pushed onto the stack. Pop that identifier off the stack.

This way you'll have a tree-like breakdown of what eats up your execution time, where you can see which child calls account for the total execution time of a function.
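
A hedged sketch of that algorithm follows; the hook functions and all the names are mine, standing in for whatever injection mechanism you use:

    #include <chrono>
    #include <map>
    #include <memory>
    #include <string>
    #include <utility>
    #include <vector>

    using Clock = std::chrono::steady_clock;

    // One node per function within its calling context; children are keyed
    // by callee name, which records the caller-callee relationships.
    struct Node {
        std::map<std::string, std::unique_ptr<Node>> children;
        Clock::duration total{};  // inclusive time, summed over all calls
        long long calls = 0;
    };

    // Per-thread state: the call tree root plus the stack of active frames.
    struct ThreadProfile {
        Node root;
        std::vector<std::pair<Node*, Clock::time_point>> stack;
    };

    thread_local ThreadProfile tp;

    // Call this from the function prologue hook.
    void onEnter(const std::string& fn) {
        Node* parent = tp.stack.empty() ? &tp.root : tp.stack.back().first;
        auto& slot = parent->children[fn];
        if (!slot) slot = std::make_unique<Node>();
        tp.stack.push_back({slot.get(), Clock::now()});
    }

    // Call this from the function epilogue hook; the returning function is
    // always on top of its thread's stack.
    void onLeave() {
        auto [node, start] = tp.stack.back();
        tp.stack.pop_back();
        node->total += Clock::now() - start;  // current time minus push time
        ++node->calls;
    }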

Have fun!


In my profiler I used an extended version of the first approach you mentioned.

I have a class which provides context objects. You define them in your code as automatic objects, so they are freed as soon as execution flow leaves the context in which they were defined (for example, a function or a loop). The constructor calls GetTickCount (it was a Windows project; you can choose an analogous function for your target platform) and stores the value, while the destructor calls GetTickCount again and calculates the difference between that moment and the start. Each object has a unique context ID (it can be autogenerated as a static object inside the same context), so the profiler can sum up all timings with the same ID, which means the same context has been entered several times. The number of executions is counted as well.

Here is a preprocessor macro which helps to profile a function:

#define _PROFILEFUNC_  static ProfilerLocator locator(__FUNCTION__); ProfilerObject obj(locator);

When I want to profile a function, I just insert _PROFILEFUNC_ at the beginning of the function. This creates a static locator object which identifies the context and stores its name as the function name (you may choose another naming scheme). Then an automatic ProfilerObject is created on the stack; it "traces" its own creation and deletion, reporting them to the profiler.
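
In simplified form, the two classes look something like this (a sketch, not the real implementation; here the locator aggregates and reports by itself instead of handing the data to a central profiler object):

    #include <windows.h>
    #include <cstdio>

    // Identifies one profiled context. Defined static inside the context,
    // so it is constructed once and accumulates timings across all passes.
    class ProfilerLocator {
    public:
        explicit ProfilerLocator(const char* name)
            : name_(name), totalMs_(0), count_(0) {}
        ~ProfilerLocator() {  // report when the program exits
            std::printf("%s: %lu calls, %lu ms total\n", name_, count_, totalMs_);
        }
        void add(DWORD elapsedMs) { totalMs_ += elapsedMs; ++count_; }
    private:
        const char* name_;
        DWORD totalMs_;
        unsigned long count_;
    };

    // Automatic (stack) object: records the tick count on construction and
    // reports the difference to its locator on destruction.
    class ProfilerObject {
    public:
        explicit ProfilerObject(ProfilerLocator& loc)
            : loc_(loc), start_(GetTickCount()) {}
        ~ProfilerObject() { loc_.add(GetTickCount() - start_); }
    private:
        ProfilerLocator& loc_;
        DWORD start_;
    };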

Comments

The second approach is when you have a dedicated thread that periodically scans a list of executing contexts and maintains a counter for each context. The higher the counter value, the longer that context's execution time.
@Lazin, well, then I misunderstood your idea. I need to correct my answer.
