Performance Analysis Tools: A Look at VTune

Call Graph Profiling

Call graph profiling has a very different focus from counter monitor or event sampling. The goal of call graph profiling is to understand the control flow behavior of an application at the function level. Call graph profiling (CGP) uses binary instrumentation to record caller/callee relationships between functions, as well as timing information. Binary instrumentation has quite a bit of overhead, so the VTune analyzer can control the extent to which the modules are instrumented. The more complete the instrumentation, the better the information, but the higher the overhead. Binary instrumentation works for any module in Ring 3 (i.e. application space), but will not function with Ring 0 modules (i.e. kernel space). CGP requires debug information for the application being profiled, rather than the source code, so it can be used on third party software, even if the vendor won’t part with the source.
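
To make the idea of instrumentation concrete, here is a minimal sketch in Python. It is only an analogy: VTune rewrites binaries rather than wrapping functions, and every name below (instrument, edge_calls, total_time, load, parse) is hypothetical and not part of any VTune interface. The wrapper records who called whom and how long each call took, which is the same kind of data CGP collects.

```python
import time
from collections import defaultdict

# Hypothetical, illustrative names; this is not VTune's API.
call_stack = []                    # names of the functions currently executing
edge_calls = defaultdict(int)      # (caller, callee) -> number of calls
total_time = defaultdict(float)    # function -> seconds, including callees


def instrument(func):
    """Wrap func so that every call records its caller and its duration."""
    def wrapper(*args, **kwargs):
        caller = call_stack[-1] if call_stack else "<root>"
        edge_calls[(caller, func.__name__)] += 1
        call_stack.append(func.__name__)
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so the stack stays consistent.
            total_time[func.__name__] += time.perf_counter() - start
            call_stack.pop()
    return wrapper


@instrument
def parse(item):
    return item * 2


@instrument
def load(items):
    return [parse(i) for i in items]


result = load([1, 2, 3])
for (caller, callee), count in edge_calls.items():
    print(f"{caller} -> {callee}: {count} call(s)")
```

Every wrapped call pays for the extra bookkeeping, which is a small-scale version of the overhead trade-off described above: wrapping fewer functions costs less but yields a less complete graph.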

Each thread in the profiled application is displayed as a tree of functions in CGP. For each node (i.e. function) in the tree, the VTune analyzer records the number of calls, the time spent in the function (both including and excluding child functions) and the time spent waiting when the function is blocked (again both including and excluding child functions). Note that the time spent is an approximation, rather than an exact figure. Figure 3 below shows a sample call graph for a 64-bit version of Internet Explorer. Unfortunately, the call graph capabilities cannot be used with non-native applications (i.e. you cannot use it on 32-bit applications, like Oblivion, when using a 64-bit OS).
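
The difference between time including and excluding child functions is easy to see in a small sketch. The Python below assumes a hypothetical CallNode structure with made-up function names and timings; it is not VTune's data format, and it leaves out wait time for brevity. Time excluding children (often called self time) is simply the node's total minus the totals of its direct children.

```python
from dataclasses import dataclass, field


@dataclass
class CallNode:
    """One function in a per-thread call tree (hypothetical structure)."""
    name: str
    calls: int
    total_us: int                      # time including child functions
    children: list = field(default_factory=list)

    @property
    def self_us(self):
        # Time excluding child functions.
        return self.total_us - sum(c.total_us for c in self.children)


def dump(node, depth=0):
    """Return one indented report line per node, parents before children."""
    lines = [f"{'  ' * depth}{node.name}: calls={node.calls} "
             f"total={node.total_us}us self={node.self_us}us"]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines


# Made-up numbers for illustration only.
tree = CallNode("main", 1, 1000, [
    CallNode("render", 4, 700, [CallNode("draw_text", 40, 450)]),
    CallNode("read_input", 4, 200),
])
report = dump(tree)
print("\n".join(report))
```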


Figure 3 – Sample Output from Call Graph Profiling

This information can then be used to identify a critical path of execution for an application and the most frequently used functions. This will not tell a programmer how to optimize, but it will quite clearly indicate where to optimize, which is just as important. For developers, time is the scarcest and most precious asset, and CGP shows developers where their time will be the most productive.
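
As a rough illustration of how such data might be queried, the Python sketch below walks a made-up call tree. It assumes a simple heuristic for the critical path (at each level, follow the child with the largest time including its own children), which is not necessarily how VTune computes it; the tuple layout and function names are hypothetical.

```python
# Each node is (name, calls, total_us, children); all values are made up.
tree = ("main", 1, 1000, [
    ("render", 4, 700, [
        ("draw_text", 40, 450, []),
        ("layout", 8, 150, []),
    ]),
    ("read_input", 2, 200, []),
])


def critical_path(node):
    """Follow the child with the largest inclusive time at each level."""
    name, _calls, _total, children = node
    path = [name]
    while children:
        name, _calls, _total, children = max(children, key=lambda c: c[2])
        path.append(name)
    return path


def flatten(node):
    """Yield (name, calls) for every node in the tree."""
    name, calls, _total, children = node
    yield name, calls
    for child in children:
        yield from flatten(child)


path = critical_path(tree)
most_called = sorted(flatten(tree), key=lambda item: item[1], reverse=True)[:2]
print(" -> ".join(path))
print(most_called)
```

With these numbers the path runs through render into draw_text, which is also the most frequently called function, so that is where optimization effort would likely pay off first.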
