Introduction
It is often said that a programmer’s best friend is a good text editor, and that their best tool is their brain. Recently, I had the opportunity to become familiar with a programmer’s second-best tool: a good performance monitor.
For many applications, performance can be a critical advantage. Almost all scientific and engineering computing falls into this category, such as crash simulations or weather modeling. However, many desktop applications also require good performance: computer games and video encoding are extremely taxing. In the embedded market, performance is likely to be even more important. Unlike the general market, embedded systems are not upgraded with faster hardware on a regular basis, and resources are scarce. Many devices, such as cell phones, are constrained by thermal and power characteristics, so developers must squeeze out every last bit of performance.
The goal of performance analysis is to understand the behavior of an application on a given platform. In particular, most analysts focus on finding the hot spots in an application, where most of the time is spent, and on eliminating any inefficiencies or bottlenecks there. Sometimes this is as simple as resizing a matrix to fit in the cache; at other times it is a vastly more involved process that touches many different aspects of a program. Once upon a time, it was possible to look at the instructions in a program and figure out exactly how long it would take to run, and where that time was spent. In a project with 50K lines of code, however, that is likely to be rather difficult, if not impossible. As computer systems have grown more complex, developers increasingly rely on performance analysis tools, which supply cold hard facts so that programmers can produce highly optimized and efficient applications.
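To make the idea of a hot spot concrete, here is a minimal sketch that uses Python's standard cProfile module rather than the VTune analyzer. It is an instrumenting profiler, not a sampling tool, so it only illustrates the concept. The function names (slow_part, fast_part, workload, find_hot_spot) are hypothetical and exist purely for illustration.

```python
import cProfile
import pstats


def slow_part(n):
    # Deliberately expensive: a pure-Python loop over n items.
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_part(n):
    # Cheap by comparison: a single multiplication.
    return n * 2


def workload():
    # Hypothetical application: one expensive step and one cheap step.
    return slow_part(200_000) + fast_part(200_000)


def find_hot_spot(func):
    """Profile func and return the name of the function with the most self time."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) to
    # (primitive calls, total calls, self time, cumulative time, callers).
    (_, _, name), _ = max(stats.stats.items(), key=lambda item: item[1][2])
    return name


if __name__ == "__main__":
    print("Hot spot:", find_hot_spot(workload))
```

In this toy workload the profiler should point at slow_part, which is where an analyst would start looking for inefficiencies.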
Intel provides an excellent performance analysis tool, the VTune analyzer, to help programmers profile their applications and extract the most performance from their code. The VTune analyzer comes in both Linux and Windows flavors, and runs on any of Intel’s architectures: IPF, x86 or XScale. It has three major functions: event-based sampling, the counter monitor and call graph profiling.