Evaluating the Model
The good news is that our model seems to fit the data for notebook graphics quite well. However, the real test of our model is how well it can predict performance for other GPUs. The slope of the regression lines indicates how much additional performance we can expect from an additional GFLOP/s of theoretical compute capability. For Nvidia, the slope is ~14, while it is ~6 for AMD. Using this information and the readily available GFLOP/s ratings, we can estimate performance for other GPUs with similar microarchitectures. Another reason to use GFLOP/s in our model is that it is easy to calculate for most graphics cards and is not subject to any fudging from manufacturers (aside from Nvidia's missing multiply, which we do not count at all).
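The prediction step described above can be sketched in a few lines. The slopes (~14 for Nvidia, ~6 for AMD) are the approximate figures quoted in this article; the zero intercept is a simplifying assumption for illustration, since the fitted regression lines may have a small non-zero intercept.

```python
# Sketch of the linear model: 3DMark Vantage GPU score estimated from
# peak single-precision GFLOP/s. Slopes are the article's approximate
# values; the zero intercept is an illustrative assumption.
SLOPES = {"nvidia": 14.0, "amd": 6.0}

def estimate_score(vendor: str, gflops: float) -> float:
    """Estimate a 3DMark Vantage GPU score from theoretical GFLOP/s."""
    return SLOPES[vendor] * gflops

# Example: a hypothetical 500 GFLOP/s Nvidia part
print(estimate_score("nvidia", 500.0))  # 7000.0
```

The same function works for any card in the family; only the vendor-specific slope changes, which is the point of fitting one line per microarchitecture.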
To evaluate the accuracy of our predictions, we compare the model against the measured performance for several different desktop GPUs. We were limited in our selection of cards, since the notebookcheck.net database is very sparse for desktops. There were 7 results to check our model against: 5 from Nvidia and 2 from AMD. Table 1 shows the accuracy of our predictions.
Table 1 – Performance Estimates and Accuracy for Desktop GPUs
Overall our models are fairly accurate, especially for Nvidia GPUs: most predictions are within 3%, and even the GTX 460 is off by a very acceptable 7%. The AMD results are a little less accurate. The HD 5850 prediction looks fairly good, but the HD 4850 was off by 15%. That is a little higher than ideal, yet still reasonable given the simplicity of the model. The bottom line is that our model appears to be relatively accurate for making predictions.
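The accuracy figures in Table 1 boil down to a percent-error calculation between the estimated and measured scores. A minimal sketch, using illustrative placeholder scores rather than the actual Table 1 values:

```python
# Percent error between a model estimate and a measured benchmark score,
# as used for the accuracy column in Table 1. The scores below are
# illustrative placeholders, not the real Table 1 data.
def percent_error(predicted: float, measured: float) -> float:
    """Signed percent error of the prediction relative to the measurement."""
    return 100.0 * (predicted - measured) / measured

# A hypothetical 7% overshoot, similar in magnitude to the GTX 460 result
print(round(percent_error(10700.0, 10000.0), 1))  # 7.0
```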
The model also tells us a bit about the differences between Nvidia and AMD GPUs and drivers. Recall that the peak GFLOP/s of a GPU is basically the number of cores times the frequency times the peak IPC. Going back to our performance equation, a little arithmetic tells us that the slope of the regression lines is essentially the utilization of a core (i.e. actual IPC over peak IPC) divided by the instruction count (IC). So the slope of the regression line actually tells us about the efficiency of AMD's and Nvidia's drivers and microarchitectures. Relatively speaking, Nvidia's architecture seems to achieve 2.2X the utilization of AMD's. This corresponds fairly well to the utilization that AMD described when disclosing the new VLIW4 design, and highlights the differences in both the microarchitectures and drivers.
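The arithmetic above can be made explicit. Since peak GFLOP/s = cores × frequency × peak IPC, and delivered performance scales as (cores × frequency × actual IPC) / IC, the slope works out to utilization / IC. On the same benchmark the instruction counts should be comparable, so the ratio of the two slopes approximates the relative utilization. The slope values here are the rounded figures quoted earlier, which is why the ratio comes out slightly above the ~2.2X derived from the unrounded regressions.

```python
# Worked version of the slope arithmetic:
#   peak GFLOP/s   = cores * frequency * peak_IPC
#   performance    ~ (cores * frequency * actual_IPC) / IC
#   => slope       = (actual_IPC / peak_IPC) / IC  =  utilization / IC
# With the same benchmark (comparable IC) for both vendors, the slope
# ratio approximates the relative core utilization. Slopes are the
# article's rounded values.
nvidia_slope = 14.0
amd_slope = 6.0
print(round(nvidia_slope / amd_slope, 2))  # 2.33, near the ~2.2X quoted
```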
We have built a fairly simple, yet highly accurate model of GPU performance. Our analysis has shown that peak single precision GFLOP/s is a very good proxy for the performance of a given GPU microarchitecture on 3DMark Vantage. The models we have described can estimate performance for 3DMark Vantage, or other similar workloads. However, each workload is different and may stress other aspects of the GPU architecture. 3DMark Vantage is distinct from 3DMark11, so the two probably require different performance models (although the same techniques can be used). Workstation tests like SPECviewperf, which uses OpenGL rather than DirectX, may diverge even further.
It’s also worth mentioning that real games and applications like Civilization 5, League of Legends, Crysis or Metro 2033 are fairly different from more synthetic benchmarks like 3DMark. This is doubly true given the extent to which vendors are known to use their drivers to optimize (or sometimes outright cheat) on benchmarks like 3DMark. In many cases, drivers are much more heavily optimized for big name games and benchmarks, while smaller titles are neglected. So the relationship between GFLOP/s and performance is likely to be somewhat different for real applications. But the differences between benchmarks and real applications are ultimately a topic for another day.
Our analysis could also be extended to deal with integrated GPUs and general purpose applications. However, that would probably require more complicated models that factor in memory bandwidth, caching and other subtleties. In the case of integrated graphics, the hardware is much more complex because resources are typically shared with the CPU. General purpose applications are also tricky because they may have limited parallel scaling, and fairly different bottlenecks than graphics workloads.
Our performance models can also be used to compare different microarchitectures and judge how various improvements ultimately impact performance. For instance, we did not have sufficient data to evaluate AMD’s Cayman, but as more results emerge it should be easy enough to create a good model and compare it against previous generations. Similarly, we could compare the different microarchitectures that Nvidia has used to look for subtle differences. All this gives us a fair number of future opportunities for analysis. But for the moment, it is enough to know that something as complicated as a modern GPU can be accurately modeled in a relatively straightforward fashion.