Isolating the Data Prefetch Performance Benefits
As part of my investigation into various benchmarks, I have been trying to acquire hardware that would allow me to isolate specific differences between architectures and implementations. Recently, I was fortunate enough to get my hands on an unlocked Coppermine core Pentium III through a private party, and was surprised and delighted to find that it would operate fairly reliably at up to 1.2GHz. I realized that with the correct motherboard, I might be able to isolate the benefit of the Data Prefetch feature present on the Tualatin core Pentium III and the Pentium 4. This is possible because the Data Prefetch feature is the only difference between the Tualatin and Coppermine cores.
One of the biggest problems in isolating the performance difference between two parts is ensuring that only those two parts change between tests. Many types of comparisons, such as tests of two different memory types or CPU architectures, do not allow for this. Fortunately, recent chipsets from VIA and Intel support both of these processor cores, which eliminates that problem.
One other potential issue in determining the actual benefit of the Data Prefetch logic is the theoretical maximum bandwidth of the platform. The Pentium III has a 64-bit GTL+ bus, which has a maximum theoretical bandwidth of 1.06GB/s at 133MHz and 800MB/s at 100MHz. The question is whether any benchmark or application can saturate the bus to the point that the Data Prefetch logic cannot achieve its full potential. As you will see in the results, this appears not to be an issue except under extreme artificial circumstances, so the results should be applicable to other implementations, such as the one on the P4. I say this because there is an excellent chance that the Pentium 4 implements the same basic logic as the Pentium III, and it can probably be assumed that AMD’s implementation is very similar. A short discussion on this subject can be found through this link to the RWT Forum – Technical Room.
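The bandwidth figures above follow from simple arithmetic: bus width in bytes multiplied by the bus clock. The short Python sketch below reproduces them. The function name is hypothetical, and it assumes one transfer per bus clock and a nominal 133.33MHz for the "133MHz" setting; it illustrates the calculation only and is not part of the test methodology.

```python
def peak_bandwidth_mb_s(bus_width_bits, clock_mhz, transfers_per_clock=1):
    """Theoretical peak bus bandwidth in MB/s (decimal megabytes).

    Assumption: one data transfer per bus clock unless stated otherwise.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz * transfers_per_clock


if __name__ == "__main__":
    # 64-bit bus at the two front-side bus speeds mentioned in the text.
    # 133.33 is an assumed nominal value for the "133MHz" setting.
    for clock_mhz in (100.0, 133.33):
        mb_s = peak_bandwidth_mb_s(64, clock_mhz)
        print(f"{clock_mhz:7.2f} MHz: {mb_s:7.1f} MB/s")
```

These are ceilings, not sustained rates; real memory traffic will fall short of them, which is why the question of bus saturation matters for the prefetch logic.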
While the information gained here might seem only mildly interesting, it is an important bit of data to have when trying to identify precisely why one processor performs differently from another, whether the two share a similar architecture or a very different one. If one were able to quantify the benefit of every feature, one could also predict with some accuracy the performance differences of as yet unreleased parts. Of course, that isn’t too feasible, but the fact remains that more information means more accurate analysis and prediction. For example, we might later be able to use this result to help determine the performance benefit of other features in other processors (such as between the Thunderbird and Palomino core Athlons) by subtracting the gain provided by the Data Prefetch logic.
During my testing, I also decided to extend things a bit and ‘prove’ that the Tualatin core Celeron does not include the Data Prefetch logic, as has been claimed in some articles. An Intel representative had told me this when the Celeron was first introduced, but it seems that many people still believe it is a ‘true’ Tualatin.