The two factors that are likely to influence performance with a motherboard (and the chipset it uses) are latency and bandwidth. One of the selling points of Double Data Rate memory is that it is a best-of-both-worlds approach: the low latency of SDRAM and the high bandwidth of RDRAM. Well, let’s do the numbers and see how they translate into the real world.
Latency is how long it takes for the CPU to ask for something in main memory and actually have it ready to use. In this test, lower numbers are better. Cachemem 2.6 was used to measure the latencies, and the worst case results were used.
What does this mean?
I have expressed the memory latency time in nanoseconds, because clock cycle latency times can only be compared when two processors are running at the same clock speed. To eliminate this processor dependence, I converted the latency in clock cycles to absolute times in nanoseconds. These numbers are “worst case scenarios”.
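As a minimal sketch of that conversion (the function name and the example figures are hypothetical, not measurements from this review), a latency in clock cycles can be turned into nanoseconds by dividing by the CPU clock:

```python
def cycles_to_ns(latency_cycles: float, cpu_clock_mhz: float) -> float:
    """Convert a latency measured in CPU clock cycles to nanoseconds.

    Both arguments are illustrative inputs; the values used below are
    made-up examples, not results from the boards tested here.
    """
    if cpu_clock_mhz <= 0:
        raise ValueError("cpu_clock_mhz must be positive")
    # One clock cycle at f MHz lasts 1000 / f nanoseconds.
    return latency_cycles * 1000.0 / cpu_clock_mhz


# The same cycle count means different absolute times at different clocks.
print(cycles_to_ns(200, 1000))  # hypothetical 1000 MHz CPU
print(cycles_to_ns(200, 800))   # hypothetical 800 MHz CPU
```

The same 200 cycles works out to a longer wait on the slower chip, which is why cycle counts alone can’t be compared across processors running at different speeds.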
Well, it’s fairly obvious that the tweaked-up KT7A simply brains the competition. Even the vaunted AMD 760 with its “super-bypass” option is humbled, but not by a lot. For those who aren’t familiar with it, “super-bypass” is not heart surgery but a method that allows a reduction in latencies in the chipset. It is still not enough to conquer the KT133A, however.
The ALi chipset on the KA266 is nowhere to be seen. ALi apparently has a new revision of its MAGiK1 chipset out (the mysterious B0 stepping) that addresses this issue, but when this will see the light of day is still a closely guarded secret.
But also look at the manufacturer’s “optimal” settings recorded by the KT7A (and this with CAS 2 memory instead of CAS 3!). The KT7A default settings aren’t much better than the KA266, so keep that in mind when viewing the later scores with a tweaked KT7A and a default KT7A. Latency (or the lack of it) is the KT133A chipset’s trump card. The DDR boards have bandwidth to play with. Take the KT133A’s latency advantage away and…
Next we will look at memory bandwidth from a couple of perspectives. First we will look at Cachemem’s scores.
As expected, the DDR twins kick the proverbial out of the SDRAM-equipped KT7A. Not a 100% gain, but it is a significant difference. However (isn’t there always an exception?), your processor will, in most circumstances, spend 95% to 98% of its time accessing data in the Level 1 or Level 2 cache, not main memory. So while you see big differences in the numbers generated here, this performance advantage most likely won’t trickle over into huge real-life performance gains.
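To put a rough number on that argument, here is a back-of-the-envelope sketch using an Amdahl’s-law style estimate. It assumes the remaining 2% to 5% of time is spent waiting on main memory, and that only that slice gets faster; the function name and the 2x memory speedup are hypothetical inputs, not measured results.

```python
def overall_speedup(memory_time_fraction: float, memory_speedup: float) -> float:
    """Estimate the whole-program speedup when only main-memory time improves.

    memory_time_fraction: share of run time spent on main memory (0.0 to 1.0).
    memory_speedup: how many times faster main memory becomes (assumed value).
    """
    if not 0.0 <= memory_time_fraction <= 1.0:
        raise ValueError("memory_time_fraction must be between 0 and 1")
    if memory_speedup <= 0:
        raise ValueError("memory_speedup must be positive")
    # Time outside main memory is unchanged; time in main memory shrinks.
    new_time = (1.0 - memory_time_fraction) + memory_time_fraction / memory_speedup
    return 1.0 / new_time


# 2% and 5% of time in main memory, with a hypothetical 2x memory speedup.
for fraction in (0.02, 0.05):
    gain = (overall_speedup(fraction, 2.0) - 1.0) * 100.0
    print(f"{fraction:.0%} in main memory -> about {gain:.1f}% faster overall")
```

Even with main memory assumed to be twice as fast, the estimated overall gain stays in the low single digits, which fits the point that big bandwidth numbers don’t automatically become big real-life gains.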
But this is only Cachemem’s take. As an alternative, I’ve included SANDRA’s Stream memory bandwidth as well. This would have to be one of the most abused benchmarks in existence, and it’s certainly in the top three.
A note on this test: Stream is just one type of memory access pattern. It is neither the most prevalent nor the most typical, but it is one, and it can be measured. Just because you see huge numbers here does not necessarily mean that they will transfer over into real life.
Also, according to the SANDRA documentation, SANDRA will try many different access patterns until it arrives at a figure. Those of you who run the benchmark once and record the score are kidding yourselves. By the same token, running the benchmark a few times and recording the highest result as “the score” isn’t representative either. What I do is run each test 15 times and report the modal score. For those without a statistical background, the modal score is the score that appears most often. I use it because I believe it represents the typical score you are likely to see. If there is a tie (i.e. two different results occurring with the same frequency), I’ll average the tied modal scores. If you can think of a better way to present this data, you can take up the issue here. This is why I’ve held off so long in using this benchmark: I have been unable to make sense of the numbers returned in a meaningful way.
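For anyone who wants to repeat the procedure, the modal-score rule described above can be sketched in a few lines. The function name and the sample scores are made up for illustration; they are not results from this review.

```python
from collections import Counter


def modal_score(scores):
    """Return the most frequent score; average the tied scores if several tie."""
    if not scores:
        raise ValueError("need at least one score")
    counts = Counter(scores)
    top_count = max(counts.values())
    # Every score that occurs as often as the most frequent one is a mode.
    modes = [score for score, n in counts.items() if n == top_count]
    return sum(modes) / len(modes)


# Hypothetical runs in MB/s: 412 appears most often.
print(modal_score([410, 412, 412, 415, 412]))
# Hypothetical tie between 410 and 414: the two modes are averaged.
print(modal_score([410, 410, 414, 414, 420]))
```

One caveat with this approach: if every run returns a different score, all of them tie and the result is simply the mean of the runs.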
The numbers used are straight x86/x87. You can get much higher throughputs with MMX or SIMD routines, but they don’t work on all chips, and they certainly aren’t as prevalent in real-life software – yet. Anyway, here are the results:
Again, the DDR pair steal a lead over the SDRAM-equipped KT7A, but not by as much as with the Cachemem scores. In fact, the apples-to-apples comparison between the tweaked KT7A and the tweaked KA266 and 8K7A+ does not show a huge gain in scores. The integer scores are also equal between the latency-poor KA266 and the 8K7A+, suggesting that most of the scores are influenced more by cache than by absolute main memory performance. Floating-point operations do show a preference for the EPoX, and these are the instructions that aren’t influenced as much by cache, so memory access (latency) times are of importance.
As an indicator to the performance achieved, I’ve included the modal results in a table below. In all cases, the modal result seems to occur about one-third of the time.
| Modal result occurrence (out of 15 runs) | 5 | 3, 3 (two scores averaged) | 3 | 3, 3 |
The peak scores aren’t that much greater: on average, only about 10 MB/s higher for each.