What Does Higher Latency Do to Performance?
A Simple Model
The big question computer buyers want answered is how the choice of SDRAM vs DRDRAM memory affects performance. To help answer it I created a simple performance model based on a hypothetical 800 MHz CPU with an architectural average clocks per instruction (CPI) figure of 0.5. That is, a CPU that averages two instructions per clock cycle when running strictly out of its L1 caches. This CPU has 32 Kbyte L1 caches and either a 256 Kbyte or a 128 Kbyte on-chip L2 cache with a 6 clock cycle latency. My spreadsheet model is summarized in the table below:
| Memory Type | Bus Freq (MHz) | On-chip L2 (KB) | Avg DRAM Access (CPU clks) | Avg Mem Access (CPU clks) | Average CPI | Average MIPS | Average DRAM BW (MB/s) | Normalized Performance (PC100 = 1.0) |
|---|---|---|---|---|---|---|---|---|
| PC100 SDRAM | 100 | 256 | 73.8 | 0.505 | 0.753 | 1063 | 109 | 1.00 |
| | 133 | 256 | 67.2 | 0.474 | 0.737 | 1086 | 111 | 1.02 |
| | 133 | 256 | 61.2 | 0.445 | 0.722 | 1107 | 113 | 1.04 |
| | 133 | 256 | 85.2 | 0.560 | 0.780 | 1026 | 105 | 0.97 |
| | 133 | 256 | 91.2 | 0.589 | 0.794 | 1007 | 103 | 0.95 |
In my model the L1 hit ratio is 97%, the L2 hit ratio is 84% and 78% for the 256 Kbyte and 128 Kbyte L2 caches respectively, and the main memory page hit ratio is 55%. These hit ratios are taken from a 1998 presentation by Forrest Norrod, senior director, Cyrix Corp., entitled “The Future of CPU Bus Architectures – A Cyrix Perspective”. The column marked ‘average DRAM access’ is the average critical-word-first latency in CPU clocks, plus 6 cycles for the L2 miss and an extra cycle for data forwarding; the average is weighted over 55% page hits, 22.5% row hits, and 22.5% page misses. The column labeled ‘average memory access’ is the average number of additional CPU clocks spent below the L1 per data reference: the L1 miss rate multiplied by the weighted sum of the 6 clock L2 hit latency and, on an L2 miss, the average DRAM access time. The average CPI is calculated by adding 50% of the average memory access time (since about half of x86 instructions perform a data memory access) to the base architectural figure of 0.5 CPI. The average MIPS is calculated by dividing 800 MHz by the average CPI figure. The average DRAM BW figure is derived as the product of MIPS x 50% data accesses x L1 miss rate x L2 miss rate x 32 bytes per cache line x 1.33 (66% reads, 33% writes, with a write miss allocate policy selected).
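To make the arithmetic concrete, here is a small C program that applies these formulas to the PC100 SDRAM / 256 Kbyte L2 row. It is my own reconstruction of the spreadsheet, not the original; the constant names are mine, and the 73.8 clock DRAM access figure is simply lifted from the table. It reproduces the 0.505 clock average memory access, 0.753 CPI, 1063 MIPS, and roughly 109 MB/s figures within rounding.

```c
/* Sketch of the spreadsheet model described above (my reconstruction).
 * Constants follow the text; the 73.8 clock figure is the PC100 row. */
#include <stdio.h>

#define CPU_MHZ    800.0   /* hypothetical 800 MHz CPU                   */
#define BASE_CPI   0.5     /* architectural CPI running out of the L1    */
#define L1_MISS    0.03    /* 97% L1 hit ratio                           */
#define L2_MISS    0.16    /* 84% L2 hit ratio (256 Kbyte L2)            */
#define L2_CLKS    6.0     /* L2 hit latency in CPU clocks               */
#define LINE_BYTES 32.0    /* bytes per cache line                       */
#define RW_FACTOR  1.33    /* 66% reads, 33% writes, write miss allocate */
#define DATA_REFS  0.5     /* ~half of x86 instructions access data      */

int main(void)
{
    double avg_dram = 73.8;   /* average DRAM access, CPU clocks (PC100 row) */

    /* average clocks spent below the L1 per data reference */
    double avg_mem = L1_MISS * ((1.0 - L2_MISS) * L2_CLKS + L2_MISS * avg_dram);
    double cpi     = BASE_CPI + DATA_REFS * avg_mem;
    double mips    = CPU_MHZ / cpi;
    double bw_mbs  = mips * DATA_REFS * L1_MISS * L2_MISS * LINE_BYTES * RW_FACTOR;

    printf("avg mem access %.3f clks, CPI %.3f, %.0f MIPS, %.0f MB/s\n",
           avg_mem, cpi, mips, bw_mbs);
    return 0;
}
```

Swapping in the other average DRAM access figures from the table (and the 78% L2 hit ratio for the 128 Kbyte cases) reproduces the remaining rows.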
Although this simplistic model ignores many second-order effects (out-of-order execution, I-cache misses, hit-under-miss caches, read-to-write and write-to-read switchover effects, refresh, DMA interference, etc.), it is still useful for illustrating how all these design parameters interact for representative PC-type applications. The bottom line is that regardless of memory type, main memory latency is nearly two orders of magnitude larger than the processor clock period, and an effective cache hierarchy is needed for good performance. In my example, DRDRAM has an average read latency over 20% greater than PC100 SDRAM, yet the CPU performance is only reduced by about 5% and 7% for a 256 Kbyte and a 128 Kbyte L2 cache, respectively.
Of course, individual programs can and will vary greatly in how memory characteristics impact their performance. A program that chases long chains of linked list records through a large memory footprint will thrash the caches, and the low latency of SDRAM will really shine. On the other hand, large sequential memory transfers with little computation can easily saturate SDRAM bandwidth, and Direct Rambus will have an advantage (largely due to the faster system bus). For code that plays nicely within the caches, the memory type will have virtually no impact at all.
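For readers who want to experiment, the following C sketch (my own illustration, with an arbitrary working set size) shows the two extremes: a dependent pointer chase through a randomly linked list, which stalls for close to the full main memory latency on nearly every access, and a sequential sweep over a large array, which is limited mainly by memory bandwidth.

```c
/* Illustrative sketch of a latency-bound pointer chase vs. a bandwidth-bound
 * sequential sweep. The node count is arbitrary, chosen to dwarf any cache. */
#include <stdio.h>
#include <stdlib.h>

#define NODES (4 * 1024 * 1024)

struct node { struct node *next; long val; };

int main(void)
{
    struct node *pool  = malloc(NODES * sizeof *pool);
    long        *array = malloc(NODES * sizeof *array);
    size_t      *order = malloc(NODES * sizeof *order);
    if (!pool || !array || !order) return 1;

    /* Link the nodes in a random order so nearly every hop is a cache miss. */
    for (size_t i = 0; i < NODES; i++) order[i] = i;
    for (size_t i = NODES - 1; i > 0; i--) {        /* Fisher-Yates shuffle */
        size_t j = (size_t)rand() % (i + 1);
        size_t t = order[i]; order[i] = order[j]; order[j] = t;
    }
    for (size_t i = 0; i < NODES; i++) {
        pool[order[i]].next = &pool[order[(i + 1) % NODES]];
        pool[order[i]].val  = 1;
        array[i] = 1;
    }

    /* Latency-bound: each load depends on the previous pointer, so the
     * machine cannot overlap or hide the DRAM access time. */
    long sum1 = 0;
    struct node *p = &pool[order[0]];
    for (size_t i = 0; i < NODES; i++) { sum1 += p->val; p = p->next; }

    /* Bandwidth-bound: independent sequential loads stream whole cache lines. */
    long sum2 = 0;
    for (size_t i = 0; i < NODES; i++) sum2 += array[i];

    printf("%ld %ld\n", sum1, sum2);
    return 0;
}
```

Timing the two loops separately on real hardware shows the gap the article describes: the pointer chase runs at roughly one element per full memory latency, while the sweep is limited by sustainable DRAM bandwidth.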