Direct Rambus DRAM, Part 2 – Operation and Performance

This is the second installment in a two-part article about Direct Rambus memory technology. The first part included a detailed description of the unique characteristics of Rambus and how it differs from SDRAM. In part 2 I will compare the operation of SDRAM and Direct Rambus based PCs and examine the possible impact on processor performance. With the help of two examples, I will also explain what trickery and deficiencies to look out for when evaluating performance benchmarks used to advocate for or against Rambus.

The Role of the PC Chipset

The chipset of a modern personal computer is a set of application specific integrated circuits (ASICs) that serves as the grand central station of data movement within the system. It sits at the heart of the system, between the processor and main memory. The chipset translates the protocol of the processor’s system bus into the format and timing required to interface to the DRAMs that comprise the computer’s main memory. Depending on the address, the chipset will also direct processor reads and writes to the control registers and optional memory within input/output devices, such as a graphics card via the AGP port or a network interface via the PCI expansion bus. The chipset also permits the AGP card and PCI cards to perform direct memory accesses (DMAs) to main memory independently of the CPU.
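
To illustrate the address-based routing described above, here is a minimal sketch in C of the kind of address decode a chipset performs. The address ranges and names are purely hypothetical assumptions for illustration, not the memory map of any real chipset:

```c
#include <stdint.h>

/* Hypothetical physical memory map -- real chipsets hardwire these
 * ranges (or program them at boot); the values here are illustrative. */
#define DRAM_LIMIT       0x07FFFFFFu   /* 128 MB of main memory        */
#define AGP_APERTURE     0xF0000000u   /* start of AGP aperture        */
#define AGP_LIMIT        0xF3FFFFFFu   /* end of AGP aperture          */
#define CHIPSET_REG_BASE 0xFFFFF000u   /* chipset control registers    */

typedef enum { DEST_DRAM, DEST_AGP, DEST_PCI, DEST_CHIPSET_REG } dest_t;

/* Decode a physical address to a destination.  PCI serves as the
 * fallback target for addresses no other range claims. */
dest_t decode(uint32_t addr)
{
    if (addr >= CHIPSET_REG_BASE)
        return DEST_CHIPSET_REG;
    if (addr >= AGP_APERTURE && addr <= AGP_LIMIT)
        return DEST_AGP;
    if (addr <= DRAM_LIMIT)
        return DEST_DRAM;
    return DEST_PCI;
}
```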

Although a memory device might support a page read access time of 20 ns, the CPU actually sees a much longer latency despite the best efforts of chipset designers. The reason is that the chipset sits between the CPU and memory. The read request takes a clock cycle on the system bus to be synchronously relayed to the chipset, which latches the address and control information. The chipset needs at least another clock cycle to compare the address against the hardwired memory map and determine whether the read cycle is directed at main memory, an I/O device on an external bus, or a control register within the chipset itself. A read cycle to main memory then has to arbitrate with possible DMA requests for access to the memory. The translated address and control signals then have to be driven out to the array of memory devices, which takes still more time. After the memory device is accessed and returns the requested read data, the data has to be shuttled back through the chipset and sent to the right destination, in this case the CPU system bus.
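
To make the accumulation concrete, here is a back-of-the-envelope tally of where the cycles go. The individual figures are illustrative assumptions for a 100 MHz system bus, not measurements of any particular chipset:

```c
#include <stdio.h>

int main(void)
{
    /* All figures are illustrative assumptions, not measurements. */
    double bus_cycle_ns  = 10.0;              /* 100 MHz system bus                     */
    double relay_request = 1 * bus_cycle_ns;  /* relay read to chipset, latch address   */
    double decode_addr   = 1 * bus_cycle_ns;  /* compare against hardwired memory map   */
    double arbitrate_dma = 1 * bus_cycle_ns;  /* win arbitration over pending DMA       */
    double drive_to_dram = 1 * bus_cycle_ns;  /* drive address/control to memory array  */
    double dram_access   = 20.0;              /* page read access time of the device    */
    double return_to_cpu = 2 * bus_cycle_ns;  /* shuttle data back through the chipset  */

    double total = relay_request + decode_addr + arbitrate_dma +
                   drive_to_dram + dram_access + return_to_cpu;

    printf("CPU-visible read latency: %.0f ns (device alone: %.0f ns)\n",
           total, dram_access);
    return 0;
}
```

With these assumed numbers the CPU sees roughly 80 ns of latency from a device whose access time is nominally 20 ns, which is the point of the paragraph above.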

Notice that I focus exclusively on read cycles. The reason is that write cycles are generally easier to handle in a high performance system. A read cycle implies that the initiator is waiting impatiently for the result, so latency can be the overriding limitation on performance, while a write cycle is a “fire and forget” type of operation. The CPU can blast a cache line write into a 32-byte buffer in the chipset and be on its merry way, and the data can sit in the buffer for many cycles until an appropriate time for transmission to main memory. The only restriction is that special logic must check subsequent read cycles to make sure that any read to an address targeted by a pending write is either serviced from the data in the buffer or delayed until the write buffer has been flushed to memory, as sketched below.
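
The following C sketch shows one way such a posted-write buffer with read checking could work. The buffer depth, helper names, and forwarding policy are my own assumptions for illustration; a real chipset implements this in hardware and may instead stall the read until the matching entry drains:

```c
#include <stdint.h>
#include <string.h>
#include <stdbool.h>

#define LINE_SIZE  32   /* cache line / buffer entry size in bytes */
#define NUM_SLOTS   4   /* illustrative buffer depth */

/* One posted-write buffer entry: a cache line waiting to be
 * written to main memory at a convenient time. */
struct wbuf_entry {
    bool     valid;
    uint32_t addr;              /* line-aligned physical address */
    uint8_t  data[LINE_SIZE];
};

static struct wbuf_entry wbuf[NUM_SLOTS];

/* CPU posts a line write into the buffer and goes on its merry way. */
void post_write(uint32_t addr, const uint8_t *line)
{
    for (int i = 0; i < NUM_SLOTS; i++) {
        if (!wbuf[i].valid) {
            wbuf[i].valid = true;
            wbuf[i].addr  = addr & ~(uint32_t)(LINE_SIZE - 1);
            memcpy(wbuf[i].data, line, LINE_SIZE);
            return;
        }
    }
    /* Buffer full: a real chipset would stall the CPU here. */
}

/* Every subsequent read must be checked against pending writes.  On a
 * hit, forward the buffered data rather than reading stale memory. */
bool read_check(uint32_t addr, uint8_t *line_out)
{
    uint32_t line_addr = addr & ~(uint32_t)(LINE_SIZE - 1);
    for (int i = 0; i < NUM_SLOTS; i++) {
        if (wbuf[i].valid && wbuf[i].addr == line_addr) {
            memcpy(line_out, wbuf[i].data, LINE_SIZE);
            return true;   /* serviced from the buffer */
        }
    }
    return false;          /* miss: go to main memory as usual */
}
```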

