Taking a DIMM View of a Read Cycle
Figure 1 shows the timing of a CAS-latency-two (CL2) page read cycle to PC100 SDRAM through a hypothetical chipset. In this example, both the SDRAM and the system bus run on 100 MHz clocks.
The read operation takes two clocks to pass through the chipset and propagate to the memory devices, which are mounted on small circuit boards called dual inline memory modules (DIMMs). The memory devices take two clocks to output the requested data, which then takes another two clocks to pass back through the chipset to the CPU’s system bus. The timing of this operation can be denoted “7-1-1-1”: the 7 represents a seven-clock (70 ns) latency for the first memory location accessed in an SDRAM burst, while the three 1’s indicate that the second, third, and fourth data words in the burst follow at a rate of one per clock. A total of ten 10 ns clock periods is used to transmit a 32 byte cache line of data to the CPU, which gives an effective bandwidth of 320 Mbytes per second. Memory performance can actually exceed this level with a pipelined system interface, since sequential memory operations can overlap to a degree.
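The arithmetic behind the 320 Mbytes per second figure can be sketched as follows. The constants are taken straight from the example above; this is just a back-of-the-envelope calculation, not a model of any real chipset:

```python
# Effective bandwidth of a 7-1-1-1 page read burst on a 100 MHz bus.
CLOCK_NS = 10           # 100 MHz clock -> 10 ns per period
CACHE_LINE_BYTES = 32   # four 64-bit (8-byte) data words

burst_clocks = 7 + 1 + 1 + 1          # 7-1-1-1 timing: 10 clocks total
burst_ns = burst_clocks * CLOCK_NS    # 100 ns to move the whole cache line

# bytes per nanosecond, scaled to megabytes per second
bandwidth_mb_per_s = CACHE_LINE_BYTES / burst_ns * 1000
print(bandwidth_mb_per_s)  # 320.0
```

The same arithmetic shows why the leading 7 dominates: shaving even one clock off the initial latency matters more than anything that happens to the trailing words.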
The preceding example is a page read operation, the fastest type of operation in a DRAM. If the desired data isn’t in an open page (row miss), an activate command must precede the read command to the memory devices by two clock cycles, so the timing of a bank read is 9-1-1-1. If the data is in a DRAM bank that has the wrong page open (page miss), the bank must first be sent a precharge command to close the previous page two cycles before the activate command (and four cycles before the read), making this an 11-1-1-1 operation. It is obviously desirable to perform page operations as often as possible, and an important part of the design of a high performance chipset is keeping the right pages open under widely varying patterns of access.
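The three cases above differ only in their leading-off latency, which can be tabulated as a quick sketch. The clock counts come directly from the text; the labels are mine:

```python
# Leading-off latency for each access type at 10 ns per clock (100 MHz).
CLOCK_NS = 10

lead_off_clocks = {
    "page hit (7-1-1-1)":   7,          # read goes straight to an open page
    "row miss (9-1-1-1)":   7 + 2,      # activate precedes the read by 2 clocks
    "page miss (11-1-1-1)": 7 + 2 + 2,  # precharge, then activate, then read
}

for kind, clocks in lead_off_clocks.items():
    print(f"{kind}: {clocks * CLOCK_NS} ns")
```

A page miss thus costs 40 ns more than a page hit before the first word arrives, which is why a chipset’s page-management policy has such an outsized effect on average latency.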
An important thing to remember about SDRAMs is their support for critical word first bursting. When the CPU requests a 32 byte burst of data from memory to place into its cache, it specifies the address of the 64 bit word which is, or contains, the data whose access caused the cache miss in the first place (the so-called critical word). Most CPU designs allow the critical word to pass directly to the CPU as well as to the affected cache, so the CPU can resume program execution even before all 32 bytes of data are loaded into the cache. In terms of the airport baggage carousel analogy I used in part one of this article, critical word first support is like guaranteeing that your suitcase is the first out of the chute when the carousel starts up. This is important for achieving consistently low latency.
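The word ordering this produces can be sketched for SDRAM’s sequential burst mode, in which the burst starts at the critical word and wraps around within the four-word group. (SDRAMs also offer an interleaved burst mode with a different wrap pattern; the helper below covers only the sequential case and is an illustration, not a description of any particular device.)

```python
# Sequential-mode burst order for a burst of four 64-bit words:
# start at the critical word, then wrap modulo the burst length.
def burst_order(critical_word, burst_len=4):
    base = critical_word - (critical_word % burst_len)  # start of the aligned group
    return [base + (critical_word + i) % burst_len for i in range(burst_len)]

print(burst_order(2))  # [2, 3, 0, 1] -- word 2 arrives first, then wraps
print(burst_order(0))  # [0, 1, 2, 3] -- critical word already first
```

Because the critical word always arrives in the first beat, the CPU’s stall ends after the lead-off latency regardless of where the missed word falls within the cache line.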