So What Is Memory Interleaving?
When you think of computer memory and the way it is arranged, what do you think of? If you imagined a spreadsheet-like arrangement of “cells”, then you really need to get out more. Seriously, if you thought of RAM in this manner, then you’re not far wrong: it’s as good an analogy as any. Award yourself one star.
When your chipset writes to or reads from memory, it stores bits (literally) of information in each of those “cells” or fetches them back out. Interleaving is just a particular way of doing it.
Now we get to the nitty-gritty. How does your chipset arrange the data in memory? The traditional method is just to write (or read) each bit sequentially, i.e. one bit after another. Doesn’t take a lot of intelligence on the part of the chipset.
But is this the best method for performance? If you guessed no, award yourself another star. To understand why, you must consider the way memory chips are configured.
Having a physical wire to connect each memory cell to the outside world (i.e. your motherboard) would be impracticable: there would simply be too many of them. To solve this, memory is made up of rows and columns (the spreadsheet analogy), and you specify (or rather your PC chipset specifies) which row and column you want when you read from or write to memory. But using a single ‘array’ causes problems when memory density (the size) becomes large. To get around this (and other issues), the cells are divided into multiple arrays, and the arrays are arranged into multiple “banks”. Now this is where all the confusion stems from.
What Are You on About Now?
Say you have a 4Mb chip in a 4Mx1 configuration (typical of older FPM and EDO memory). This means that the chip contains a single array with 4 million (actually 4,194,304) cells. If you have ever looked at a memory module and the chips on it, you will have seen the pins on the edges of the chip. These pins are connected to the traces (wires) on the module itself. Here is something you’ll have to take as a given – more pins = more traces (wires). More traces = more cost, therefore more pins = more cost.
One way to access this memory is to have a pin to access each bit, but that’s a lot of pins: in our example chip, 4,194,304 to be exact. Not good. Chip designers and manufacturers are clever, however. To reduce the number of address pins, they use row and column numbers (addresses) to select a particular cell in the array. How does this help? To address all of the bits in our example chip, you could arrange the chip into an array of 1024 rows by 4096 columns. This needs 10 row address pins (2^10 = 1024) and 12 column address pins (2^12 = 4096). 1024 x 4096 = 4,194,304, and all from just 22 pins. But there are more savings possible. You always send a row address followed by a column address, so why not reuse the same address pins, plus extra pins that indicate whether the current address is for a row or a column? The shared pins only need to be wide enough for the larger of the two addresses (12 bits, for the column), so the result is now 14 pins for address purposes (12 address + 1 RAS + 1 CAS). Not a bad saving over the first way of doing it.
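The arithmetic above is easy to check in code. Below is a minimal Python sketch, assuming the 1024-row by 4096-column layout from the example and row-major placement of cells (the placement order is an assumption for illustration). The helpers `split_address` and `address_pins` are hypothetical names; like the text, `address_pins` counts only address-related pins and ignores power, control and data pins.

```python
ROWS, COLS = 1024, 4096  # geometry from the 4Mx1 example


def split_address(addr: int) -> tuple[int, int]:
    """Turn a linear cell number into (row, column), assuming row-major order."""
    if not 0 <= addr < ROWS * COLS:
        raise ValueError(f"address {addr} is outside the array")
    # Dividing by 4096: the quotient selects the row, the low 12 bits the column.
    return divmod(addr, COLS)


def address_pins(rows: int, cols: int) -> dict[str, int]:
    """Count address pins for a rows x cols array under three schemes."""
    row_bits = (rows - 1).bit_length()  # 10 bits for 1024 rows
    col_bits = (cols - 1).bit_length()  # 12 bits for 4096 columns
    return {
        "one_pin_per_cell": rows * cols,
        "separate_row_and_column": row_bits + col_bits,
        # Row and column share pins; RAS and CAS say which one is on the pins.
        "multiplexed_with_ras_cas": max(row_bits, col_bits) + 2,
    }


if __name__ == "__main__":
    print(split_address(4097))  # → (1, 1)
    print(address_pins(ROWS, COLS))
```

For the example geometry, `address_pins` gives 4,194,304, 22 and 14, matching the three figures in the text.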
There are other pins as well, such as power and signal pins (e.g., write enable, output enable), and of course the data input/output pins. In this example, there would be only one data input and one data output pin, as a single array can read or write only a single bit at a time.
This scheme has other advantages. All the bits on the selected row are accessible, so if I want another part of memory that is in the same row as the data I just retrieved, access to it is fast, because I don’t have to generate a complete address; I just send the changed column address.
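To see how much this saves, here is a toy Python simulation, assuming row-major placement in the same 1024 x 4096 array and a controller that simply leaves the most recently used row open. `count_address_sends` is a hypothetical helper for illustration, not part of any real memory controller.

```python
COLS = 4096  # columns per row in the 1024 x 4096 example


def count_address_sends(addresses: list[int]) -> tuple[int, int]:
    """Return (row_addresses_sent, column_addresses_sent) for an access stream.

    Toy model: the last row used stays open, so a new row address is only
    needed when an access falls in a different row.
    """
    open_row = None
    row_sends = col_sends = 0
    for addr in addresses:
        row = addr // COLS
        if row != open_row:
            row_sends += 1  # different row: send a new row address first
            open_row = row
        col_sends += 1  # every access still needs its column address
    return row_sends, col_sends


if __name__ == "__main__":
    sequential = list(range(8192))           # two full rows, read in order
    strided = [i * COLS for i in range(8)]   # eight accesses, each in a new row
    print(count_address_sends(sequential))   # → (2, 8192)
    print(count_address_sends(strided))      # → (8, 8)
```

Reading 8,192 consecutive cells needs only two row addresses, while eight accesses that each land in a different row need a row address every time.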
Why Don’t I Have to Send the Row Address, and Why Is This Important?
Glad you asked. Memory chips contain small circuits called “sense amps” (sense amplifiers), each built from a handful of transistors, and there is one sense amp for each column address (4096, in our example above). When the memory controller sends the row address, the entire row is transferred onto the sense amps. The column address just picks out one bit from that row. The time from sending the Column Address Strobe to putting the corresponding bit onto the output line is called the CAS latency, and this is why a lot of articles and benchmarks place so much emphasis on this number.
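The sense amps behave like a one-row buffer. The Python class below is a simplified sketch for illustration only: `DramArray`, `activate` and `read` are hypothetical names, and real chips add precharge, refresh and timing rules that this model ignores. Activating a row copies the whole row onto the sense amps; each read then just picks one bit out of that copy.

```python
class DramArray:
    """Toy model of one DRAM array plus its row of sense amps."""

    def __init__(self, rows: int, cols: int) -> None:
        self.cells = [[0] * cols for _ in range(rows)]
        self.sense_amps = None  # copy of the currently open row, if any
        self.activations = 0    # how many row addresses have been sent

    def activate(self, row: int) -> None:
        """Row address (RAS): move the entire row onto the sense amps."""
        self.sense_amps = list(self.cells[row])
        self.activations += 1

    def read(self, col: int) -> int:
        """Column address (CAS): pick one bit out of the open row."""
        if self.sense_amps is None:
            raise RuntimeError("no row is open; send a row address first")
        return self.sense_amps[col]


if __name__ == "__main__":
    chip = DramArray(rows=4, cols=8)  # tiny array for the demo
    chip.cells[2][5] = 1
    chip.activate(2)
    bits = [chip.read(c) for c in range(8)]  # eight reads, one row address
    print(bits, chip.activations)  # → [0, 0, 0, 0, 0, 1, 0, 0] 1
```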
You may have wondered why the Row Address Strobe timings (such as precharge) don’t figure as highly in optimizing memory accesses, even though you sometimes have to generate a new row address as well. This is because the memory controller will (if possible) place sequential data in adjacent column addresses. That way, you can send the sequential column addresses after first generating the row address, or better still, have the chip generate the next column address internally (called ‘burst mode’). As I explained above, once the row is on the sense amps, you just send column addresses and avoid resending the row address (within reason). What’s more, if you know that you will need to use a different row, you can send the command to get that row ready before you have finished sending all the previous column addresses. Even though the precharge takes a few cycles (like the CAS latency), that time has been hidden, so there is little or no impact on performance.
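The effect of burst mode and of hiding the row preparation can be put into rough numbers. The Python sketch below is a toy model, not a description of any real chip: `T_RP`, `T_RCD` and `CL` are hypothetical cycle counts chosen for illustration, and it assumes that after the first bit (the CAS latency) each further bit in a burst takes one cycle, and that the next row’s preparation can overlap the previous row’s burst.

```python
# Hypothetical timings in clock cycles, for illustration only.
T_RP = 3   # precharge: close the previously open row
T_RCD = 3  # delay from row address to the first column address
CL = 3     # CAS latency: column address to first bit out


def cycles_row_per_access(n_bits: int) -> int:
    """Every bit needs its own row: precharge, row address, then column."""
    return n_bits * (T_RP + T_RCD + CL)


def cycles_bursts(n_rows: int, burst_len: int, hide_row_setup: bool) -> int:
    """Read burst_len adjacent bits from each of n_rows different rows.

    Within a row, the first bit costs CL and each further bit one cycle.
    If hide_row_setup is True, the next row's precharge and activation run
    during the previous burst, so only the part of that setup that does
    not fit into burst_len cycles adds to the total.
    """
    setup = T_RP + T_RCD
    data = CL + burst_len - 1
    total = setup + data  # the first row has nothing to hide behind
    for _ in range(n_rows - 1):
        visible_setup = max(0, setup - burst_len) if hide_row_setup else setup
        total += visible_setup + data
    return total


if __name__ == "__main__":
    print(cycles_row_per_access(16))      # → 144
    print(cycles_bursts(2, 8, False))     # → 32
    print(cycles_bursts(2, 8, True))      # → 26
```

With these made-up values, reading 16 bits as two 8-bit bursts costs 32 cycles, or 26 when the second row is prepared during the first burst, against 144 cycles if every bit needed its own row address.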