Parity & ECC – How They Work
The Need for Error Checking
The original IBM PC required the use of parity memory, since it was designed by engineers familiar with the needs of businesses that used large mainframe computers. The semiconductors produced at that time were not considered to be as reliable as today’s chips, so there was a need to be sure that every memory access returned accurate data. Businesses such as banks, airlines and stock brokers all needed to be sure that no errors were introduced by faulty memory chips (hard errors) or by random electronic ‘glitches’ that could alter the data (soft errors).
Apple took a slightly different approach. They figured that the average home user of their product really wouldn’t be affected by the occasional random error that might be introduced, and so elected to design their machines to run using non-parity memory modules. This allowed them to reduce the cost of their machines, since non-parity modules require fewer chips. At that time, memory was very expensive, and the elimination of the parity chip reduced the cost by approximately 12% (quite significant when 4MB of memory cost several hundred dollars). IBM PC clone manufacturers soon recognized that they could better compete if they provided systems that used non-parity memory, so some 386 machines began to appear with this ‘feature’. By the time 486 systems were being produced, the vast majority of them used non-parity memory.
To this day almost all systems sold contain non-parity memory unless parity is specifically requested. Only systems that are considered to be handling ‘mission critical’ data, such as servers, will contain parity (or ECC) memory. Since the soft error rate for today’s A-grade chips is about once every ten years (or better), it seems to make sense that non-parity is the norm. In addition, with the majority of systems running Windows95 or Windows98, where data integrity cannot be guaranteed, ECC will really only lessen the probability of a data error. On the other hand, for those using operating systems that are a bit more ‘robust’, memory prices have dropped so significantly that the additional cost of ECC memory usually amounts to about $15.00, assuming a 128MB module.
How Error Checking Works
Parity checking is a rather simple method of detecting memory errors, without any correction capabilities. Every byte has a ‘parity’ bit associated with it, for a total of nine (9) bits per byte (eight data bits plus one parity bit). The parity bit is set at write time, and then recalculated and compared at read time to determine if any of the bits have changed since the data was stored. This type of checking reliably detects only single bit errors (more precisely, any odd number of altered bits). If two bits have been altered, the parity check will ‘pass’, and the error is allowed to possibly corrupt the data.
Parity checking can be implemented either as ‘0’ parity or ‘1’ parity. When the byte is stored, the number of zeros (or ones, if ‘1’ parity) is added up. The result is stored in the parity bit – ‘1’ if odd, ‘0’ if even. When that byte is read from memory, the bits are again counted and the result compared against what was stored in the parity bit. A match means that the data was not changed from when it was stored (or two bits were altered so the result is the same).
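The write-time and read-time steps described above can be sketched in a few lines of Python. This is only an illustration of the ‘1’ parity scheme (count the ones, store ‘1’ if odd); the function names are made up for this example and real hardware does the same work in logic gates rather than software.

```python
def parity_bit(byte):
    """Return 1 if the byte holds an odd number of 1 bits, else 0."""
    return bin(byte & 0xFF).count("1") % 2


def write_byte(byte):
    """Simulate a memory write: keep the data byte together with its parity bit."""
    return byte & 0xFF, parity_bit(byte)


def read_ok(stored):
    """Simulate a read: recount the bits and compare with the stored parity bit."""
    data, stored_parity = stored
    return parity_bit(data) == stored_parity


data, p = write_byte(0b10110010)       # four 1 bits, so the parity bit is 0
one_flip = (data ^ 0b00000100, p)      # one bit altered while in memory
two_flips = (data ^ 0b00000110, p)     # two bits altered while in memory

print(read_ok((data, p)))   # True: data unchanged
print(read_ok(one_flip))    # False: the single bit error is detected
print(read_ok(two_flips))   # True: the double bit error slips through
```

The last line shows the weakness mentioned above: two altered bits leave the count odd or even exactly as it was, so the check passes.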
Since about 90% of all soft errors are of the single bit kind, parity checking is usually quite sufficient for most situations. Unfortunately, there is a penalty to be paid, which is slightly slower performance, since there are extra clock cycles spent in calculating, storing and fetching the parity bit. One other consideration is that since the error cannot be fixed by parity, the application must actually be stopped and an error message issued indicating that a parity error was encountered.
An even better error checking feature is ECC (Error Correcting Code, sometimes expanded as Error Checking and Correcting), which includes not only single bit error detection, but also two, three and four bit detection (depending upon the implementation). In addition, ECC can actually correct single bit errors, so the application can continue as if no problem ever occurred. ECC can be implemented either on the module (ECC-on-SIMM, or EOS) or in the chipset; however, EOS modules are very rare indeed.
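To make ‘correct one bit, detect two’ concrete, here is a minimal sketch using an extended Hamming code on a single byte. This is an assumption for illustration only: the article does not say which code a given chipset uses, real implementations work on 64 data bits rather than 8, and the function names are invented for this example.

```python
def syndrome_of(code):
    """XOR together the indices of every bit that is set."""
    s = 0
    for i, bit in enumerate(code):
        if bit:
            s ^= i
    return s


def encode(data_bits):
    """Build a codeword: index 0 is an overall parity bit, power-of-two
    indices hold check bits, every other index holds a data bit."""
    code = [0]
    remaining = list(data_bits)
    pos = 1
    while remaining:
        if pos & (pos - 1) == 0:          # power of two: check-bit slot
            code.append(0)
        else:
            code.append(remaining.pop(0))
        pos += 1
    s = syndrome_of(code)
    p = 1
    while p < len(code):                  # set check bits so the syndrome is 0
        if s & p:
            code[p] = 1
        p <<= 1
    code[0] = sum(code) % 2               # make the total number of 1s even
    return code


def decode(code):
    """Return (status, codeword), repairing a single flipped bit if found."""
    code = list(code)
    s = syndrome_of(code)
    overall_ok = sum(code) % 2 == 0
    if s == 0 and overall_ok:
        return "ok", code
    if not overall_ok and s < len(code):  # odd number of flips: assume one
        code[s] ^= 1
        return "corrected", code
    if overall_ok:
        return "double-bit error detected", code
    return "uncorrectable", code


def extract(code):
    """Pull the data bits back out of a codeword."""
    return [bit for i, bit in enumerate(code) if i and i & (i - 1)]


data = [1, 0, 1, 1, 0, 0, 1, 0]
word = encode(data)

single = list(word)
single[5] ^= 1                            # one bit altered
double = list(word)
double[5] ^= 1
double[9] ^= 1                            # two bits altered

status, fixed = decode(single)
print(status, extract(fixed) == data)     # corrected True
print(decode(double)[0])                  # double-bit error detected
```

The single error is repaired silently, which is why the application can keep running; the double error cannot be repaired but, unlike with parity, it is at least reported.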
ECC is implemented by a ‘hashing’ algorithm that works on eight (8) bytes (64 bits) at a time, and places the result into an 8-bit ECC ‘word’. At read time, the eight bytes being read are again ‘hashed’ and the result compared to the stored ECC word, similar to how the parity checking is performed. The main difference is that in parity checking, each parity bit is associated with a single byte, while the ECC word is associated with the entire eight bytes. This means that the bit values for ECC will very likely not be the same as the individual parity bits would be for the same eight byte data value. For that reason ECC modules cannot be used in parity mode (parity modules, however, can be used in ECC mode, as described a bit later). Note that this description of ECC is based upon a memory bus width of 64 bits. If one were to implement ECC on a 486 (32-bit width), it would require seven (7) bits for the ECC word.
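The 8-bit and 7-bit figures can be checked with a little arithmetic, assuming the ECC word is a Hamming-style code that corrects one bit and detects two (an assumption here; the chipset’s actual algorithm is not specified above). Such a code needs the smallest r with 2^r >= data bits + r + 1, plus one extra overall parity bit. The function name below is made up for this sketch.

```python
def secded_check_bits(data_bits):
    """Check bits needed to correct 1 bit and detect 2 in `data_bits` of data."""
    r = 0
    while 2 ** r < data_bits + r + 1:   # Hamming bound for single-bit correction
        r += 1
    return r + 1                        # one more bit for double-bit detection


for width in (32, 64):
    parity_bits = width // 8            # plain parity: one bit per byte
    print(width, secded_check_bits(width), parity_bits)
# prints:
# 32 7 4
# 64 8 8
```

On a 64-bit bus the ECC word needs exactly as many extra bits as byte parity does (eight), which is why a pair of true parity SIMMs has room for it.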
Parity vs. ECC Modules
Parity and ECC modules can be used on virtually any motherboard that does not support the parity/ECC feature. The extra bits are simply ignored (neither set nor read). Many early Pentium class chipsets do not have the ability to perform parity or ECC checking, so the feature is always set to ‘disable’ in the BIOS. Note that while SIMMs can be implemented as either non-parity, parity or ECC, DIMM modules come in only two flavors: non-ECC and ECC.
Parity SIMMs can also be used on any motherboard that supports parity or ECC (if implemented in the BIOS correctly, and assuming it will accept SIMMs). Note that there is such a thing as ‘logic’ or ‘bit’ parity, where the parity information is not stored at write time, but is instead generated at read time so that a successful parity check always occurs. Logic parity will not work with the ECC feature, though it will function with the parity feature (you don’t really get any parity checking, however). In fact, this is one way to tell if you do have logic parity (assuming that the board supports ECC properly for true parity modules). When parity modules are used in ECC mode, the algorithm can detect 1- or 2-bit errors, and can correct 1-bit errors.
ECC modules can be used on either a non-parity/non-ECC system, or on a system that supports ECC. The ECC module *cannot* be used in parity mode. The reason for this is simply that the ECC module design is such that individual parity bits cannot be set, so the chipset will not write the correct data to the chips which contain the ECC word. In order for ECC modules to work properly, the chipset must be able to handle them and the BIOS must have implemented the feature properly. This was an issue several years ago with the i440HX chipset, as only two manufacturers correctly implemented ECC (Intel and ASUS); however, this does not appear to be as much of an issue today.
For the following discussion we will use the letters ‘MB’ to indicate megabytes, and ‘Mb’ to indicate megabits. The reasons for this will become apparent as we describe the actual memory module design.
If we were to examine a 16MB parity SIMM, we would see that it has twelve (12) chips on it. Eight (8) of these would be 16Mb chips (remember this is megabits), and four (4) of them would be 4Mb chips. In this design, the 16Mb chips contain the data, while the 4Mb chips contain the parity information. Since a SIMM is required to put out 32 bits at a time (four bytes), the required chip configuration would be 4Mx4 (for the 16Mb chips). What this designation means is that there are four million ‘cells’ which contain four bits each, for a total of sixteen million bits on the chip. When the chip is accessed, a single cell is ‘signaled’ by the Row and Column Address Strobe lines (RAS and CAS), and that cell then sends its data out to the memory bus. This means that each chip delivers 4 bits of data for each access.
Because a Pentium requires sixty-four (64) bits to fill the memory bus, we would need a total of sixteen (16) data chips to accomplish this. Since each 16MB SIMM has eight data chips, we need two modules to fill the bus (for DIMMs, we only need a single module, since it is 64 bits wide already). As stated above, each parity chip is a 4Mb chip, which will have a configuration of 4Mx1. Following the explanation of the data chips, this means that each parity chip will output (or store) a single bit at a time, which is just perfect for parity operations! As you can see, you will have a single 4Mb chip for each pair of 16Mb chips (one parity bit per byte), which explains why there are four of them. The total width of a parity SIMM is 36 bits: thirty-two (32) data bits plus four (4) parity bits.
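The chip counts above can be checked with a short calculation. This is only a worked-arithmetic sketch; it assumes ‘M’ means 2**20 (slightly more than the ‘four million’ used in the text), and the variable and function names are made up for the example.

```python
MEGA = 2 ** 20

# Each chip is described as (cells, bits per cell).
data_chips = [(4 * MEGA, 4)] * 8      # eight 4Mx4 chips (16Mb each)
parity_chips = [(4 * MEGA, 1)] * 4    # four 4Mx1 chips (4Mb each)


def bus_bits(chips):
    """Bits delivered per access: every chip contributes one cell."""
    return sum(width for _, width in chips)


def total_bits(chips):
    """Total storage across all chips, in bits."""
    return sum(cells * width for cells, width in chips)


data_width = bus_bits(data_chips)                    # 32
parity_width = bus_bits(parity_chips)                # 4
capacity_mb = total_bits(data_chips) // 8 // MEGA    # 16 (megabytes)
simms_for_pentium = 64 // data_width                 # 2

print(data_width + parity_width, capacity_mb, simms_for_pentium)  # 36 16 2
```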
A 16MB ECC module has very much the same structure, except that instead of the four 4Mb chips, it will have one extra 16Mb chip for a total of nine (9) chips. This extra chip will also be a 4Mx4 chip, so it must store and read four bits at a time. Since two SIMMs are required for the Pentium, a total of 8 bits will be available for ECC operations. Note that since the 16Mb chip cannot store a single bit at a time, this module design cannot be used in parity mode. In parity mode the chipset will attempt to write each of the 8 bits individually, and the 16Mb chip simply can’t do it – so you will get a parity error in this situation.
An ECC DIMM module is constructed much the same way as an ECC SIMM module, except that the chips generally have more output pins. For example, a 64MB DIMM will consist of eight (8) chips that are 64Mb each plus one additional 64Mb chip for the ECC bits. These chips will be in an 8Mx8 configuration, so that a total of 64 data bits will be transferred. In addition, the extra ECC chip will output another 8 bits, making the module 72-bits wide.
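The two ECC layouts can be compared side by side with a small model. As before, this is a sketch under stated assumptions: the `Module` class and its fields are hypothetical, every chip on a module is taken to be identical, and exactly one chip is taken to hold the ECC bits, as in the examples above.

```python
from dataclasses import dataclass


@dataclass
class Module:
    name: str
    chips: int        # identical DRAM chips, including the one ECC chip
    depth_m: int      # cells per chip, in units of 2**20
    chip_width: int   # bits each chip transfers per access

    @property
    def width(self):
        """Total module width in bits (data plus ECC)."""
        return self.chips * self.chip_width

    @property
    def ecc_width(self):
        """Bits contributed by the single ECC chip."""
        return self.chip_width

    @property
    def data_megabytes(self):
        """Capacity of the data chips only, in MB."""
        return (self.chips - 1) * self.depth_m * self.chip_width // 8


ecc_simm = Module("16MB ECC SIMM", chips=9, depth_m=4, chip_width=4)
ecc_dimm = Module("64MB ECC DIMM", chips=9, depth_m=8, chip_width=8)

for m, per_bus in ((ecc_simm, 2), (ecc_dimm, 1)):   # modules needed for 64 bits
    print(m.name, m.width, m.data_megabytes, per_bus * m.ecc_width)
# prints:
# 16MB ECC SIMM 36 16 8
# 64MB ECC DIMM 72 64 8
```

Either way the chipset ends up with 8 ECC bits alongside every 64 data bits: two SIMMs supply 4 each, while a single DIMM supplies all 8.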
The bottom line is that a true parity module can be used in either non-parity, parity or ECC mode, but it is more expensive than an ECC module. An ECC module can be used as non-parity or as ECC, but not as parity. Since errors are so infrequent with today’s high quality chips (this assumes you have A-grade chips that are not remarked or reused), ECC is worthwhile only for those who use an appropriate OS and who require a high level of data integrity.