The risk of overclocking can be divided into two parts: the risk to the processor and system hardware, and the risk of data corruption. There is a definite direct risk to the hardware when overclocking requires cuts and straps on the motherboard or physical handling of its jumpers. Modern microprocessors and other components found on the motherboard are easily damaged by electrostatic discharge (ESD). Proper countermeasures, such as a tested wrist strap tied to an appropriate ground, should be taken when handling the internal components of a computer.
The other physical risk occurs when over-enthusiastic overclockers increase the power supply voltage beyond the maximum value for operation (which is lower than the absolute limit often given in a datasheet). Excessive supply voltages overstress the transistors and can lead to damage due to charge injection from the hot electron effect, or damage to the ultra-thin gate oxide dielectric from excessive electric field strength. Even if the stress is insufficient to cause an immediate catastrophic failure, the accumulation of low levels of damage can drastically shorten device lifetime. No one should attempt to change the supply voltage to their microprocessor without downloading, reading, and understanding the datasheet for their processor. Some hardware web sites offer cookbook approaches to overclocking, but the quality of the advice offered can vary greatly. My advice for such “hands-on” overclocking is to never overclock a microprocessor and motherboard you cannot afford to replace. Fortunately, the growing prevalence of programmable clock generators permits many motherboards to be overclocked without ever having to open the case. Also, remember that if you overclock the system bus then you are relying on virtually every chip in your computer to run reliably above spec, instead of just the MPU, so the potential for problems rises sharply with the number of components involved.
The more insidious and poorly understood danger of overclocking is that of program execution failure and data corruption. A modern microprocessor has millions of transistors and thousands of flip-flops arranged in dozens of finite state machines and synchronous pipelined functional units. The engineers who design and characterize a given microprocessor generally know exactly where the critical paths are in the device, and where timing failures will first occur when the clock rate is increased over the limit. The problem is that if the limit is crossed only slightly then there may be no overt signs, such as the system failing to boot. The conditions for a timing failure may depend on the sequence of instructions executed, the data values being manipulated, and even the occurrence of synchronous exceptions such as a cache or TLB miss, or asynchronous and non-deterministic events such as an interrupt to the processor. A timing error resulting from overclocking may even require two or more of these conditions to occur simultaneously before it manifests itself, and thus may occur very rarely. However, rare is a relative term. A failure that appears only once in a trillion instructions occurs roughly every 17 minutes on a 1000 MIPS processor.
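The 17 minute figure follows from simple arithmetic, and a short sketch makes it easy to try other failure rates and processor speeds. The function name and its parameters below are my own illustrative choices, and the calculation assumes failures are spread evenly over the instruction stream, which real timing failures need not be.

```python
def mean_seconds_between_failures(instructions_per_failure: float, mips: float) -> float:
    """Average time between failures, in seconds.

    Assumes (hypothetically) one failure per `instructions_per_failure`
    instructions, evenly spread, on a processor sustaining `mips`
    million instructions per second.
    """
    instructions_per_second = mips * 1e6
    return instructions_per_failure / instructions_per_second


# One failure per trillion instructions on a 1000 MIPS processor.
seconds = mean_seconds_between_failures(1e12, 1000)
print(f"{seconds:.0f} s = {seconds / 60:.1f} minutes")  # 1000 s = 16.7 minutes
```

At the same failure rate, a processor ten times faster would see a failure about every 100 seconds, which is why a "rare" error becomes less rare as instruction throughput climbs.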
A timing failure may manifest itself either as a spurious change in the control flow of program execution or as an error in instruction decoding or effective address generation. Ideally this will induce a segmentation violation fault or illegal instruction trap and alert the user that all is not well. The worst case scenario happens when the control logic within the microprocessor continues processing instructions normally but a data dependent failure occurs in one of the functional units and an incorrect result is calculated. How aggressively you should overclock, or whether you should overclock at all, should depend on what you use the system for and how serious an undetected data error is to you. If the worst outcome of data errors is to place you within a wall while playing a 3D game then that is one thing. But if these errors occur while processing your tax return, performing professional financial analysis, or calculating the stresses in the wing of an airplane, then it is much more serious.
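One crude way to look for the silent kind of error described above is to run the same deterministic computation several times and check that the answers agree. The sketch below is only an illustration under my own assumptions: the workload, the function names, and the run counts are all hypothetical, and a simple integer loop exercises only a tiny fraction of a processor's critical paths. A clean result therefore proves very little, while a mismatch is a strong sign that something is wrong.

```python
import hashlib


def workload_digest(n: int = 50_000) -> str:
    """Run a fixed, deterministic integer workload and hash every result.

    The workload is an arbitrary 64-bit linear congruential sequence,
    chosen only because it must produce identical output on every run.
    """
    digest = hashlib.sha256()
    x = 12345
    for _ in range(n):
        x = (x * 6364136223846793005 + 1442695040888963407) % 2**64
        digest.update(x.to_bytes(8, "little"))
    return digest.hexdigest()


def count_mismatches(runs: int = 5, n: int = 50_000) -> int:
    """Repeat the workload `runs` times and count digests that differ
    from the digest of a first reference run."""
    reference = workload_digest(n)
    return sum(workload_digest(n) != reference for _ in range(runs))


if __name__ == "__main__":
    mismatches = count_mismatches()
    print("mismatching runs:", mismatches)
```

On hardware that is computing correctly the count should be zero. Because the failing conditions may depend on instruction sequence, data values, and rare events, a check like this can miss exactly the errors that matter most.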
Conclusion
There are some sound and valid reasons to think that many microprocessors can be safely overclocked, sometimes by significant amounts. The problem is that most individuals don’t have access to the multimillion dollar automated testers and painstakingly crafted tests found on the production test floors of an Intel or an AMD to characterize their specific processor. So there is no real way for the individual to completely avoid the risk of introducing undetected data corruption to their computing by overclocking.
However, this risk shouldn’t be blown out of proportion. Even a microprocessor running with its clock frequency “in spec” can occasionally mangle data due to subtle logic design errors (e.g. the FDIV bug in early Pentiums), undetected manufacturing flaws, or even a rare single-event upset soft error caused by a random cosmic ray. And computer hardware is a bastion of rock-solid reliability compared to the poor, yet seemingly accepted, robustness of the operating systems and applications used in personal computer systems.