Intel Enjoys Christmas in June
The part of the Alpha IP treasure chest now in Intel’s sweaty hands that will eventually hurt IBM and Sun the most lies outside the processor core. After the tremendous effort to bring the powerful and complex EV6 processor core to life three years ago, much of the Alpha design group’s focus shifted to conceiving and creating the scalability machinery that surrounds the CPU in the EV7 device. The EV7 dispenses with the centralized system bus arrangement found in nearly all other high-end MPUs in favor of separate glueless communication paths to local memory, I/O bridge ASIC(s), and up to four other EV7 processors. The EV7 has a total chip bandwidth of 44.8 GB/s. That dwarfs the 6.4 GB/s of the McKinley system bus, a resource that in most system configurations is shared with other processors.
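To put the two figures side by side, here is a rough back-of-the-envelope sketch. The 44.8 GB/s and 6.4 GB/s numbers come from the text above; the four-processors-per-bus count and the even split of bus bandwidth are illustrative assumptions, not figures from any vendor.

```python
# Figures quoted in the article.
EV7_CHIP_BANDWIDTH_GBS = 44.8
MCKINLEY_BUS_BANDWIDTH_GBS = 6.4


def per_cpu_bus_share(bus_gbs, cpus_on_bus):
    """Bandwidth each CPU sees if a shared bus is split evenly (an assumption)."""
    if cpus_on_bus < 1:
        raise ValueError("need at least one CPU on the bus")
    return bus_gbs / cpus_on_bus


def bandwidth_ratio(ev7_gbs, bus_gbs, cpus_on_bus):
    """How many times more bandwidth one EV7 has than one CPU's share of the bus."""
    return ev7_gbs / per_cpu_bus_share(bus_gbs, cpus_on_bus)


if __name__ == "__main__":
    # cpus_on_bus values are hypothetical configurations chosen for illustration.
    for cpus in (1, 2, 4):
        ratio = bandwidth_ratio(
            EV7_CHIP_BANDWIDTH_GBS, MCKINLEY_BUS_BANDWIDTH_GBS, cpus
        )
        print(f"{cpus} CPU(s) per bus: EV7 advantage is about {ratio:.1f}x")
```

Even with the bus to itself, a single processor on a 6.4 GB/s bus has one seventh of the EV7 figure, and the gap widens with every processor added to the bus.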
Intel could have reproduced the on-chip memory controllers and interprocessor communications link interfaces of the EV7 with relative ease. Much harder to duplicate quickly are the routing engines that drive them. It is these on-chip routers that support cache coherency in systems ranging from 2 to potentially hundreds of processors. The Alpha group was the first to tackle this problem in a concerted fashion, and spent years perfecting and verifying the logic and algorithms that go into the EV7 switching fabric and directory system. The difficulty of designing, verifying and debugging cache coherency schemes for large-scale systems is legendary. It is certainly one of the hardest problems in computer design and is often the cause of large schedule slips in processors, chipsets and systems.
The big payoff of the EV7 scheme is the removal of chipsets as a bottleneck to high performance at both the processor and system level. In particular, the EV7 will enjoy a huge bonus in latency reduction over its competitors. A large-scale EV7 multiprocessor system will likely perform a remote memory access (hopping from EV7 to EV7 until the processor connected to the desired memory region is reached) with latency similar to what today’s small-scale systems require for a local memory access. Once Intel is able to incorporate this technology into future IA-64 systems, it will be in a very strong position to take on IBM and Sun for even the largest-scale commercial and scientific computing applications.
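The hop-by-hop remote access described above can be sketched with a toy model. This assumes the processors are arranged as a two-dimensional torus (a wrap-around grid), which fits the four processor links per chip mentioned earlier but is an assumption here, as are all function names. The latency function takes its per-hop and local costs as inputs because the article gives no figures for them.

```python
def torus_hops(src, dst, width, height):
    """Fewest link traversals between two nodes on a width x height 2D torus.

    Nodes are (x, y) tuples. Each axis wraps around, so the shorter of the
    direct and wrap-around distances is used.
    """
    dx = abs(src[0] - dst[0])
    dy = abs(src[1] - dst[1])
    return min(dx, width - dx) + min(dy, height - dy)


def average_hops(width, height):
    """Mean hop count from node (0, 0) to every other node in the torus."""
    others = width * height - 1
    if others < 1:
        return 0.0
    total = sum(
        torus_hops((0, 0), (x, y), width, height)
        for x in range(width)
        for y in range(height)
    )
    return total / others


def remote_latency(hops, local_latency, per_hop_latency):
    """Simple linear model: local access cost plus a fixed cost per hop.

    Both cost parameters are hypothetical inputs supplied by the caller.
    """
    return local_latency + hops * per_hop_latency


if __name__ == "__main__":
    for side in (2, 4, 8):
        cpus = side * side
        worst = torus_hops((0, 0), (side // 2, side // 2), side, side)
        print(f"{cpus} CPUs: worst case {worst} hops, "
              f"average {average_hops(side, side):.2f} hops")
```

In this model the worst-case hop count grows with the side of the grid rather than with the processor count, which is why remote latency can stay modest as the system scales, provided each hop is cheap.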