Teraflop Chip Overview
The Teraflops chip is manufactured in Intel’s high performance 65nm process with 8 layers of metal and uses 100M transistors on a 275mm² die. The device integrates 80 tiles (3mm² each) arranged in an 8×10 two-dimensional mesh. Each tile contains a processing element (PE) and a 5-port router for external communications. The design targets 15 FO4 delays per pipeline stage and operates at 1-5.6GHz. While the paper submitted to ISSCC contained simulated power results, the presentation contained actual measured data. As the various press announcements indicated, Intel was able to achieve 1TFLOP/s on a specific application, with the device operating at 0.95V and 3.16GHz. At that 3.16GHz operating point, the on-chip network has a bisection bandwidth of 1.62Tbit/s.
Figure 1 – Teraflops Die Micrograph
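As a rough sanity check, the headline numbers above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only. The tile count, frequency, bandwidth and area figures come from the text; the value of 4 FLOPs per tile per cycle is an assumption not stated here (it corresponds to two multiply-accumulate units per PE, each counted as two operations), and the function names are made up for this example.

```python
# Back-of-envelope check of the figures quoted in the overview.
TILES = 80
FREQ_HZ = 3.16e9          # operating point quoted for the 1TFLOP/s result
BISECTION_BPS = 1.62e12   # quoted bisection bandwidth at 3.16GHz
TILE_AREA_MM2 = 3.0
DIE_AREA_MM2 = 275.0

# ASSUMPTION (not from the text): each tile retires 4 FLOPs per cycle,
# e.g. two multiply-accumulate units, each counted as 2 operations.
ASSUMED_FLOPS_PER_TILE_PER_CYCLE = 4


def peak_flops(tiles, freq_hz, flops_per_tile_per_cycle):
    """Aggregate peak throughput in FLOP/s."""
    return tiles * freq_hz * flops_per_tile_per_cycle


def bisection_bits_per_cycle(bandwidth_bps, freq_hz):
    """Bits crossing the mesh bisection in one clock cycle."""
    return bandwidth_bps / freq_hz


def tile_area_fraction(tiles, tile_area_mm2, die_area_mm2):
    """Share of the die occupied by the tiles themselves."""
    return tiles * tile_area_mm2 / die_area_mm2


if __name__ == "__main__":
    flops = peak_flops(TILES, FREQ_HZ, ASSUMED_FLOPS_PER_TILE_PER_CYCLE)
    bits = bisection_bits_per_cycle(BISECTION_BPS, FREQ_HZ)
    frac = tile_area_fraction(TILES, TILE_AREA_MM2, DIE_AREA_MM2)
    print(f"peak throughput: {flops / 1e12:.2f} TFLOP/s")
    print(f"bisection traffic: {bits:.0f} bits/cycle")
    print(f"tiles cover {frac:.0%} of the die")
```

Under that assumption the peak works out to just over 1TFLOP/s at 3.16GHz, in line with the announced result. The bisection figure comes to roughly 512 bits per cycle, which would be consistent with, for example, 8 bidirectional links of 32 data bits each crossing the cut, although that breakdown is also an assumption rather than something stated above.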
Figure 2 below shows the power dissipation versus voltage, accompanied by certain performance points. Note that the application in question is the best case for Intel, with relatively little communication (more on that later). What is most remarkable is that the leakage power is extraordinarily low, both in absolute and relative terms. When all tiles are computing, roughly 10-15% of the power is leakage, which is very good compared to most high performance MPUs. For instance, Montecito, the 90nm Itanium 2 microprocessor, consumes around 25W of leakage power on average, about 25% of its total dissipation.
Figure 2 – Power, Voltage and Performance for Stencil Application