Mesochronous Interfaces and Clock Distribution
One of the more novel aspects of this project is that the interfaces between the routers are mesochronous. Most signals in a chip are synchronous: they run at the same frequency with no phase difference. Two signals are mesochronous if they run at the same frequency, but with an arbitrary (though essentially constant) phase offset between them. Plesiochronous signals run at nominally the same frequency but with a small mismatch, so their relative phase drifts over time. Figure 6 below shows signals that are (a) synchronous, (b) mesochronous and (c) plesiochronous.
Figure 6 – Various Types of Signal Synchronization
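The three categories can also be expressed as a short Python sketch. This is only an illustration: the function names, the 0.1% frequency tolerance used to decide what counts as "nominally the same" frequency, and the catch-all "unrelated" label are assumptions made for the example, not part of the Polaris disclosure.

```python
def classify(freq_a_hz, freq_b_hz, phase_offset_deg, nominal_tolerance=1e-3):
    """Classify the relationship between two clocks.

    nominal_tolerance is a hypothetical threshold (relative frequency
    mismatch) below which two clocks count as 'nominally the same'.
    """
    if freq_a_hz == freq_b_hz:
        # Same frequency: only the phase offset distinguishes the two cases.
        return "synchronous" if phase_offset_deg % 360 == 0 else "mesochronous"
    mismatch = abs(freq_a_hz - freq_b_hz) / max(freq_a_hz, freq_b_hz)
    # "unrelated" is a label invented here for clocks outside all three categories.
    return "plesiochronous" if mismatch <= nominal_tolerance else "unrelated"


def phase_drift_deg(freq_a_hz, freq_b_hz, cycles_of_a):
    """Phase, in degrees, that clock B gains on clock A over N cycles of A."""
    return 360.0 * cycles_of_a * (freq_b_hz - freq_a_hz) / freq_a_hz
```

For two 4GHz clocks, classify(4e9, 4e9, 90) returns "mesochronous" and phase_drift_deg(4e9, 4e9, 1000) is zero: the offset never changes. With any frequency mismatch the drift grows with the number of cycles, which is what separates the plesiochronous case from the mesochronous one.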
Mesochronous clocking across tiles is extremely advantageous for clock distribution. Since the clock signals don’t need to be as precisely coordinated, MPU designers can implement simpler and less power-hungry clock distribution networks, replacing a complicated H-tree with something simpler and shorter, like a grid. This also means that many of the repeaters and buffers used to keep signals in phase can be removed, reducing power draw and thermal dissipation. For example, the 180nm Itanium 2 microprocessor used a balanced H-tree and burned around 30% of its 130W power envelope on clock distribution. The 90nm Itanium 2 uses 25W of a 100W power budget on clock distribution. This sort of power consumption is fairly typical for a high performance microprocessor; usually one quarter to one third of the power goes into the clock tree when clock gating is used. Without clock gating, that can shoot up as high as 70%.
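To put the two Itanium 2 figures on the same footing, the arithmetic can be written out as a few lines of Python. The helper and variable names are made up for this example; the numbers are the ones quoted above.

```python
def clock_share(clock_watts, total_watts):
    """Fraction of total chip power spent on clock distribution."""
    return clock_watts / total_watts


# 180nm Itanium 2: around 30% of a 130W envelope, i.e. roughly 39W.
itanium2_180nm_clock_w = 0.30 * 130

# 90nm Itanium 2: 25W out of a 100W budget, i.e. a 25% share.
itanium2_90nm_share = clock_share(25, 100)
```

Both parts land in or near the one-quarter to one-third range cited as typical for designs that use clock gating.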
In comparison to high performance microprocessors, clock distribution in Polaris is both simple and low power. The clock is distributed from the PLL to the individual tiles by a grid, with horizontal spines on M8 and vertical spines on M7. While the presentation did not give exact power numbers, the global clock distribution (across M7 and M8) is estimated to use 2.2W. Within the individual tiles, roughly 10% of the power is used for clocking; of the communication power, 33% is used for clocking, plus 6% for the mesochronous interfaces. Given that the vast majority of clock distribution power goes to driving the final stage, which is inside each tile, this implies a substantial improvement. These estimates are at 4GHz and 1.2V, with a total of 181W dissipation for the entire chip. Altogether, it seems likely that clock distribution in Polaris accounts for around 12-20% of the total power, although hopefully future disclosures will be more precise.
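As a rough sanity check on the low end of that range, the quoted figures can be combined in a short Python sketch. It rests on one loud assumption that is not in the disclosure: all power other than the global clock grid is treated as tile power, with 10% of it going to clocking. The clocking share of the communication power is left out, because its absolute size is not given here. All names are invented for the example.

```python
TOTAL_W = 181.0         # whole chip at 4GHz and 1.2V
GLOBAL_CLOCK_W = 2.2    # estimated global distribution on M7 and M8
TILE_CLOCK_SHARE = 0.10 # roughly 10% of tile power goes to clocking

# Share of chip power used by the global clock grid alone.
global_share = GLOBAL_CLOCK_W / TOTAL_W

# ASSUMPTION: everything that is not the global grid counts as tile power.
tile_power_w = TOTAL_W - GLOBAL_CLOCK_W
tile_clock_w = TILE_CLOCK_SHARE * tile_power_w

# Lower-bound style estimate; communication clocking is not included.
estimated_share = (GLOBAL_CLOCK_W + tile_clock_w) / TOTAL_W
```

Under that assumption the global grid is a little over 1% of chip power and the combined figure is about 11%, before adding any clocking power spent in the communication logic.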
While mesochronous clocking saves power in clock distribution, it complicates the network design. Since there may be phase variation between any two tiles in the system, the mesochronous network interfaces must tolerate and correct any phase mismatch. The interfaces are implemented using a 4-deep circular FIFO; as data is received, it is synchronized to the next tile by a programmable delay strobe. This synchronization normally has no latency impact, but it can occasionally add a 1-cycle delay. The worst-case phase misalignment results in a 2-cycle delay, but this is very infrequent.
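The idea behind a circular FIFO synchronizer can be sketched with a small behavioral model in Python. This is not the Polaris circuit: the class and function names, the event-based timing, and the read_lag_cycles parameter (a stand-in for the programmable strobe delay) are all assumptions made for illustration. The sender writes one word per cycle of its own clock, and the receiver reads one word per cycle of a clock that has the same frequency but is shifted by phase_offset_cycles.

```python
class CircularFifo:
    """Fixed-depth circular buffer with independent write and read pointers."""

    def __init__(self, depth=4):
        self.depth = depth
        self.slots = [None] * depth
        self.wr = 0  # advanced in the sender's clock domain
        self.rd = 0  # advanced in the receiver's clock domain

    def write(self, word):
        self.slots[self.wr] = word
        self.wr = (self.wr + 1) % self.depth

    def read(self):
        word = self.slots[self.rd]
        self.rd = (self.rd + 1) % self.depth
        return word


def transfer(words, phase_offset_cycles, read_lag_cycles=2, depth=4):
    """Send words across a same-frequency, phase-shifted boundary.

    Word i is written at time i (sender cycles) and read at time
    i + read_lag_cycles + phase_offset_cycles. read_lag_cycles is a
    hypothetical knob, not a documented Polaris parameter.
    """
    fifo = CircularFifo(depth)
    events = []
    for i, word in enumerate(words):
        events.append((float(i), 0, word))  # kind 0 = write
        events.append((i + read_lag_cycles + phase_offset_cycles, 1, None))  # kind 1 = read
    # Process in time order; on a tie the write is handled before the read.
    events.sort(key=lambda e: (e[0], e[1]))

    received = []
    for _, kind, word in events:
        if kind == 0:
            fifo.write(word)
        else:
            received.append(fifo.read())
    return received
```

In this model, any phase offset between zero and one cycle delivers the words intact with the default lag, because each slot is read after it is written and before the sender wraps around to overwrite it. Setting read_lag_cycles to the full depth of 4 with a nonzero offset lets the sender lap the reader and corrupts the stream, which is why the read timing has to be tuned to the actual phase relationship.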