Network and Router Design
The interesting part of the Teraflops research project is not the processing elements but the routers, the mesochronous clocking and the network. The network topology is a 2D mesh. Each tile contains a 5-port router that connects to the adjacent tiles in all four cardinal directions and to the processing element itself. Each network packet is broken down into multiple FLow control unITs (FLITs); the minimum packet size is 2 FLITs, and there is no maximum. The FLIT header has 3 bits to indicate the destination, and multiple headers may be chained together for long paths.
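To make the header scheme concrete, here is a minimal sketch of how a sender-supplied route could be expressed in 3-bit fields. The port codes, the packing order, and the helper names `encode_route` and `next_hop` are assumptions for illustration only; the article does not specify the actual bit layout or how fields are distributed across chained headers.

```python
# Hypothetical port codes: 3 bits are enough to name any of the 5 router ports.
PORT_CODES = {"N": 0, "E": 1, "S": 2, "W": 3, "PE": 4}
PORT_NAMES = {code: name for name, code in PORT_CODES.items()}
FIELD_BITS = 3  # from the article: 3 destination bits per header


def encode_route(hops):
    """Pack a list of port names into one integer, first hop in the low bits.

    Assumption: one 3-bit field per hop, chained back to back.
    """
    word = 0
    for i, hop in enumerate(hops):
        word |= PORT_CODES[hop] << (FIELD_BITS * i)
    return word


def next_hop(word):
    """Return (output port, remaining route), as a router might at each hop."""
    port = PORT_NAMES[word & ((1 << FIELD_BITS) - 1)]
    return port, word >> FIELD_BITS


# Two tiles east, one tile south, then deliver to the processing element.
route = encode_route(["E", "E", "S", "PE"])
path = []
word = route
for _ in range(4):
    port, word = next_hop(word)
    path.append(port)
print(path)
```

Under this assumed layout, each router consumes the lowest field and passes the rest along, so a longer path simply needs more chained fields.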
Figure 4 – Router Design
The router features five 36-bit ports, with a 5-stage pipeline and two virtual lanes to avoid deadlock, as shown above in Figure 4. At 4GHz, each router provides 80GB/s of communication bandwidth. Each lane has a FIFO buffer that can hold up to 16 FLITs. The first router design, which was shown at ISSCC 2001, had a crossbar dedicated to each lane. The new design instead shares a single non-blocking crossbar between both lanes, and the crossbar is double pumped in the fourth pipestage using dual edge-triggered flip-flops. These flip-flops let the crossbar transfer data on both the rising and falling edges of the clock signal, similar to the way DDR memory transfers data. Compared to the original design, this change reduces the crossbar area by 50% and the overall router area by 36%, lowers average power by 13% and decreases latency by one cycle. Figure 5 below shows the design for the newer crossbar on top, and an area comparison between the two designs at the bottom. The micrograph on the left is the new design, with a shared crossbar; the micrograph on the right is a scaled-down 65nm version of the original dual crossbar design.
Figure 5 – Double Pumped Crossbar Switch Design
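The 80GB/s figure quoted for the router can be reproduced with simple arithmetic. The sketch below assumes that 32 of the 36 bits on each port carry payload and that every port moves one FLIT per cycle; the article states neither, but these assumptions are the ones that make the numbers line up. The function name is hypothetical.

```python
NUM_PORTS = 5            # from the article: five ports per router
LINK_BITS = 36           # from the article: 36-bit ports
PAYLOAD_BITS = 32        # assumption: the remaining 4 bits are control/sideband
CLOCK_HZ = 4e9           # from the article: 4GHz


def router_bandwidth_gb_per_s(ports, bits_per_port, clock_hz):
    """Aggregate bandwidth in GB/s, assuming one transfer per port per cycle."""
    bytes_per_cycle = ports * bits_per_port / 8
    return bytes_per_cycle * clock_hz / 1e9


payload_bw = router_bandwidth_gb_per_s(NUM_PORTS, PAYLOAD_BITS, CLOCK_HZ)
raw_bw = router_bandwidth_gb_per_s(NUM_PORTS, LINK_BITS, CLOCK_HZ)
print(f"payload: {payload_bw:.0f} GB/s, raw link: {raw_bw:.0f} GB/s")
```

Counting all 36 bits would give 90GB/s rather than 80GB/s, which is why the 32-bit payload assumption seems the more likely reading of the quoted number.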
The on-chip network is wormhole switched. The sender launches the first FLIT; the receiving router inspects the header and forwards that FLIT to the appropriate port. All subsequent FLITs follow the same path, which has the advantage of pipelining the message transmission. However, if there is a delay anywhere along the path, the message begins to stall. When a FLIT cannot be forwarded, it is stored in the 16-entry FIFO, and when the FIFO is full, the receiving router tells the sender to stop. This process eventually creates backpressure on the original sender. Backpressure is a relatively simple technique for flow control, and is less efficient than credit-based mechanisms. Backpressure and wormhole switching were used because they were simple and low-risk design choices, which enabled the team to spend their time on other, more innovative portions of the project. For a real product, a much more sophisticated network would be implemented.
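The stall behavior described above can be illustrated with a toy model of a single lane. This is a simplified sketch, not the chip's actual logic: it assumes one FLIT accepted and one FLIT forwarded per cycle, models the "stop" message as a simple full/not-full check, and uses hypothetical names (`RouterLane`, `simulate`).

```python
from collections import deque

FIFO_DEPTH = 16  # from the article: each lane buffers up to 16 FLITs


class RouterLane:
    """One input lane of a router: a bounded FIFO of FLITs."""

    def __init__(self, depth=FIFO_DEPTH):
        self.fifo = deque()
        self.depth = depth

    def can_accept(self):
        # False plays the role of the "stop" signal sent back upstream.
        return len(self.fifo) < self.depth

    def push(self, flit):
        assert self.can_accept(), "sender ignored backpressure"
        self.fifo.append(flit)

    def pop(self):
        return self.fifo.popleft() if self.fifo else None


def simulate(num_flits, blocked_cycles):
    """Send num_flits through one lane whose output is blocked at first.

    Returns (FLITs delivered in order, cycles the sender spent stalled).
    """
    lane = RouterLane()
    delivered = []
    next_flit = 0
    sender_stalls = 0
    cycle = 0
    while len(delivered) < num_flits:
        # Output side: forward one FLIT per cycle once downstream is free.
        if cycle >= blocked_cycles:
            flit = lane.pop()
            if flit is not None:
                delivered.append(flit)
        # Input side: the sender pushes one FLIT per cycle unless told to stop.
        if next_flit < num_flits:
            if lane.can_accept():
                lane.push(next_flit)
                next_flit += 1
            else:
                sender_stalls += 1
        cycle += 1
    return delivered, sender_stalls


delivered, stalls = simulate(num_flits=24, blocked_cycles=20)
print(f"delivered {len(delivered)} FLITs, sender stalled {stalls} cycles")
```

In this model the FIFO absorbs the first 16 cycles of a blockage, and only after it fills does the sender begin to stall. In a multi-hop path the same effect would repeat router by router until it reaches the original sender.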