Logic Depth, Circuit Design, Die Size and Process Shrink
Figure 2 – A per-stage circuit delay of 11 FO4 often leaves only 5~8 FO4 for logic flow
The first incarnation of the CELL processor is implemented in a 90 nm SOI process. IBM claims that although the logic complexity of each pipeline stage is roughly comparable to that of other processors with a per-stage logic depth of 20 FO4, aggressive circuit design, efficient layout and logic simplification enabled the circuit designers of the CELL processor to reduce the per-stage circuit delay to 11 FO4 throughout the entire design. The design methodology deployed for the CELL processor project provides an interesting contrast to that of other IBM processor projects in that the first incarnation of the CELL processor is a full custom design. Moreover, the full custom design includes the use of dynamic logic circuits in critical data paths. In this first implementation, dynamic logic was deployed both to minimize area and to improve performance, in order to reach the aggressive goal of 11 FO4 of circuit delay per stage. Figure 2 shows that with a per-stage circuit delay of 11 FO4, often only 5~8 FO4 are left for inter-latch logic flow.
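The per-stage budget quoted above can be worked through as simple arithmetic. The short Python sketch below uses only the figures from the text (11 FO4 per stage, 5~8 FO4 for logic, 20 FO4 for comparable processors). Treating the remainder as latch and clocking overhead is an interpretation, not something IBM stated, and the helper names are hypothetical.

```python
# Per-stage delay budget in FO4 units, using the figures quoted in the text.
STAGE_DELAY_FO4 = 11         # total per-stage circuit delay claimed for CELL
LOGIC_FO4_RANGE = (5, 8)     # FO4 left for inter-latch logic (Figure 2)
CONVENTIONAL_STAGE_FO4 = 20  # per-stage depth of the processors used for comparison


def overhead_fo4(stage_fo4, logic_fo4):
    """FO4 not available to logic (assumed here to be latch/clocking overhead)."""
    return stage_fo4 - logic_fo4


def relative_clock(stage_fo4, reference_fo4):
    """Clock-rate ratio versus a reference design, assuming equal FO4 delay
    (same process and operating conditions)."""
    return reference_fo4 / stage_fo4


overheads = [overhead_fo4(STAGE_DELAY_FO4, logic) for logic in LOGIC_FO4_RANGE]
print(overheads)  # [6, 3]

speedup = relative_clock(STAGE_DELAY_FO4, CONVENTIONAL_STAGE_FO4)
print(round(speedup, 2))  # 1.82
```

Under these assumptions, between 3 and 6 FO4 of every 11 FO4 stage is not available to logic, and an 11 FO4 stage would clock roughly 1.8 times faster than a 20 FO4 stage built on the same process.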
The use of dynamic logic raises an interesting issue in that dynamic logic circuits rely on the capability of logic transistors to retain a capacitive load as temporary storage. Because capacitance decreases and leakage increases with each successive process generation, dynamic logic design becomes more challenging at every new node. In addition, dynamic circuits are reportedly even more challenging on SOI based process technologies. However, circuit design engineers from IBM believe that the use of dynamic logic will not hinder the scalability of the CELL processor down to 65 nm and below. Their argument is that since the CELL processor is a full custom design, porting it to a new process with dynamic circuits is no more and no less challenging than porting a design without dynamic circuits. That is, since a full custom design requires the re-examination and re-optimization of transistor and circuit characteristics for each process generation, if a given set of dynamic logic circuits becomes impractical for specific functions at a given process node, that set of circuits can be replaced with static circuits as needed.
The process portability of the CELL processor design is an interesting topic because the prototype CELL processor is a large device that occupies 221 mm² of silicon area on the 90 nm process. By comparison, the IBM PPC970FX processor has a die size of 62 mm² on the 90 nm process. The natural question then arises as to whether Sony will choose to reduce the number of SPEs to 4 for the version of the CELL processor that will appear in the next generation PlayStation, or keep the 8 SPEs and wait for the 65 nm process before it ramps up production of the next generation PlayStation. Although no announcements or hints have been given, IBM's belief in the process portability of the CELL processor design does bode well for the 8 SPE path, since process shrinks can be relied on to bring down the cost of the CELL processor at the 65 nm node and further at the 45 nm node.
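To put the shrink argument in rough numbers, the Python sketch below scales the 221 mm² die area by the square of the feature-size ratio. This is an idealized assumption: real shrinks rarely scale perfectly, since I/O pads, analog blocks and wiring do not shrink with the logic. The results are therefore optimistic estimates, not projected die sizes, and the function name is hypothetical.

```python
# Idealized die-area scaling: area is assumed to scale with the square of the
# feature size. This is an optimistic assumption, not a figure from IBM or Sony.
CELL_AREA_90NM_MM2 = 221.0     # prototype CELL die area (from the text)
PPC970FX_AREA_90NM_MM2 = 62.0  # IBM PPC970FX die area (from the text)


def ideal_shrink_area(area_mm2, from_nm, to_nm):
    """Estimated die area after an ideal linear shrink from from_nm to to_nm."""
    return area_mm2 * (to_nm / from_nm) ** 2


area_65 = ideal_shrink_area(CELL_AREA_90NM_MM2, 90, 65)
area_45 = ideal_shrink_area(CELL_AREA_90NM_MM2, 90, 45)

print(round(area_65))  # 115
print(round(area_45))  # 55

# Size of the prototype relative to the PPC970FX on the same 90 nm process.
print(round(CELL_AREA_90NM_MM2 / PPC970FX_AREA_90NM_MM2, 2))  # 3.56
```

Even under ideal scaling, the 8 SPE design at 65 nm would still be nearly twice the size of the 90 nm PPC970FX. It would only approach that die size at the 45 nm node.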