Glimpses into the Programming Model
Figure 2 – Overview of the software architecture presented by Seiji Maeda of Toshiba
At COOL Chips VIII, both Sony and Toshiba presented glimpses into the programming model for the CELL processor. Arnd Bergmann, the maintainer of the Linux kernel port for the CELL processor, revealed further details of the programming model based on his own experience. Figure 2 shows an overview of the software architecture as presented by Maeda-san.
In his keynote presentation at COOL Chips, Masakazu Suzuoki from Sony introduced the programming model of the CELL processor by first explaining the rationale behind its organization and design. Suzuoki-san argued that object-oriented programming is the desired programming model of the future, and that hardware should therefore be designed explicitly to accommodate the nature of the object-oriented programming model. Specifically, he described the optimal hardware for object-oriented programming as a processor with multiple processing elements, supporting one object per processing element and using dedicated local memory to store each object's program text and private data. Moreover, Suzuoki-san argued that the hardware should support object memory protection, limiting the ability of an object to access the local memory of other objects and thereby guaranteeing privacy for each object.
After this justification for the object-oriented programming model, Seiji Maeda from Toshiba described some possible software architectures for the CELL processor in his presentation at COOL Chips. In essence, the asymmetric nature of the CELL processor means that two separate tool chains are needed to create an application for it. Programmers coding for the CELL processor need to think in terms of software modules, with one tool chain for PPE modules and another for SPE modules. As described previously, the function of the PPE is to act as the host processor and perform real-time resource scheduling for the SPEs. To implement that functionality, PPE modules must be written to perform generic processing tasks and I/O handling. To fully utilize the power of the CELL processor, however, programmers must focus their attention on creating SPE modules. Each SPE module should use multiple SPE threads to take advantage of the parallelism afforded by the multiple SPEs.
To simplify scheduling, all SPE threads in an SPE module are always scheduled simultaneously, and they are started and stopped at the same time to reduce the complexity of synchronization. Some scheduling complexity remains, however, and a PPE module must schedule the SPEs on a module-by-module basis. Each SPE module carries scheduling information, such as its number of SPE threads and the precedence constraints between SPE modules that ensure program correctness. The PPE scheduling module must guarantee that the real-time constraints of an SPE module can be met before accepting it for execution. Lastly, the SPEs are virtualized by the resource scheduler, so that the same application code can execute on different versions of the CELL processor with different numbers of SPEs.
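The scheduling rules Maeda-san described (gang-scheduled SPE threads, precedence constraints between modules, and admission control against real-time constraints) can be illustrated with a toy PPE-side scheduler. This is a minimal Python sketch, not Toshiba's design: the `SpeModule` and `PpeScheduler` names, the abstract time units, the per-module runtime and deadline fields, and the greedy "earliest-free SPEs first" placement policy are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SpeModule:
    """Hypothetical SPE module descriptor: the scheduling information the
    text says each module carries (thread count, precedence constraints),
    plus an assumed runtime and absolute deadline in abstract time units."""
    name: str
    num_threads: int      # SPE threads; all are gang-scheduled together
    runtime: int          # assumed: time units each thread runs
    deadline: int         # assumed: module must finish by this time
    after: tuple = ()     # names of modules that must finish first


class PpeScheduler:
    """Toy admission-controlling scheduler running on the PPE (a sketch)."""

    def __init__(self, num_spes: int):
        self.free_at = [0] * num_spes  # time at which each SPE becomes idle
        self.finish = {}               # accepted module name -> finish time
        self.placement = {}            # module name -> (start, [SPE ids])

    def submit(self, m: SpeModule) -> bool:
        # A module whose gang is wider than the chip can never run.
        if m.num_threads > len(self.free_at):
            return False
        # Precedence constraints: every predecessor must have been accepted.
        if any(p not in self.finish for p in m.after):
            return False
        ready = max((self.finish[p] for p in m.after), default=0)
        # Gang scheduling: take the num_threads SPEs that free up earliest;
        # all threads start together once the last of those SPEs is idle.
        order = sorted(range(len(self.free_at)), key=lambda i: self.free_at[i])
        chosen = order[: m.num_threads]
        start = max(ready, max(self.free_at[i] for i in chosen))
        end = start + m.runtime
        # Admission control: reject the module rather than miss its deadline.
        if end > m.deadline:
            return False
        for i in chosen:
            self.free_at[i] = end      # all threads stop at the same time
        self.finish[m.name] = end
        self.placement[m.name] = (start, chosen)
        return True


sched = PpeScheduler(num_spes=8)
print(sched.submit(SpeModule("decode", 4, runtime=5, deadline=10)))   # True
print(sched.submit(SpeModule("filter", 4, runtime=5, deadline=10,
                             after=("decode",))))                      # True, starts at 5
print(sched.submit(SpeModule("physics", 8, runtime=3, deadline=12)))  # False, would end at 13
```

A greedy policy like this can reject a module that a smarter scheduler could have fit; the sketch only shows the shape of the idea, namely that the PPE checks feasibility before accepting work and that all threads of a module share one start time and one stop time.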
In this sense, the SPE modules are akin to the bundles in IA-64, and the SPE threads are akin to the individual instructions in a bundle. Grouping SPE threads within an SPE module declares those threads to be explicitly parallel, but the actual execution of those parallel threads depends on the availability of SPEs. Ideally, this programming model allows the same program to execute on a CELL processor with 8 SPEs, 16 SPEs, 7 SPEs, or even just 1 SPE. The difference is that a 1-SPE CELL processor would likely drop many SPE modules because it cannot meet their real-time scheduling requirements, while the difference between the 7-SPE and 8-SPE versions would be negligible.