Local Store
The Local Store (LS) is a privately addressed, non-coherent address space for the SPE. The DMA engine moves data into and out of the LS in 128-byte lines. Because the LS must simultaneously support DMA transfers into the SPE, DMA transfers out of the SPE, and local accesses by the execution units, IBM expects LS utilization to reach 80~90% when the SPE is running optimally. As a result, the DMA engine must schedule data transfers to avoid contention on the system bus and the LS. Although software-controlled data movement and the lack of a cache make the SPE harder to program, explicit software management also makes the timing of data accesses predictable, which is why the SPE is well suited to real-time applications.
Figure 5 – Software scheduled threads overlapping computation and data streaming
In the CELL processor, software manages the DMA engine and reserves channels to move data to and from the LS. In response to each request, the DMA engine is programmed and resources are allocated for the data movement. The request queue in the SPE supports up to 16 outstanding requests, and each request can transfer up to 16 KB of data. Once the data is in the LS, the SPE performs its computation by accessing the private LS in isolation. Ideally, each SPE overlaps computation with data streaming, with two or more software-managed threads operating concurrently on an SPE at any given instant. In such a scenario, while one thread moves data in and out of the LS via the DMA engine, a second thread occupies the computing resources of the SPE. Figure 5 illustrates the basic idea of using software-managed threads to explicitly overlap computation and data movement.
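The overlap described above is commonly implemented as double buffering: while the SPE computes on one buffer, the next block of data is streamed into a second buffer. The following is a minimal Python sketch of that idea, not SPE code and not IBM's API. The names `split_transfer` and `double_buffered_sum` are hypothetical, the "DMA" is simulated by a plain assignment, and the steps run one after another here, whereas on real hardware the transfer and the computation would proceed at the same time. Only the 16 KB request limit and the 16-entry queue depth come from the text.

```python
MAX_REQUEST_BYTES = 16 * 1024   # per-request transfer limit, from the text
MAX_OUTSTANDING = 16            # request queue depth, from the text


def split_transfer(total_bytes):
    """Split one logical transfer into request sizes of at most 16 KB each."""
    sizes = []
    remaining = total_bytes
    while remaining > 0:
        size = min(remaining, MAX_REQUEST_BYTES)
        sizes.append(size)
        remaining -= size
    return sizes


def double_buffered_sum(data, chunk_bytes=MAX_REQUEST_BYTES):
    """Sum the bytes of `data` using two alternating buffers.

    Returns (total, schedule). Each schedule entry is a pair
    (chunk being transferred, chunk being computed on) for one time step;
    None means that side is idle during the step.
    """
    chunks = [data[i:i + chunk_bytes] for i in range(0, len(data), chunk_bytes)]
    if not chunks:
        return 0, []

    buffers = [chunks[0], None]   # prologue: fetch chunk 0 before any compute
    schedule = [(0, None)]
    total = 0

    for i in range(len(chunks)):
        nxt = i + 1
        if nxt < len(chunks):
            # Simulated DMA: stream the next chunk into the idle buffer.
            buffers[nxt % 2] = chunks[nxt]
            schedule.append((nxt, i))
        else:
            # Last chunk: nothing left to fetch, compute only.
            schedule.append((None, i))
        # "Computation" on the buffer that was filled in the previous step.
        total += sum(buffers[i % 2])

    return total, schedule


if __name__ == "__main__":
    payload = bytes(range(256)) * 160          # 40 KB of sample data
    requests = split_transfer(len(payload))
    assert len(requests) <= MAX_OUTSTANDING    # fits in the request queue
    total, schedule = double_buffered_sum(payload)
    print(requests)
    for dma_chunk, compute_chunk in schedule:
        print("transfer:", dma_chunk, "compute:", compute_chunk)
```

With a 40 KB payload, the transfer splits into three requests (two of 16 KB and one of 8 KB), and every step except the first and last has both a transfer and a computation in flight, which is the steady state Figure 5 depicts.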