By: Jonathan Kang (johnbk.delete@this.gmail.com), October 1, 2007 1:34 pm
Room: Moderated Discussions
David Kanter (dkanter@realworldtech.com) on 10/1/07 wrote:
>Do cache coherency protocols do critical word first? It seems like you would probably cut out a good bit of latency...
A good bit, but not all of it and not nearly enough, I'd say. Even if the first word (4 bytes) reaches the CPU that needs it in the minimal time, that is still 40+ cycles of idle waiting for one operation (assuming 1 link active and only one memory operand needed). Then, unless the CPU can be kept busy for another 32 cycles until the next word arrives (assuming the next word is also the one it needs next), it will have to stall again, and then again, and so on. The idle cycles add up with each word fetch.
The peak bandwidth of the link in the low-latency, low-power mode (1 link active) is simply too low to transfer cache data as fast as the CPU can process it.
Ultimately, it's a matter of being able to go from low (or no) bandwidth to full bandwidth in the shortest amount of time (in CSI's case, one clock cycle).
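To put rough numbers on the stall argument above, here is a back-of-the-envelope sketch in Python. The 40-cycle first-word latency and 32-cycle gap between words are the figures from this post ("40+" is taken as exactly 40); the function name, the `busy_cycles_per_word` knob, and the 16-word (64-byte) line in the usage note are my own assumptions for illustration, not anything specified for CSI.

```python
def stall_cycles(words_needed, first_word_latency=40, word_interval=32,
                 busy_cycles_per_word=0):
    """Estimate idle CPU cycles while consuming consecutive words of a line.

    Assumed model (hypothetical, for illustration only):
      - the first word arrives after `first_word_latency` cycles, all idle;
      - each later word arrives `word_interval` cycles after the previous one;
      - the CPU has `busy_cycles_per_word` cycles of useful work per word
        that can overlap with the wait for the next word.
    """
    if words_needed <= 0:
        return 0
    # The CPU waits the full latency for the critical (first) word.
    stalls = first_word_latency
    # For every further word, only the part of the gap not covered by
    # useful work counts as a stall.
    per_word_stall = max(0, word_interval - busy_cycles_per_word)
    stalls += (words_needed - 1) * per_word_stall
    return stalls
```

Under these assumptions, `stall_cycles(1)` is 40 cycles, while walking all 16 words of a 64-byte line with no overlapping work, `stall_cycles(16)`, comes to 520 idle cycles. Only when the CPU has a full 32 cycles of other work per word does the total drop back to the initial 40.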