By: Jonathan Kang (johnbk.delete@this.gmail.com), October 1, 2007 8:07 am
Room: Moderated Discussions
rwessel (robertwessel@yahoo.com) on 9/28/07 wrote:
---------------------------
>Jonathan Kang (johnbk@gmail.com) on 9/28/07 wrote:
>---------------------------
>>The problem is knowing which links to shut down. The non-deterministic nature
>>of cache-coherency means that you will almost always get times when a link has to
>>go from idle (one link on) to active (all links on) in order to transmit data.
>>If it doesn't, a CPU stalls for hundreds or thousands of its own clock cycles waiting.
>
>
>There's no reason you can't transmit data over the single link, just because it
>transmitted more or less in parallel when all the links are up. Yes it will be
>a factor slower, but certainly less than the assumed hundreds of cycles. Obviously
>if you suddenly need to flush the whole cache, you'll be eating that startup time,
>but for a few lines here and there, you'll get a few extra cycles each, not hundreds.
Well, assuming a 128-byte cacheline (Netburst, and I think Core too), a single bit-lane will take roughly 1280 clock cycles to transfer it (at 2.5 GHz, I think, in the case of PCIe). A 16-lane link would take 80 clock cycles. Assuming a comparable processor clock speed, that's the difference between a stall of under a hundred cycles and one of over a thousand. The only way to alleviate this is either to keep multiple lanes (not just one) active even during idle, or to increase the single-lane frequency. Both would increase power consumption, linearly in the best case or exponentially in the worst.
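To make that arithmetic concrete, here's a minimal back-of-the-envelope sketch in C. It assumes a PCIe-style 2.5 GT/s lane with 8b/10b encoding (10 bit-times per byte on the wire), which is how 128 bytes works out to roughly 1280 bit-times; the encoding detail is my assumption for illustration, not something spelled out above.

#include <stdio.h>

/* Rough sketch of the transfer-time argument above.
 * Assumption (mine): 2.5 GT/s lanes with 8b/10b encoding,
 * i.e. 10 bit-times per byte on the wire. */
int main(void)
{
    const int cacheline_bytes   = 128;  /* Netburst-style 128-byte line  */
    const int bits_per_byte_otw = 10;   /* 8b/10b: 10 bit-times per byte */
    const int wire_bits = cacheline_bytes * bits_per_byte_otw;  /* 1280  */

    const int lanes[] = { 1, 16 };
    for (int i = 0; i < 2; i++) {
        int cycles = wire_bits / lanes[i];  /* link cycles at 2.5 GHz */
        printf("%2d lane(s): %4d link cycles per cacheline\n",
               lanes[i], cycles);
    }
    /* Prints 1280 cycles for x1 and 80 cycles for x16, i.e. the
     * "under a hundred vs. over a thousand" stall comparison. */
    return 0;
}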
The equivalent in a CSI cache update, with all 16 bit-lanes waking up within a clock cycle, would be some multiple of eighty 2.5 GHz cycle times (depending on the clock speed of CSI, which I'm guessing will be somewhat less than 2.5 GHz). It will never be as bad as the thousands of cycles in the single-lane case, though.
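And a similarly rough sketch of the CSI-side estimate. The 2.0 GHz link clock here is purely a placeholder, since the post only guesses CSI will run somewhat below 2.5 GHz; the point is just that wake-up plus transfer stays at a small multiple of eighty 2.5 GHz cycle times, nowhere near thousands.

#include <stdio.h>

/* Hypothetical numbers: 2.0 GHz is an assumed CSI link clock,
 * not a real spec; wake-up is taken as one cycle per the post. */
int main(void)
{
    const double pcie_ghz = 2.5;  /* reference clock from the PCIe case  */
    const double csi_ghz  = 2.0;  /* assumed slower CSI link clock       */
    const int    wake_cyc = 1;    /* all 16 lanes wake within one cycle  */
    const int    xfer_cyc = 80;   /* 128-byte line over 16 lanes         */

    /* Express (wake + transfer) CSI cycles in 2.5 GHz cycle times. */
    double equiv = (wake_cyc + xfer_cyc) * (pcie_ghz / csi_ghz);
    printf("~%.0f cycle times at 2.5 GHz (a small multiple of 80)\n", equiv);
    return 0;
}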