By: Jonathan Kang (johnbk.delete@this.gmail.com), September 28, 2007 4:38 am
Room: Moderated Discussions
rwessel (robertwessel@yahoo.com) on 9/27/07 wrote:
---------------------------
>Jonathan Kang (johnbk@gmail.com) on 9/27/07 wrote:
>---------------------------
>>That's a good point. Yes, in the situation of multiple, parallel links, all but
>>one can be turned off. But for schemes such as cache coherency, bursts of large
>>(or small) chunks of data would need to be transmitted as fast as possible.
>>
>>For instance, if the link between 3 MPU's are in the idle state (only 1 link active)
>>and a snoop is made. If the response is that the cache data must be updated, then
>>that cache data must be transmitted over the link as quickly as possible to prevent
>>stalls. This isn't possible unless the link can go between idle (low bandwidth)
>>and active (full bandwidth) within a matter of a few cycle times.
>
>
>That's true, but that largely going to be a self limiting problem - if the CPU
>is idle, it may be a bit slow in responding to actual cache line updates/writebacks/whatever
>(just the coherency checks themselves should have plenty of bandwidth with a single
>link), and there will be a limited number of times a cache line will need to be transferred off the idle CPU.
I think AMD fixed the problem of the CPU having to wake up for a cache snoop in Barcelona, and I suspect Intel will do the same thing in Nehalem. So the cache can respond near-instantaneously to snoop and update requests.
The problem is that even though cache updates over the links happen relatively infrequently, they're like any other cache miss: when one does happen, it can stall the CPU requesting the data for a long time (relative to the CPU's clock).
>Obviously if the CPU has the memory controller, there's going to be a lot of real
>traffic, even if the CPU is idle, but even in that case some of the links can probably
>be shut down with minimal impact on the system, since the traffic from the CPU is simply not going to be there.
The problem is knowing which links to shut down. Because cache-coherency traffic is non-deterministic, there will almost always be times when a connection has to go from idle (one link on) to active (all links on) in order to transmit data. If it doesn't, a CPU stalls for hundreds or thousands of its own clock cycles waiting.
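To put rough numbers on that, here is a back-of-the-envelope sketch. Every figure in it is an assumption I picked for illustration (64-byte cache line, 4 parallel links, 1 byte per link per link clock, a CPU clock 4x the link clock, and the two wake-up latencies); none of them come from a real HyperTransport or CSI spec. The function name is made up as well.

```python
import math

def stall_cpu_cycles(line_bytes, bytes_per_link_cycle, active_links,
                     wake_link_cycles, cpu_cycles_per_link_cycle):
    """Rough requester stall, in CPU cycles, for one cache-line transfer.

    Hypothetical model: the stall is the link wake-up time plus the time
    to push the line over the active links, converted to CPU cycles.
    """
    transfer = math.ceil(line_bytes / (bytes_per_link_cycle * active_links))
    return (wake_link_cycles + transfer) * cpu_cycles_per_link_cycle

# Assumed parameters (illustrative only, not vendor figures).
LINE, BYTES_PER_CYCLE, LINKS, CLOCK_RATIO = 64, 1, 4, 4

# Stay in the idle state and send the line over the single active link.
idle_only = stall_cpu_cycles(LINE, BYTES_PER_CYCLE, 1, 0, CLOCK_RATIO)
# All links were already on: no wake-up cost.
already_active = stall_cpu_cycles(LINE, BYTES_PER_CYCLE, LINKS, 0, CLOCK_RATIO)
# Wake-up takes only a few link cycles (assumed 4).
fast_wake = stall_cpu_cycles(LINE, BYTES_PER_CYCLE, LINKS, 4, CLOCK_RATIO)
# Wake-up takes a long time (assumed 1000 link cycles).
slow_wake = stall_cpu_cycles(LINE, BYTES_PER_CYCLE, LINKS, 1000, CLOCK_RATIO)

print(idle_only, already_active, fast_wake, slow_wake)
```

With these assumed numbers, the single-link transfer costs 256 CPU cycles, the already-active case 64, the fast wake-up 80, and the slow wake-up 4064. So under this toy model, turning the extra links back on only pays off if the transition takes a few link cycles. A slow transition is worse than just staying on one link.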