By: Michael S (already5chosen.delete@this.yahoo.com), August 30, 2007 2:17 am
Room: Moderated Discussions
David Kanter (dkanter@realworldtech.com) on 8/29/07 wrote:
---------------------------
>Peter Gerdes (truepath@infiniteinjury.org) on 8/29/07 wrote:
>---------------------------
>>Thanks for the great article. I've been looking for this sort of info about CSI for a while.
>>
>>So I'm a bit unclear about how Intel's M state compares with the O state from AMD.
>
>>The difference is supposedly that when a copy of a cache line in state M is requested
>>the cache line has to be written to memory, but what does this really mean when
>>you have integrated memory controllers?
>
>So that's not entirely true. The M state cache line can be sent from processor
>to processor without any memory write back. For example, say we have MPU0 and MPU1,
>with a cache line in the M state in MPU0. If MPU1 requests the cache line with
>a write hint, then MPU0 just sends the cache line without a write back, and it ends up in MPU1's cache in the M state.
>
>The only issue is when you want two or more processors to be able to read the modified
>line: then Intel's protocol requires a write back.
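To make the two cases above concrete, here is a minimal Python sketch on a simplified single-line model: a dict maps each MPU to one of M, O, E, S, I. The function names are hypothetical, and the no-O protocol is modeled as plain MESI, leaving out any extra states CSI may add. A write-hint request moves the dirty line cache to cache with no memory write; a read of an M line costs one write back under MESI but none under MOESI, where the holder keeps the dirty data in O.

```python
# Simplified single-line coherence model (illustration only, not a spec).
# states: dict mapping CPU name -> one of "M", "O", "E", "S", "I".

def forward_for_write(states, holder, requester):
    """Requester asks for the line with a write hint (read-for-ownership).

    The dirty line moves cache to cache: the requester ends up in M,
    every other copy is invalidated, and memory is not written.
    """
    assert states[holder] == "M"
    new = {cpu: "I" for cpu in states}
    new[requester] = "M"
    return new, 0  # (new states, number of memory write backs)


def read_modified_line(states, holder, requester, protocol):
    """Requester reads a line that `holder` has in M."""
    assert states[holder] == "M"
    new = dict(states)
    if protocol == "MOESI":
        # Holder keeps the dirty data as owner; memory stays stale.
        new[holder] = "O"
        new[requester] = "S"
        return new, 0
    if protocol == "MESI":
        # Without O, the dirty data is written back before it is shared.
        new[holder] = "S"
        new[requester] = "S"
        return new, 1
    raise ValueError(f"unknown protocol: {protocol}")


start = {"MPU0": "M", "MPU1": "I"}
print(forward_for_write(start, "MPU0", "MPU1"))
print(read_modified_line(start, "MPU0", "MPU1", "MESI"))
print(read_modified_line(start, "MPU0", "MPU1", "MOESI"))
```

The only difference between the two protocols in this model is the write-back count on the shared-read path; the write-hint path is identical.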
>
>>If a processor has a modified cache line for memory it is controlling, isn't this
>>effectively equivalent to being in the O state?
>
>No - the O state implies that someone else in the system is sharing the cache line and has it in the S state.
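The sharing rules behind O can be written as a legality check over one line's per-CPU states. This is a simplified sketch, not taken from AMD documentation: `is_legal` is a hypothetical name, and the check assumes S copies may be dropped silently, so it also accepts an O line with no remaining sharers. It rejects an M or E copy coexisting with any other valid copy.

```python
from collections import Counter


def is_legal(states):
    """Check one cache line's per-CPU states against simplified MOESI rules.

    - M or E: the only valid copy in the system; all others must be I.
    - O: at most one owner; any other valid copies must be S.
    - S alone: clean sharing, memory is up to date.
    """
    counts = Counter(states.values())  # missing states count as 0
    exclusive = counts["M"] + counts["E"]
    if exclusive > 1:
        return False
    if exclusive == 1:
        # An M or E copy may not coexist with any other valid copy.
        return counts["O"] == 0 and counts["S"] == 0
    return counts["O"] <= 1


print(is_legal({"CPU0": "O", "CPU1": "S", "CPU2": "I", "CPU3": "S"}))  # True
print(is_legal({"CPU0": "E", "CPU1": "M"}))  # False
```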
>
>>Presumably that processor is answering
>>all requests about the memory it controls, and it would be silly for it to physically
>>interrogate the memory when it has the current memory state on hand.
>
>>Now maybe
>>there is something I'm missing but it seems the primary difference between having
>>an O state and not is whether a modified cache line can be stored on any processor
>>or is only held on the processor owning that address.
>
>I think you misunderstood how the O state works. The O state has little to do
>with the memory owning the cache line. Here's a good example:
>
>CPU0...3
>Cache line is owned by CPU0
>
>Tx1: CPU1 requests line with write hint
Are you sure that ccHT implements a write hint?
>Tx2: CPU0 sends line in E state to CPU1
>Tx3: CPU1 writes to line, switches to M state
Seems like at this point the system state is illegal: M at CPU1 is not compatible with E at CPU0.
IMO, it should be:
Tx2: CPU0 sends line to CPU1 and [upon ACK, if implemented] switches to I state
>Tx4: CPU3 requests line for reading
>Tx5: CPU0 switches to O state and sends line in S state to CPU3
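The Tx1..Tx5 sequence, with CPU0 invalidating its copy when it forwards the line, can be replayed on a simplified single-line MOESI model. This is a sketch under stated assumptions: CPU0 is taken to start with a clean copy in E (matching the quoted Tx2), a write-hint request is modeled as "forward the line, invalidate every other copy", and all function names are hypothetical. Under these assumptions, the CPU that moves to O when CPU3 reads is CPU1, the current holder of the dirty line.

```python
# Simplified single-line MOESI model (illustration only, not a spec).
# Assumes some cache holds the line in M, O or E when a request arrives.

def request_with_write_hint(states, requester):
    """Tx1/Tx2 as corrected: the holder forwards its copy and drops to I."""
    holder = next(cpu for cpu, s in states.items() if s in ("M", "O", "E"))
    new = {cpu: "I" for cpu in states}
    # Clean data arrives as E; dirty data (from M or O) arrives as M.
    new[requester] = "E" if states[holder] == "E" else "M"
    return new


def local_write(states, cpu):
    """Tx3: a write to an E or M line needs no traffic; the line becomes M."""
    assert states[cpu] in ("E", "M"), "a write to S/O/I needs a request first"
    new = dict(states)
    new[cpu] = "M"
    return new


def read_request(states, requester):
    """Tx4/Tx5: a dirty holder keeps the data as O and supplies an S copy."""
    new = dict(states)
    for cpu, s in states.items():
        if s in ("M", "O"):
            new[cpu] = "O"  # still responsible for the dirty data
        elif s == "E":
            new[cpu] = "S"  # clean exclusive copy becomes shared
    new[requester] = "S"
    return new


states = {"CPU0": "E", "CPU1": "I", "CPU2": "I", "CPU3": "I"}
states = request_with_write_hint(states, "CPU1")  # Tx1/Tx2
print(states)
states = local_write(states, "CPU1")              # Tx3
print(states)
states = read_request(states, "CPU3")             # Tx4/Tx5
print(states)
```

No memory write occurs anywhere in this trace; the dirty data lives in CPU1's cache as O until it is evicted or requested for writing.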
>
>>Anyway my speculation as to why Intel didn't implement the O state is that they
>>expect the cost of invalidating a read to be higher than the cost of doing an extra
>>write. Certainly this seems plausible if the processors optimistically start using
>>the first response they receive and must throw out that work if another proc invalidates
>>the read.
>
>I think the reason they did it is that relatively little data is ever written
>and then shared out to many processors...
>
>Most shared data is probably stuff in the instruction stream and hence never written.
>
>>In fact with a NUMA aware OS it seems extremely likely that either the
>>proc with the modified cache line or the one trying to read it controls the corresponding
>>memory, in which case no extra writes need to be done.
>>
>>I feel I must be missing something here since it would be strange for AMD to implement
>>the O optimization if that is primarily useful on systems with a clogged shared
>>bus. Is it an issue related to cache size?
>
>No, I think you just misunderstood how the O state works.
>
>DK