By: David Kanter (dkanter.delete@this.realworldtech.com), August 29, 2007 6:29 pm
Room: Moderated Discussions
Peter Gerdes (truepath@infiniteinjury.org) on 8/29/07 wrote:
---------------------------
>Thanks for the great article. I've been looking for this sort of info about CSI for a while.
>
>So I'm a bit unclear about how Intel's M state compares with the O state from AMD.
>The difference is supposedly that when a copy of a cache line in state M is requested
>the cache line has to be written to memory but what does this really mean when you have integrated memory controllers?
So that's not entirely true. The M state cache line can be sent from processor to processor without any memory write back. For example, say we have MPU0 and MPU1, with a cache line in the M state in MPU0. If MPU1 requests the cache line with a write hint, then MPU0 just sends the cache line without a write back, and it ends up in MPU1's cache in the M state.
The only issue arises when two or more processors need to read the modified line at the same time; in that case Intel's protocol requires a write back to memory.
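To make the two cases concrete, here is a minimal sketch in Python. It is a simplified MESI-style model of one cache line, not Intel's actual protocol: the function names, the dictionary of per-processor states, and the write back counter are all assumptions made for illustration, and any additional states the real protocol has are left out.

```python
# Simplified model of one cache line (assumption: plain M/E/S/I states only).
# states maps a processor name to that processor's state for the line.

def request_with_write_hint(states, requester):
    """Requester wants to write: a dirty line moves cache to cache."""
    dirty = "M" in states.values()
    new = {cpu: "I" for cpu in states}      # every other copy is invalidated
    new[requester] = "M" if dirty else "E"  # dirty data stays dirty, no write back
    return new, 0                           # 0 write backs to memory

def request_for_read(states, requester):
    """Requester only wants to read: without an O state, dirty data is written back."""
    new = dict(states)
    writebacks = 0
    for cpu, state in states.items():
        if cpu == requester:
            continue
        if state == "M":
            writebacks += 1                 # memory must be updated before sharing
            new[cpu] = "S"
        elif state == "E":
            new[cpu] = "S"
    new[requester] = "S"
    return new, writebacks

start = {"MPU0": "M", "MPU1": "I"}
print(request_with_write_hint(start, "MPU1"))  # ({'MPU0': 'I', 'MPU1': 'M'}, 0)
print(request_for_read(start, "MPU1"))         # ({'MPU0': 'S', 'MPU1': 'S'}, 1)
```

The first call is the MPU0/MPU1 case above: the line changes hands in the M state and memory is never touched. The second call is the read-sharing case, where the write back is counted.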
>If a processor has a modified cache line for memory it is controlling isn't this
>effectively equivalent to being in the O state?
No - the O state implies that someone else in the system is sharing the cache line and has it in the S state.
>Presumably that processor is answering
>all requests about the memory it controls and it would be silly for it to physically
>interrogate the memory when it has the current memory state on hand.
>Now maybe
>there is something I'm missing but it seems the primary difference between having
>an O state and not is whether a modified cache line can be stored on any processor
>or is only held on the processor owning that address.
I think you misunderstood how the O state works. The O state has little to do with the memory owning the cache line. Here's a good example:
CPU0...3
Cache line is owned by CPU0
Tx1: CPU1 requests line with write hint
Tx2: CPU0 sends line in E state to CPU1
Tx3: CPU1 writes to line, switches to M state
Tx4: CPU3 requests line for reading
Tx5: CPU1 (which holds the modified line) switches to O state and sends line in S state to CPU3
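The read step at the end of that sequence can be sketched in a few lines of Python. This assumes textbook MOESI behaviour rather than anything specific to AMD's implementation, and the function name and state dictionary are hypothetical. The point it illustrates is that whichever cache holds the dirty copy moves from M to O and supplies the data, so memory is not written.

```python
# Textbook MOESI read handling for a single cache line (illustrative assumption,
# not AMD's actual implementation). states maps a CPU name to M, O, E, S or I.

def moesi_read(states, requester):
    """Return (new_states, memory_writebacks) after requester reads the line."""
    new = dict(states)
    others_hold = False
    for cpu, state in states.items():
        if cpu == requester or state == "I":
            continue
        others_hold = True
        if state == "M":
            new[cpu] = "O"   # keeps the dirty data and the duty to write it back later
        elif state == "E":
            new[cpu] = "S"   # clean exclusive copy simply becomes shared
        # O and S holders keep their state
    new[requester] = "S" if others_hold else "E"
    return new, 0            # the read itself causes no write back

# State of the system after Tx3: CPU1 has written the line and holds it in M.
after_tx3 = {"CPU0": "I", "CPU1": "M", "CPU2": "I", "CPU3": "I"}
print(moesi_read(after_tx3, "CPU3"))
# ({'CPU0': 'I', 'CPU1': 'O', 'CPU2': 'I', 'CPU3': 'S'}, 0)
```

Note that the O state lands on the cache that had the line in M, regardless of which processor's memory controller is home to the address.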
>Anyway my speculation as to why Intel didn't implement the O state is that they
>expect the cost of invalidating a read to be higher than the cost of doing an extra
>write. Certainly this seems plausible if the processors optimistically start using
>the first response they receive and must throw out that work if another proc invalidates
>the read.
I think the reason they left out the O state is that relatively little data is ever written and then shared out to many processors...
Most shared data is probably stuff in the instruction stream and hence never written.
>In fact with a NUMA aware OS it seems extremely likely that either the
>proc with the modified cache line or the one trying to read it control the corresponding
>memory, in which case no extra writes need to be done.
>
>I feel I must be missing something here since it would be strange for AMD to implement
>the O optimization if that is primarily useful on systems with a clogged shared
>bus. Is it an issue related to cache size?
No, I think you just misunderstood how the O state works.
DK