By: Peter Gerdes (truepath.delete@this.infiniteinjury.org), August 30, 2007 7:03 pm
Room: Moderated Discussions
David Kanter (dkanter@realworldtech.com) on 8/30/07 wrote:
---------------------------
>Yes. A write back to memory takes a long time relative to snooping the tags on
>your L1/L2 cache. You also waste memory bandwidth.
Apparently I'm either being unclear or really not understanding something, so please be patient and let me see if I can communicate what I was trying to say.
My understanding of where this cache coherency protocol would be used is as follows. We have a multiprocessor system that, like AMD's Opteron, uses a NUMA memory model plus a cache coherency protocol that lets it appear to be UMA. Thus each chip has an exclusive connection to its own memory pool, which no other chip can access directly. I was assuming that this protocol is intended for communication between distinct chips, and that cores on the same chip (which might share a memory controller) would do whatever they liked to stay coherent, maybe even using a shared cache.
Now when the cache coherency protocol says that a cache line must be written back to memory, it doesn't actually care whether the line is 'really' stored in the actual memory bank, only that it APPEARS to be so stored; i.e., the memory controller could implement its own cache.
Thus, presumably, a chip that needs to 'write' a cache line to memory it controls doesn't need to send any messages or do anything but remember that this cache line has been 'written' to memory. So long as every read request from another chip for that memory location reflects the modified value, everything is hunky dory. Since MOST of the logical writes to memory that the O state would eliminate don't require any PHYSICAL write to memory, the O state doesn't do much for efficiency.
Supposing the protocol doesn't require sending the same cache line twice to a processor that both controls that memory location and wants to read that cache line, it will be a very rare event that the lack of an O state causes an extra PHYSICAL write to memory. Sure, for systems that hang all the memory off one of the chips this would be a loss, but presumably high-performance systems would balance memory between the chips.
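To make the case analysis above concrete, here is a rough Python sketch that counts how often sharing a dirty line without an O state would need a separate transfer to the chip that controls the memory. It is only a toy model under stated assumptions: the home chip is equally likely to be any chip, a write-back by the home chip itself is absorbed locally, and one message can serve as both the read reply and the write-back when the reader is the home chip. The function names are hypothetical and do not come from any real protocol specification.

```python
from itertools import permutations


def needs_extra_transfer(owner: int, reader: int, home: int) -> bool:
    """Return True when a separate transfer to the home chip is needed.

    Assumption: if the home chip is the owner, the 'write' is purely local;
    if the home chip is the reader, the read reply doubles as the write-back.
    Only a third-party home chip needs its own message.
    """
    return home != owner and home != reader


def extra_transfer_fraction(num_chips: int) -> float:
    """Fraction of (owner, reader, home) cases needing an extra transfer.

    Assumption: every combination of distinct owner and reader, with any
    home chip, is equally likely.
    """
    cases = 0
    extra = 0
    for owner, reader in permutations(range(num_chips), 2):
        for home in range(num_chips):
            cases += 1
            if needs_extra_transfer(owner, reader, home):
                extra += 1
    return extra / cases


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, extra_transfer_fraction(n))
```

Under these assumptions a two-chip system never needs the extra transfer, while with more chips it arises whenever a third chip is the home. Whether that transfer then becomes a physical DRAM write still depends on whether the home memory controller keeps the line in its own cache.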
Sorry to keep pushing this issue, but obviously I am missing something and I'd like to figure out what it is.