By: David Kanter (dkanter.delete@this.realworldtech.com), September 14, 2007 9:44 pm
Room: Moderated Discussions
Peter Gerdes (truepath@infiniteinjury.org) on 8/30/07 wrote:
---------------------------
>David Kanter (dkanter@realworldtech.com) on 8/30/07 wrote:
>---------------------------
>
>>Yes. A write back to memory takes a long time relative to snooping
>>the tags on your L1/L2 cache. You also waste memory bandwidth.
>
>Apparently I'm either being unclear or really not understanding something so please
>be patient and let me see if I can communicate what I was trying to say.
>
>Now my understanding of where this cache coherency protocol would be used is as
>follows: We have a multiprocessor system that, like AMD's opteron, uses a NUMA
>memory model plus a cache coherency protocol that lets it appear to be a UMA.
NUMA isn't a memory model - it's an implementation of the memory hierarchy. Something cannot be NUMA and appear to be UMA.
It's very straightforward: if different regions of memory have different latencies, then the system is NUMA. If all memory has the same latency, then the system is UMA. It cannot be both.
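To make the distinction concrete, here is a minimal sketch of that classification rule. The node names and latency numbers are purely illustrative, not measurements of any real system:

```python
# Illustrative sketch: a system is UMA only if every memory region has
# the same access latency; any latency difference makes it NUMA.
# Node names and nanosecond values below are made up for illustration.

def classify(latencies_ns):
    """Return "UMA" if all regions share one latency, else "NUMA"."""
    return "UMA" if len(set(latencies_ns.values())) == 1 else "NUMA"

# Opteron-style 2-socket box: local DRAM is faster than DRAM reached
# over a coherent HyperTransport hop, so the system is NUMA.
opteron_like = {"node0_local": 60, "node1_remote": 110}

# Classic shared-bus SMP: every bank costs the same, so it is UMA.
smp_like = {"bank0": 90, "bank1": 90}
```

The point is that the classification is a property of the physical latencies, independent of whatever coherency protocol runs on top.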
>Thus
>each chip has an exclusive connection to its own memory pool unseen by any other
>chip.
This makes no sense. In this situation, each memory controller would have to connect to multiple CPUs.
>Now I was assuming that this cache coherency protocol was intended for communication
>between distinct chips and that cores (which might share a memory controller) would
>do whatever they liked to stay coherent maybe even using a shared cache.
So I don't understand what you are imagining any more than I did before.
>Now when the cache coherency protocol says that a cache line must be written back
>to memory it doesn't actually care if the line is 'really' stored in the actual
>memory bank, only that it APPEARS to be so stored, i.e., the memory controller could implement its own cache.
That's not a cache, it's a buffer. But sure, you could buffer the writes - you just need to make sure that if you lose power you don't have any problems.
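A rough sketch of that buffering idea, just to pin down the correctness requirement: every read has to snoop the buffer, or another chip could see stale data. This is an illustrative toy model, not how any real memory controller is organized:

```python
# Toy model of a memory controller with a write buffer (illustrative
# only). A "written back" line sits in the buffer before draining to
# DRAM; reads must check the buffer first so the line always APPEARS
# to be in memory, as discussed above.

class BufferedMemoryController:
    def __init__(self):
        self.dram = {}           # addr -> data actually in DRAM
        self.write_buffer = {}   # addr -> data accepted but not drained

    def write_back(self, addr, data):
        # The physical DRAM write is deferred, not eliminated.
        self.write_buffer[addr] = data

    def read(self, addr):
        # Snoop the buffer first, otherwise a reader gets stale data.
        if addr in self.write_buffer:
            return self.write_buffer[addr]
        return self.dram.get(addr)

    def drain(self):
        # Eventually the buffered writes must reach DRAM (and on a
        # power loss, anything still buffered is simply gone).
        self.dram.update(self.write_buffer)
        self.write_buffer.clear()
```

Note that `drain()` is where the deferred physical write finally happens, which is exactly why buffering defers rather than eliminates the memory traffic.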
>Thus presumably a chip that needs to 'write' a cache line to memory it controls
>doesn't need to send any messages or do anything but remember that this cache line
>has been 'written' to memory.
Where do you want to store that information? In the memory controller, in the chip, etc.?
>So long as every read request by another chip on
>that memory location reflects the modified value everything is hunky dory.
Sure. The problem is not the common case though, it's probably in handling exceptional cases.
>Thus
>since MOST logical writes to memory that the O state would eliminate don't require
>any PHYSICAL writes to memory it doesn't do much for efficiency.
Um, so write-back buffers have to write to memory eventually. You don't eliminate the write, you just defer it in your system.
>Supposing the protocol doesn't require sending the same cache line twice to a processor
>that both controls that memory location and wants to read that cache line it will
>be a very rare event that the lack of an O state will cause an extra PHYSICAL write
>to memory. Sure for systems that hang all the memory off of one of the chips this
>would be a loss but presumably the high performance systems would balance memory between the chips.
I don't understand what you are saying here.
>Sorry to keep pushing this issue but obviously I am missing something and I'd like to figure out what it is.
I think for starters you are confusing what NUMA and UMA mean, and how they are related to cache coherency.
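Since the whole thread is about what the O state buys you, here is a simplified sketch of the difference. The state names are the standard MESI/MOESI ones, but the event model is heavily abstracted: it only considers what happens to a dirty (M) line when another chip reads it.

```python
# Simplified sketch of the Owned (O) state's benefit (standard
# MESI/MOESI state names; the event model is abstracted). In MESI, a
# remote read of a Modified line forces a write-back so memory is up
# to date before the line goes Shared. In MOESI, the owner supplies
# the line cache-to-cache, transitions M -> O, and the physical
# memory write is deferred until the line is eventually evicted.

def remote_read(protocol, state):
    """Return (new state of the dirty line, memory write needed now)."""
    if state != "M":
        return state, False          # clean lines never need a write-back
    if protocol == "MESI":
        return "S", True             # write back, then share
    if protocol == "MOESI":
        return "O", False            # share dirty data; memory stays stale
    raise ValueError(protocol)
```

So the O state does not eliminate the eventual write-back either; its win is avoiding a memory write (and the bandwidth it costs) on every read-after-modify sharing event.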
DK