By: Peter Gerdes (truepath.delete@this.infiniteinjury.org), August 29, 2007 2:11 pm
Room: Moderated Discussions
Thanks for the great article. I've been looking for this sort of info about CSI for a while.
So I'm a bit unclear about how Intel's M state compares with the O state from AMD. The difference is supposedly that when a copy of a cache line in state M is requested, the cache line has to be written back to memory, but what does that really mean when you have integrated memory controllers?
If a processor has a modified cache line for memory it controls, isn't that effectively equivalent to being in the O state? Presumably that processor answers all requests for the memory it controls, and it would be silly for it to physically interrogate memory when it already has the current data on hand. Maybe I'm missing something, but it seems the primary difference between having an O state and not having one is whether a modified cache line can be held by any processor or only by the processor that owns that address.
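To make sure I have the distinction straight, here's a toy model of the one case I'm asking about, a remote read hitting a line that another cache holds in Modified, with and without an Owned state. This is my own simplification, not a description of how either vendor actually implements it.

# Toy model (my simplification): a remote processor reads a line that
# another cache currently holds in Modified state.

def remote_read_hits_modified(has_owned_state):
    """Return (owner's new state, requester's state, writeback to memory?)."""
    if has_owned_state:
        # MOESI: the owner keeps the dirty data, downgrades M -> O, and
        # forwards the line cache-to-cache; memory stays stale.
        return ("O", "S", False)
    # MESI: the dirty line must be made consistent with memory, so the
    # owner writes it back (M -> S) and the requester loads it as Shared.
    return ("S", "S", True)

for protocol, owned in (("MESI", False), ("MOESI", True)):
    owner, requester, writeback = remote_read_hits_modified(owned)
    print(protocol, "owner ->", owner, "| requester ->", requester,
          "| memory write needed:", writeback)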
Anyway, my speculation as to why Intel didn't implement the O state is that they expect the cost of invalidating a read to be higher than the cost of doing an extra write. That certainly seems plausible if processors optimistically start using the first response they receive and must throw that work out if another processor invalidates the read. In fact, with a NUMA-aware OS it seems extremely likely that either the processor holding the modified cache line or the one trying to read it controls the corresponding memory, in which case no extra remote write needs to be done.
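Here's the back-of-envelope version of that argument: without an O state, a read that hits a Modified line forces a writeback to the line's home memory, and that writeback only generates extra interconnect traffic when the home node is neither the owner nor the requester. The node numbers below are made up for illustration.

# Back-of-envelope check (node numbers are made up): the forced writeback
# in MESI only crosses the interconnect when the line's home memory sits
# on a node that is neither the current owner nor the requester.

def writeback_is_remote(owner_node, requester_node, home_node):
    return home_node not in (owner_node, requester_node)

for label, owner, req, home in (
        ("home on owner's node",     0, 1, 0),
        ("home on requester's node", 0, 1, 1),
        ("home on a third node",     0, 1, 2)):
    print(label + ":",
          "extra remote write" if writeback_is_remote(owner, req, home)
          else "writeback stays local")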
I feel I must be missing something here, since it would be strange for AMD to implement the O optimization if it is primarily useful on systems with a clogged shared bus. Is it an issue related to cache size?