By: Travis Downs (travis.downs.delete@this.gmail.com), February 22, 2019 3:47 pm
Consider a single-socket system with private, NINE (non-inclusive, non-exclusive) L1 and L2 caches, and a shared inclusive L3, where the L3 acts as the primary coherence arbiter for the system.

Now, a core makes a read access to a line not present in any cache. In textbook coherence protocols like MESI, MOESI, MESIF and variants, this line arrives in the requesting core's cache in E state, right?
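To pin down what I mean by the textbook behavior, here is a minimal Python sketch of the fill-state choice on a read miss. It covers plain MESI only, and the function name and the `other_holders` parameter are made up for illustration; real implementations may well differ.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


def fill_state_on_read_miss(other_holders: int) -> State:
    """Textbook MESI: state the requester installs the line in after a read miss.

    other_holders is a hypothetical count of other caches that currently
    hold the line, as seen by whatever arbitrates coherence.
    """
    # No other cache has the line: the requester may take it exclusively.
    # Otherwise it can only be installed as shared.
    return State.E if other_holders == 0 else State.S
```

The case in my question is `fill_state_on_read_miss(0)`, where the line is in no other cache.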

If some time later (with no intervening access to the line by another core) the core makes a write access to this same line, it needs to upgrade the line to M. Does it have to go all the way back to L3 to do this?

I think the textbook model says "yes", but it seems wasteful. That is, why should the L3 care about the exact moment of the transition from E to M? Could that not be a private concern within the private caches of each core? For example, regardless of whether the line is in E or M state, a request for the line from a different core will need to invalidate the line on the core that holds it (I think), and the L3 has to wait for that to happen, so couldn't it find out then whether the line has been modified or not?

Similarly, if you go through the other transitions, it seems like the L3 could adopt this "lazy M" behavior, where E in the L3 means "exclusively held by a core, possibly modified there, I'll find out when it matters".
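Here is a toy Python model of the "lazy M" idea, just to make the question concrete. It is a sketch under heavy assumptions, not a description of any real design: the `Core` and `L3` classes and all their methods are invented for illustration, there is no S state, and any request from another core simply invalidates the current owner.

```python
from enum import Enum


class State(Enum):
    I = "Invalid"
    E = "Exclusive"
    M = "Modified"


class L3:
    """Hypothetical shared L3 that tracks only which core owns each line."""

    def __init__(self):
        self.owner = {}       # addr -> Core holding the line (E or M, L3 cannot tell)
        self.requests = 0     # core-to-L3 requests observed
        self.dirty_found = 0  # times a snoop revealed a modified line

    def request(self, core, addr):
        self.requests += 1
        prev = self.owner.get(addr)
        if prev is not None and prev is not core:
            # Only at this point does the L3 learn whether the line was modified.
            if prev.snoop_invalidate(addr):
                self.dirty_found += 1
        self.owner[addr] = core


class Core:
    """Hypothetical core with one private cache level."""

    def __init__(self, l3):
        self.l3 = l3
        self.lines = {}  # addr -> State; a missing entry means I

    def read(self, addr):
        if self.lines.get(addr, State.I) is State.I:
            self.l3.request(self, addr)  # miss: must go to the L3
            self.lines[addr] = State.E

    def write(self, addr):
        if self.lines.get(addr, State.I) is State.I:
            self.l3.request(self, addr)  # miss: must go to the L3
        # E -> M (or M -> M) is silent: no message is sent to the L3.
        self.lines[addr] = State.M

    def snoop_invalidate(self, addr):
        was_modified = self.lines.get(addr, State.I) is State.M
        self.lines[addr] = State.I
        return was_modified


l3 = L3()
a, b = Core(l3), Core(l3)
a.read(0x40)   # miss: one L3 request, core a holds the line in E
a.write(0x40)  # silent E -> M, the L3 request count does not change
b.read(0x40)   # second L3 request; the snoop of core a reveals the dirty line
```

In this model the write by core a never reaches the L3; the L3 only discovers the modification when core b asks for the line. My question is whether real hardware behaves like this.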

I'm interested in what modern systems actually do here.

I'm also interested in any resources that actually cover some of the intricacies of MESI and related protocols in the presence of multi-level caches, some private and some shared. Textbooks seem to mostly use a DRAM + single shared cache model, which removes a lot of the complexity.
