By: Travis Downs, February 28, 2019 12:27 pm
If you've ever tried to use some of the L2_RQSTS events to profile the L2 cache, you may have run into various weird results, such as "hits" and "misses" not adding up to all requests, or numbers that don't make sense given how you expect your process to behave. I've certainly seen weird things while looking at that other weird thing.

As it turns out, L2_RQSTS, at least on Skylake and derivatives (and probably on Haswell through Broadwell), seems to be implemented sanely in hardware: it's just the documentation that's bad. That's probably because the event was encoded differently before Haswell, and the documentation and event names never caught up or stayed anchored to the old concept.

The current Haswell+ incarnation distinguishes between the origin/type of the L2 request and its result (i.e., hit or miss). For hits, it also distinguishes between hits on lines in E/S state and hits on lines in M state. This is new and didn't make it into the documentation, which makes some events practically useless: many of the "hit" events in the SDM omit M-state hits, which doesn't make any sense, so if you use any of them you will be arbitrarily missing a huge chunk of hits. For sources, you can even distinguish between requests originating from the L1 prefetchers and from the L2 prefetchers, which makes this one of the few counters that lets you monitor the L1 prefetchers at all.
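To make the origin/result split concrete, here is a small decoder for the L2_RQSTS umask. It's a sketch under an assumed bit layout, inferred from the umask values the SDM lists for event 0x24 on Skylake (e.g. 0x41 for DEMAND_DATA_RD_HIT, 0xE1 for ALL_DEMAND_DATA_RD, 0x38/0xF8 for the prefetch events): the low five bits select the request origin/type and the top three select the result (miss, E/S hit, M hit). Which of the two prefetch bits is the L1 prefetcher and which is the L2 prefetcher is deliberately not asserted. The function names are made up for illustration.

```python
# ASSUMED bit layout of the L2_RQSTS (event 0x24) umask, inferred from the
# SDM's listed umask values; this is not an official Intel definition.
ORIGIN_BITS = {
    0x01: "demand data read",
    0x02: "RFO",
    0x04: "code read",
    0x08: "prefetch (bit 3)",  # one of L1/L2 prefetcher; mapping not asserted
    0x10: "prefetch (bit 4)",  # the other prefetcher origin
}
RESULT_BITS = {
    0x20: "miss",
    0x40: "hit (E/S state)",
    0x80: "hit (M state)",
}


def decode_umask(umask: int) -> tuple[list[str], list[str]]:
    """Split a umask into (request origins, request results)."""
    origins = [name for bit, name in ORIGIN_BITS.items() if umask & bit]
    results = [name for bit, name in RESULT_BITS.items() if umask & bit]
    return origins, results


def missing_m_hits(umask: int) -> bool:
    """True if the mask counts E/S-state hits but not M-state hits."""
    return bool(umask & 0x40) and not umask & 0x80


if __name__ == "__main__":
    # 0x41 is the SDM's DEMAND_DATA_RD_HIT; 0xC1 adds the (assumed) M-hit bit.
    for umask in (0x41, 0xC1, 0xE1):
        origins, results = decode_umask(umask)
        print(f"umask=0x{umask:02x} origins={origins} results={results} "
              f"missing_M_hits={missing_m_hits(umask)}")
```

Under this layout, the documented 0x41 mask only covers E/S-state hits, so adding it to the miss count can fall short of the total demand data reads.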

Unfortunately, the bad docs also infected the listed event definitions, so you suffer from them if you use the named events through perf, PAPI or whatever. Still, you can just use the raw event syntax to get at the full events.
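As a sketch of the raw-event route, the helpers below build perf event specifiers for event 0x24, both in perf's short raw form (`r` followed by the hex config, which on Intel is `umask << 8 | event`) and in the `cpu/event=...,umask=...,name=.../` PMU form. The 0xC1 mask ("demand data reads that hit in any state") assumes 0x40 means an E/S hit and 0x80 an M hit, which is inferred from the SDM's listed masks rather than documented. The event labels and `./your_program` are placeholders; check the field layout in `/sys/bus/event_source/devices/cpu/format` on your own machine.

```python
import shlex

L2_RQSTS = 0x24  # event select for L2_RQSTS


def perf_raw(umask: int, event: int = L2_RQSTS) -> str:
    """perf's short raw form: 'r' + hex(config), config = umask << 8 | event."""
    return f"r{(umask << 8) | event:x}"


def perf_pmu(umask: int, name: str, event: int = L2_RQSTS) -> str:
    """perf's PMU form, with a user-chosen display name for the event."""
    return f"cpu/event=0x{event:02x},umask=0x{umask:02x},name={name}/"


def perf_stat_cmd(events: dict[str, int], program: list[str]) -> list[str]:
    """Build a 'perf stat' argv counting each named L2_RQSTS umask."""
    specs = ",".join(perf_pmu(umask, name) for name, umask in events.items())
    return ["perf", "stat", "-e", specs, "--", *program]


if __name__ == "__main__":
    # Hypothetical labels; 0xC1 assumes 0x40 = E/S hit and 0x80 = M hit.
    events = {
        "dd_rd_miss": 0x21,     # SDM: L2_RQSTS.DEMAND_DATA_RD_MISS
        "dd_rd_hit_es": 0x41,   # SDM: L2_RQSTS.DEMAND_DATA_RD_HIT
        "dd_rd_hit_any": 0xC1,  # assumed: E/S hits plus M-state hits
        "dd_rd_all": 0xE1,      # SDM: L2_RQSTS.ALL_DEMAND_DATA_RD
    }
    print(perf_raw(0xC1))
    print(shlex.join(perf_stat_cmd(events, ["./your_program"])))
```

If the assumed layout holds, `dd_rd_miss + dd_rd_hit_any` should track `dd_rd_all`, while `dd_rd_miss + dd_rd_hit_es` can come up short.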

More detailed writeup here.
