anon on February 4, 2015 11:27 pm wrote:
> Silvermont does not do memory disambiguation, despite Intel having perhaps more experience than
> anyone with it. Although it is their first gen core, so it's possible they decided to limit
> some complexities. It does suggest that it's not the lowest hanging of fruit, though.

Like I said, it makes more sense when you go wider. Silvermont is pretty narrow.

>                    L1I         L1D                    L2
> A15 (Exynos 5250)  32K,2-way   32K,2-way,4-cycles     1M,16-way,21-cycles
> A57                48K,3-way   32K,2-way,5-cycles(?)  512K-2M,16-way,??-cycles
> Denver             128K,4-way  64K,4-way,?-cycles     2M,16-way,18-cycles
> A8                 64K,?-way   64K,?-way,4-cycles     1M,??-way,??-cycles + 4M L3
> Silvermont         32K,8-way   24K,6-way,3-cycles     1M,16-way,14-cycles
> I include the A15 only to see the L2 latency at 1MB. A57 may be
> a little better, but I'm not sure. Does anybody have the numbers?
> So of the modern 64-bit mobile CPUs, I would say A57 has the worst cache hierarchy.
> Unless the L2 is faster than Silvermont's at 1MB (because its L1 is significantly
> worse). But I doubt it is, I would say the L2 is closer to 20 cycles than to 10.

Where did you get 5 cycles for the A57 L1 dcache, or 18 cycles for the Denver L2? I can't guess why A57 would increase latency for a dcache that's otherwise the same. You have to be careful with supplied L2 numbers; they may or may not include the L1 miss cost.

Anandtech did some latency tests on A8 here:

If these numbers are comparable and trustworthy, it looks like about 17-18 cycles, which isn't that great for a 1.5GHz peak clock. I think only Intel really looks like they're far ahead.

If you want to see something worse, there's Krait 300, with a 3-cycle L0 (4KB direct mapped), a 6-cycle L1 (16KB 4-way) and an asynchronous L2 at 12 cycles + 14ns, for a whopping 36 cycles at only 1.5GHz. Krait 400 is supposed to have reduced the L2 latency, at least (according to AT, that was the only change they made).
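
For anyone who wants to redo the cycles/ns arithmetic, here is a rough sketch in Python. The function names are made up for illustration, and it assumes the "12 cycle + 14ns" figure means a part that scales with the core clock plus a fixed-time part that doesn't. That reading is an assumption, not something the figures state.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert core cycles to nanoseconds (1 GHz = 1 cycle per ns)."""
    return cycles / clock_ghz


def async_latency_cycles(sync_cycles, async_ns, clock_ghz):
    """Total latency in core cycles for a cache on its own clock.

    Assumption: sync_cycles is counted in core clocks, while async_ns is a
    fixed time that costs more core cycles the faster the core runs.
    """
    return sync_cycles + async_ns * clock_ghz


if __name__ == "__main__":
    # 17-18 cycles at a 1.5GHz clock, expressed in ns
    print(cycles_to_ns(17, 1.5), cycles_to_ns(18, 1.5))
    # 12 cycles + 14ns at 1.5GHz, expressed in core cycles
    print(async_latency_cycles(12, 14, 1.5))  # 33.0
```

Under that assumption the L2 part alone comes to 33 cycles at 1.5GHz. The 36 quoted above would match if the 3-cycle L0 lookup is added on top, though that is only a guess at how the figure was put together.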

