By: Patrick Chase (patrickjchase.delete@this.gmail.com), August 21, 2013 12:47 pm
Room: Moderated Discussions
Correcting a mental error here:
Patrick Chase (patrickjchase.delete@this.gmail.com) on August 21, 2013 12:27 pm wrote:
> 3. Their L1 Dcaches are fast. Load->use latency is the same number of cycles as A15,
> despite clocking at up to twice the rate and having higher associativity. IMO
> Intel made a smart decision here by sticking with PIVT caches
Obviously this should be VIPT (virtually indexed, physically tagged). SB/IB/Haswell L1 D$ is 32KB and 8-way. The way size (capacity divided by associativity) is therefore 4KB, the same as the minimum page size, so the index bits lie entirely within the page offset and the virtual and physical indices are identical.
A15 L1 is also 32KB, but 4-way. The way size is therefore 8KB, which is larger than the minimum 4KB page size, so one index bit comes from above the page offset. The cache must therefore behave as though physically indexed to comply with ARMv7. As I noted in my previous post, this can be implemented either by doing TLB lookup before cache lookup, or by doing color prediction.
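To make the arithmetic concrete, here is a rough sketch that takes the cache figures quoted in this post at face value and computes the size of one way (capacity divided by associativity) and how many page colors result with 4KB pages. The function name `vipt_aliasing` and its parameters are made up for illustration; nothing here comes from Intel or ARM documentation.

```python
def vipt_aliasing(cache_bytes, ways, page_bytes=4096):
    """Return (way_bytes, colors) for a set-associative cache.

    way_bytes: capacity of one way, i.e. the address range spanned by
               the index + line-offset bits.
    colors:    number of distinct values the index bits above the page
               offset can take; 1 means the index lies entirely inside
               the page offset, so virtual and physical indices match.
    """
    way_bytes = cache_bytes // ways
    colors = max(1, way_bytes // page_bytes)
    return way_bytes, colors


# Figures as stated in the post (assumed, not independently verified).
configs = [
    ("SB/IB/Haswell L1D", 32 * 1024, 8),
    ("A15 L1D", 32 * 1024, 4),
]

for name, size, ways in configs:
    way_bytes, colors = vipt_aliasing(size, ways)
    if colors == 1:
        verdict = "index fits in page offset, plain VIPT works"
    else:
        verdict = "needs TLB-first lookup or color prediction"
    print(f"{name}: way = {way_bytes // 1024}KB, colors = {colors} -> {verdict}")
```

With more than one color, two virtual addresses that map to the same physical line could index different sets, which is why the cache then has to behave as though it were physically indexed.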
Does anybody happen to know what ARM did in A15? The load->use latency makes me think they're doing TLB and cache lookups sequentially. I seem to recall seeing a slide a while back that showed TLB lookup, then data and tag lookup for all ways in parallel (not good for power...), then way-select based on tag compare.