By: anon (anon.delete@this.anon.com), August 22, 2013 12:45 am
Room: Moderated Discussions
Patrick Chase (patrickjchase.delete@this.gmail.com) on August 21, 2013 12:27 pm wrote:
> 3. Their L1 Dcaches are fast. Load->use latency is the same number of cycles as A15, despite clocking
> at up to twice the rate and having higher associativity.
I don't want to derail your thread, but on a different tangent: you might call the IVB L1 dcache fast, at 4 cycles load-to-use for simpler addressing modes and 5 for more complex ones. However, POWER7 has a reported 2-cycle load-to-use latency with a cache of very similar size, associativity, and throughput. POWER7 also clocks a lot higher, and is on an older process.
Before anyone objects that it's a nuclear power plant: it seems to have quite good energy per unit of work, particularly considering the I/O and multiprocessor fabric it carries and its process disadvantage.
I wonder if you have any insight into why IBM is going the opposite way from Intel on L1 latency? (POWER5 was 4 cycles with lower associativity, so IBM's caches are getting faster while Intel's are getting slower, even though IBM's are already much faster in an absolute sense.)
Presumably it is because they have found that the tradeoff improves power consumption per unit of work done. I just wonder what it is about the rest of their pipeline that shifts the L1 sweet spot so significantly?
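
To put rough numbers on the "faster in an absolute sense" point, here is a minimal sketch that converts load-to-use cycles into nanoseconds. The cycle counts are the ones quoted above; the clock rates are placeholder assumptions of mine (not measured or quoted figures), so substitute the frequencies of the parts you actually care about.

```python
def load_to_use_ns(cycles: int, clock_ghz: float) -> float:
    """Absolute load-to-use latency in ns: cycles divided by clock rate in GHz."""
    return cycles / clock_ghz

# Cycle counts as quoted in the post (simple addressing mode for IVB).
CYCLES = {"IVB": 4, "POWER7": 2}

# ASSUMPTION: illustrative clock rates only, chosen for the arithmetic.
ASSUMED_CLOCK_GHZ = {"IVB": 3.5, "POWER7": 4.0}

latency_ns = {
    name: load_to_use_ns(CYCLES[name], ASSUMED_CLOCK_GHZ[name])
    for name in CYCLES
}

for name, ns in latency_ns.items():
    print(f"{name}: {CYCLES[name]} cycles at {ASSUMED_CLOCK_GHZ[name]} GHz = {ns:.2f} ns")
```

With any pair of clocks where POWER7 runs at least as fast as IVB, halving the cycle count more than halves the absolute latency, which is the gap I'm asking about.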