By: Exophase (exophase.delete@this.gmail.com), February 4, 2015 7:31 am
Room: Moderated Discussions
anon (anon.delete@this.anon.com) on February 4, 2015 6:19 am wrote:
> Memory disambiguation also does not seem like it would improve efficiency much. It increases the amount of speculation
> that can be done, which can increase performance of course, but improve perf/watt? I think IBM only implemented
> this with POWER8, and they haven't been ones to shy away from micro architectural complexity.
>
Memory disambiguation with a simple predictor rarely mis-speculates. The store buffer has to be scanned to see if loads hit stores in flight, but most cores have been doing this anyway to implement store-to-load forwarding for ops that were otherwise started in order (even the old Cortex-A8 does this, at least for the scalar part).
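To make the store buffer scan concrete, here is a rough Python sketch of the check a load performs against older in-flight stores. The names (`StoreEntry`, `scan_store_buffer`, `AliasPredictor`) are made up for illustration, addresses are compared as whole words, and the predictor is just one plausible "simple" scheme (speculate unless this load has been caught aliasing before), not a description of any particular core.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StoreEntry:
    addr: Optional[int]  # None while the store's address is still unresolved
    data: int


def scan_store_buffer(store_buffer: List[StoreEntry],
                      load_addr: int) -> Tuple[str, Optional[int]]:
    """Check a load against older in-flight stores, youngest first.

    store_buffer is ordered oldest to youngest and holds only stores
    that are older than the load in program order.
    """
    for entry in reversed(store_buffer):
        if entry.addr is None:
            # An older store might alias; without disambiguation the load waits.
            return ("unknown", None)
        if entry.addr == load_addr:
            # Store-to-load forwarding: the youngest matching store supplies the data.
            return ("forward", entry.data)
    return ("memory", None)  # no overlap, the load can read the cache


class AliasPredictor:
    """Hypothetical predictor: speculate past unknown stores unless this
    load (identified by its PC) has been caught aliasing before."""

    def __init__(self) -> None:
        self.blocked = set()

    def may_speculate(self, load_pc: int) -> bool:
        return load_pc not in self.blocked

    def report_violation(self, load_pc: int) -> None:
        self.blocked.add(load_pc)
```

The scan is needed for forwarding either way; disambiguation only changes what happens in the "unknown" case, where the load either stalls or goes ahead on the predictor's say-so and gets replayed if it turns out to be wrong.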
The more execution width you have, the more important it becomes. The simple example is a loop whose body loads things at the start and stores things at the end. Without memory disambiguation, the loads of one iteration can't issue until the addresses of the previous iteration's stores are known, so separate iterations of that loop can't run in parallel. So maybe for A72 such a feature would go hand in hand with increased decode width, L/S units, ALUs, etc.
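A toy timing model of that loop, as a sketch only. It assumes the iterations are truly independent, that a store's address is only resolved at the end of the body, and that speculation never fails; `body_latency` and `issue_interval` are arbitrary illustrative parameters, not figures for any real core.

```python
def loop_cycles(iterations: int, body_latency: int, issue_interval: int,
                disambiguation: bool) -> int:
    """Cycle at which the last iteration's store completes.

    body_latency:   cycles from an iteration's first load to its final store
    issue_interval: minimum spacing between iteration starts when nothing
                    else holds them back
    """
    start = 0
    finish = 0
    for i in range(iterations):
        if i > 0:
            earliest = start + issue_interval
            if not disambiguation:
                # Loads must wait until the previous iteration's store
                # address is known, modeled here as the end of its body.
                earliest = max(earliest, finish)
            start = earliest
        finish = start + body_latency
    return finish


print(loop_cycles(4, 10, 2, disambiguation=False))  # iterations run back to back
print(loop_cycles(4, 10, 2, disambiguation=True))   # iterations overlap
```

In this model the gap between the two cases widens as `issue_interval` shrinks relative to `body_latency`, which is the sense in which a wider core has more to gain.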
AMD only started doing it with Bulldozer, Apple only started doing it with Cyclone, and even Intel only started with Core 2. I don't think any of that is an indication of the feature not being an efficiency improvement.
> I would say perhaps improved branch prediction, reorganized cache design, and improved hardware prefetching.
>
I think they'll add a second load (and possibly store) unit, which Cyclone, Denver, and even Cortex-A17 have.
> I think the L2 cache might be brought in and be integrated with the core design as it is with other
> high performance CPUs.
> With a more modular and configurable L3 cache shared within the cluster.
By integrated, do you mean a separate, smallish local L2 cache for each core? Right now only Intel really does that with their non-Atom line, although other CPUs share larger L2 caches between two cores. That doesn't mean ARM won't do this, but it'll mean increasing the minimum size of their clusters a lot if some L3 is required. And being able to do it without L3 could have some bad design repercussions (which I think the Bulldozer line suffers from). Maybe with 128KB L2 caches it won't be too bad.
> The low associativity L1 and large shared modular L2 seems like a potential problem to me.
>
I agree, I always thought this could be a glass jaw for A15. A57 helps a little by increasing the associativity of the icache to 3-way. A 2-way associative L1 dcache in this day and age seems like a strange choice; even AMD moved away from that. It does give them cheap LRU replacement at least.
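On the cheap LRU point: with only two ways, true LRU needs a single bit per set, since touching one way makes the other the least recently used by definition. A minimal sketch, with a made-up `TwoWaySet` class standing in for one cache set:

```python
class TwoWaySet:
    """One set of a 2-way set-associative cache with exact LRU replacement."""

    def __init__(self) -> None:
        self.tags = [None, None]
        self.lru = 0  # the single LRU bit: index of the least recently used way

    def access(self, tag) -> bool:
        """Return True on a hit; on a miss, fill the LRU way and return False."""
        if tag in self.tags:
            way = self.tags.index(tag)
            hit = True
        else:
            way = self.lru
            self.tags[way] = tag
            hit = False
        self.lru = 1 - way  # the other way is now least recently used
        return hit


s = TwoWaySet()
pattern = ["A", "B", "A", "C", "A", "B"]
print([s.access(t) for t in pattern])
```

Higher associativity needs more state per set to track a full ordering, which is why wider caches tend to settle for pseudo-LRU approximations.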