ARM announces A72

By: anon, February 4, 2015 5:57 pm
Room: Moderated Discussions
Exophase on February 4, 2015 8:31 am wrote:
> anon on February 4, 2015 6:19 am wrote:
> > Memory disambiguation also does not seem like it would improve
> > efficiency much. It increases the amount of speculation
> > that can be done, which can increase performance of course,
> > but improve perf/watt? I think IBM only implemented
> > this with POWER8, and they haven't been ones to shy away from microarchitectural complexity.
> >
> Memory disambiguation with a simple predictor rarely incorrectly speculates. The store
> buffer has to be scanned to see if loads hit stores in flight, but most cores have been
> doing this anyway to implement load to store forwarding for ops that were otherwise started
> in-order (even the old Cortex-A8 does this, at least for the scalar part)

And a branch predictor is correct 99% of the time too :)

Not saying disambiguation is the wrong thing to do, I just think it would be strange to look for memory performance while the cache hierarchy is so bad. Fix that first, and then see what complexities you need to add to the core.

> The more execution width you have, the more important it becomes. The simple example is a
> loop with a body that loads things at the start and stores things at the end. Without memory
> disambiguation, separate iterations of that loop can't run in parallel. So maybe for A72 such
> a feature would go hand in hand with increased decode width, L/S units, ALUs, etc.
> AMD only started doing it with Bulldozer, Apple only started doing it with
> Cyclone, and even Intel only started with Core 2. I don't think any of that
> is an indication of the feature not being an efficiency improvement.

No, but neither is it an indication of the feature being an efficiency improvement. Or even efficiency neutral.
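
To make the loop example above concrete, here is a toy timing model in Python. It is only a sketch under loud assumptions: `loop_cycles`, `body_latency`, `aliases` and `replay_penalty` are made-up names and numbers, the store address is assumed to resolve only at the end of the body, and a speculating core is assumed to start one new iteration per cycle. It models cycles only, not power.

```python
def loop_cycles(n_iters, body_latency, aliases, speculate, replay_penalty=10):
    """Cycle at which the last store of a load ... store loop completes (toy model)."""
    prev_store_done = 0  # completion time of the previous iteration's store
    finish = 0           # latest store completion seen so far
    for i in range(n_iters):
        if not speculate:
            # Conservative core: the load waits for the previous iteration's store.
            start = prev_store_done
        elif i > 0 and aliases(i):
            # The speculated load really did depend on the older store:
            # wait for it, then pay an (assumed) replay penalty.
            start = max(i, prev_store_done) + replay_penalty
        else:
            # The speculated load was independent: iterations overlap.
            start = i
        prev_store_done = start + body_latency
        finish = max(finish, prev_store_done)
    return finish

never = lambda i: False   # loads never hit an in-flight store
always = lambda i: True   # every load hits the previous store

print(loop_cycles(8, 4, never, speculate=False))  # iterations serialized
print(loop_cycles(8, 4, never, speculate=True))   # iterations overlapped
print(loop_cycles(8, 4, always, speculate=True))  # every speculation wrong
```

In this model speculation wins easily when loads and stores never alias and loses badly when they always do, so everything hinges on how often the predictor is right and what a replay costs.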

> > I would say perhaps improved branch prediction, reorganized cache design, and improved hardware prefetching.
> >
> I think they'll add a second load (and possibly store)
> unit, which Cyclone, Denver, and even Cortex-A17 have.
> > I think the L2 cache might be brought in and be integrated with the core design as it is with other
> > high performance CPUs.
> > With a more modular and configurable L3 cache shared within the cluster.
> By integrated you mean a separate local smallish L2 cache for each core?

Yes, that's what I mean. Probably around 128-256K of local L2, and a retuned L1 (possibly smaller, with higher associativity -- 16K,4way might be reasonable if the L2 is fast).
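
For what it's worth, the arithmetic behind that L1 guess is easy to check. A minimal sketch, assuming 64-byte lines (my assumption, not stated above) and using a hypothetical helper `cache_geometry`; the 32K 2-way entry is just an assumed point of comparison:

```python
def cache_geometry(size_bytes, ways, line_bytes=64):
    """Return (number of sets, bytes per way) for a set-associative cache."""
    sets = size_bytes // (ways * line_bytes)
    way_bytes = size_bytes // ways
    return sets, way_bytes

KIB = 1024
small = cache_geometry(16 * KIB, 4)  # the 16K, 4-way guess
large = cache_geometry(32 * KIB, 2)  # assumed 32K, 2-way comparison point
print(small, large)
```

Under these assumptions the 16K 4-way cache has 4K per way, so with 4K pages the index bits fit entirely inside the page offset, while the 32K 2-way cache has 16K per way.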

> Right now only Intel really
> does that with their non-Atom line, although other CPUs share larger L2 caches between two cores.

You mean of smartphone class CPUs? In different spaces it is quite common -- POWER, Oracle SPARC, Itanium, Core/Xeon...

But it's not that one way is necessarily technically better than the other. I'm no microarchitect, and as you say there are certainly designs that go the other way. The real issue I see is that it has a very weak L1 cache subsystem, *and* the L2 is shared (making it slower), but also modular and configurable (making it slower and variable depending on configuration).

I guess it might also go the other way: improve their L1 significantly, and keep the shared L2. Either way would be nice. I only guess at a private L2 because they seem to like reducing the size and ways of the L1 to save power.

> Doesn't
> mean that ARM won't do this, but it'll mean increasing the minimum size of their clusters a lot if
> some L3 is required.

I guess it might be nice for massively parallel designs if no L3 was required :)

But in the mobile space, everyone is going with pretty large caches now. If L2 were 128K, then about 512K of shared L3 per core would be a reasonable number.

> And being able to do it without L3 could have some bad design repercussions (that
> I think the Bulldozer line suffers from). Maybe with 128KB L2 caches it won't be too bad.
> > The low associativity L1 and large shared modular L2 seems like a potential problem to me.
> >
> I agree, I always thought this could be a glass jaw for A15. A57 helps a little by increasing
> associativity of icache to 3-way. 2-way associative L1 dcache in this day seems like a strange
> choice, even AMD moved away from that. It does give them cheap LRU replacement at least.
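
The "cheap LRU" point is that a 2-way set needs only one bit of replacement state. A minimal sketch of one such set, with `TwoWaySet` as a hypothetical name; it is an illustration of the idea, not a description of any ARM or AMD implementation:

```python
class TwoWaySet:
    """One set of a 2-way cache with true LRU kept in a single bit."""

    def __init__(self):
        self.tags = [None, None]  # tag stored in way 0 and way 1
        self.lru = 0              # the one bit: which way is least recently used

    def access(self, tag):
        """Look up a tag; on a miss, replace the LRU way. Returns True on a hit."""
        if tag in self.tags:
            way = self.tags.index(tag)
            hit = True
        else:
            way = self.lru        # victim is whichever way the bit points at
            self.tags[way] = tag
            hit = False
        self.lru = 1 - way        # the other way is now least recently used
        return hit

s = TwoWaySet()
hits = [s.access(t) for t in [1, 2, 1, 3, 1, 2]]
print(hits, s.tags)
```

With more than two ways, exact LRU needs to track an ordering of the ways rather than a single bit, which is why higher associativity usually comes with approximations.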
