Modern cores

By: UnmaskedUnderflow, July 23, 2020 7:50 am
Room: Moderated Discussions
anon (anon.delete@this.anontech.anon) on July 22, 2020 4:18 pm wrote:
> Maynard Handley on July 22, 2020 10:03 am wrote:
> > I cannot recommend the paper enough,
> >
> > (Evolution of the Samsung Exynos CPU Microarchitecture)
> >
> > It's like those classic papers written at the end of the RISC era, detailing the state of OoO
> > just before the relevant companies, like MIPS/SGI and Alpha/DEC, became defunct. The value is in
> > clarifying how current (acceptable, but not best-in-class) CPUs handle things
> > like indirect branch prediction, cache management, or prefetching; in particular, it shows the degree
> > of complexity hiding behind simple statements like "performs strided prefetch".
> >
> Considering that their core was significantly worse in each of power, performance, and area
> compared to Arm's own designs, I do not think you can just assume that anybody else necessarily
> does anything at all in the same fashion, including/especially indirect and conditional branch
> prediction, cache management, or prefetching. Those are all areas with a lot of design space
> to play around in. And judging from the overall area and die plots of A76 and A77, as well as the
> relative area of the fetch structures and probable branch predictor memories vs. caches, I think
> one might suppose that Samsung's and Arm's structures look very different.

Yes, this. Also, thanks Maynard: your analogy to the Alpha burn-down papers is spot on. That's exactly the point of this paper, and most of the industry folks we talked to suggested we write it. Most surprising was that Samsung let us do it, which was classy toward the community.

The paper covers things we cherry-picked as good and didn't want lost to the void; e.g., nobody knew Dr. Jimenez was iterating on perceptron predictors privately. It was not an analysis of what was wrong with the design or what led to its demise.

Most importantly, the outside world just sees block diagrams of pipes and stages and tables of OoO structure sizes. It's the details below that level that make or break the CPU: not the ISA, the uarch.

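W.r.t. 4 KB pages vs. 16 KB pages: the former needs 4x as many page translations (TLB entries) to cover the same memory, so random accesses over a large buffer take more TLB misses and page walks. You should see it in DRAM latency charts.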
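To put rough numbers on the page-size point, here is a minimal sketch, not a model of any real core. It assumes a single fully associative LRU TLB with a hypothetical 64 entries and uniformly random accesses over a 4 MiB buffer; real cores use multi-level, set-associative TLBs plus page-walk caches, so only the trend carries over. The function name `tlb_misses` and all parameters are illustrative.

```python
import random
from collections import OrderedDict


def tlb_misses(buffer_bytes, page_bytes, tlb_entries, accesses, seed=0):
    """Count misses in a hypothetical fully associative LRU TLB
    for uniformly random byte accesses within a buffer."""
    rng = random.Random(seed)
    tlb = OrderedDict()  # page number -> None, least recently used first
    misses = 0
    for _ in range(accesses):
        page = rng.randrange(buffer_bytes) // page_bytes
        if page in tlb:
            tlb.move_to_end(page)  # hit: mark as most recently used
        else:
            misses += 1  # miss: would trigger a page walk
            tlb[page] = None
            if len(tlb) > tlb_entries:
                tlb.popitem(last=False)  # evict least recently used
    return misses


if __name__ == "__main__":
    entries, buf, n = 64, 4 * 1024 * 1024, 100_000  # assumed sizes
    for page in (4 * 1024, 16 * 1024):
        reach_kib = entries * page // 1024  # memory the TLB can map at once
        rate = tlb_misses(buf, page, entries, n) / n
        print(f"{page // 1024} KB pages: reach {reach_kib} KiB, miss rate {rate:.3f}")
```

With these assumptions, 64 entries map 256 KiB with 4 KB pages but 1 MiB with 16 KB pages, so in steady state the random-access miss rate is roughly 1 - 64/1024 vs. 1 - 64/256. Each extra miss adds a page walk on top of the DRAM access, which is what shows up as a latency step once the working set exceeds TLB reach.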