Modern cores

By: Chester (lamchester.delete@this.gmail.com), July 23, 2020 10:33 am
Room: Moderated Discussions
Maynard Handley (name99.delete@this.name99.org) on July 23, 2020 10:06 am wrote:
> Chester (lamchester.delete@this.gmail.com) on July 22, 2020 11:59 pm wrote:
> > > > > Presumably pretty much everything Samsung describes, Apple does
> >
> > Maynard - what's the basis for this assumption?
>
> You're taking what I said way too literally.
> The point is that what Samsung describes must set essentially a floor to the quality of implementation
> of a modern CPU, given that Samsung's performance was essentially the lowest out there.

I disagree on both points.

A CPU with lower IPC can actually have more advanced architectural features, with the perf gains from those features overshadowed by design flaws elsewhere. Example - AMD's Bulldozer vs K10 Phenom. Bulldozer brought PRFs (ROB just holds ptrs to registers), wider decode, deeper queues everywhere, checkpointing for faster mispredict recovery, and much better branch prediction. Bulldozer could also speculatively move loads ahead of stores and do cmp/test+jcc fusion. K10 was primitive in comparison, but achieved better IPC.

From AT's review of the Exynos 990 (M5), it performs better than the Cortex-A76 at some points. And while it's behind the A77 on most tests, the margin isn't that huge.

> The reason I drew attention to this is that most discussion of CPUs is still stuck in the early 90s, talking
> about things like the size of the ROB, or the mere existence of prefetching, as though they're what determine
> performance. Look at what Samsung is discussing, sometimes as the main concern, sometimes as a throwaway.

ROB size matters. A lot of other early OOO concepts still matter (a lot) today. And I thought we were talking about prefetching implementation, not whether to prefetch. While prefetching has been done for many years, implementations have varied.

Also, it's possible for a CPU with no prefetching to outperform one that prefetches (by being ahead in some other area, or by not wasting memory bandwidth on ineffective prefetches).

> So they take the existence of high quality directional branch prediction as given, their concerns
> are with performant indirect branch prediction. They take multi-strided prefetching as a given and
> augment it with more sophisticated mechanisms that try to prefetch pointer-based structures. They
> try (albeit not wonderfully) to ensure that the L1 and L2 prefetchers are working together rather
> than at cross purposes. They're using an exclusive L3, but in a manner that tries to track some
> degree of line history. They are worrying about dead lines and line placement. All the issues I've
> been talking about for years (and mostly had dismissed as academic nuttiness).
>
> I like the paper because, like the classic RISC papers, it states in public
> (as opposed to something one can merely assume as common sense [hah!]) a new
> baseline for what a high performance industrial CPU has to implement.

I suspect there are a lot more ways to create a high performance CPU than you realize. Keep in mind you can make tradeoffs in different areas, as opposed to being locked into "must have xyz".

> The point, in other words, is not that Apple implements things the same way as Samsung,
> but that if Samsung consider, eg, a prefetcher optimized for pointer-based structures to
> be not merely an academic curiosity but something worth implementing, then chances are
> that ARM and Apple likewise have a prefetcher that tracks pointer-based structures.

I think that's a leap too far, unless Apple documented their prefetch behavior or someone figured out a way to test it.
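
To illustrate what "a way to test it" could look like: one common starting point is a pointer-chasing latency test. The sketch below is only my assumption of how such a test might be set up, not anything Samsung, ARM or Apple has described. The names (node, fill_sequential, fill_shuffled, link_nodes, chase, ns_per_load) are hypothetical, and the 64-byte node size assumes 64-byte cache lines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* One node per cache line (assumption: 64-byte lines). */
typedef struct node {
    struct node *next;
    char pad[64 - sizeof(struct node *)];
} node;

/* Hypothetical helper: xorshift64 PRNG so the shuffled layout is reproducible. */
static uint64_t rng_state = 88172645463325252ULL;
static uint64_t next_rand(void) {
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 7;
    rng_state ^= rng_state << 17;
    return rng_state;
}

/* Visit order 0, 1, 2, ...: a pattern a stride prefetcher can follow. */
void fill_sequential(size_t *order, size_t n) {
    for (size_t i = 0; i < n; i++) order[i] = i;
}

/* Random visit order (Fisher-Yates shuffle): no regular stride to detect. */
void fill_shuffled(size_t *order, size_t n) {
    assert(n > 0);
    fill_sequential(order, n);
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(next_rand() % (i + 1));
        size_t tmp = order[i];
        order[i] = order[j];
        order[j] = tmp;
    }
}

/* Link the nodes into a single cycle that follows the given visit order. */
void link_nodes(node *nodes, const size_t *order, size_t n) {
    for (size_t i = 0; i < n; i++)
        nodes[order[i]].next = &nodes[order[(i + 1) % n]];
}

/* Follow the chain; each load address depends on the previous load's result. */
const node *chase(const node *p, size_t steps) {
    while (steps--) p = p->next;
    return p;
}

/* Average nanoseconds per dependent load, measured with the C clock(). */
double ns_per_load(const node *start, size_t steps) {
    assert(steps > 0);
    clock_t t0 = clock();
    const node *volatile sink = chase(start, steps);  /* keep the result live */
    clock_t t1 = clock();
    (void)sink;
    return (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / (double)steps;
}
```

Build one chain with fill_sequential and one with fill_shuffled over a region much larger than the last-level cache, then compare ns_per_load for the two. The gap shows how much the hardware gains from a regular address pattern. Whether a core also follows pointers would have to be inferred from how the shuffled-chain number compares with raw memory latency, and that takes more care (page size, TLB misses, node layout) than this sketch covers.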

> (And Intel and AMD? WTF knows? Their self-imposed compatibility burden is so large, and
> their turnaround times so slow, that I've lost interest in most of what they are doing.)

Who doesn't have to worry about compatibility, unless someone's making a completely new ISA?

And TBH, I find Apple uninteresting, but mostly because so few details are public or verified through testing. So discussions about Apple architecture lead straight to "we don't know" or wild unfounded speculation.

AMD has been making steady progress. Even though Intel hasn't, I still find those worth discussing because we can verify info about their architectures.