M2 benchmarks => MT

By: --- (---.delete@this.redheron.com), June 18, 2022 10:14 am
Room: Moderated Discussions
Adrian (a.delete@this.acm.org) on June 18, 2022 3:25 am wrote:
> Anon (Anon.delete@this.anon.com) on June 18, 2022 2:38 am wrote:
> > Adrian (a.delete@this.acm.org) on June 16, 2022 11:04 am wrote:
> > > Anon (Anon.delete@this.anon.com) on June 16, 2022 6:28 am wrote:
> > > > Adrian (a.delete@this.acm.org) on June 16, 2022 2:09 am wrote:
> > > > > Eric Fink (eric.delete@this.anon.com) on June 16, 2022 12:07 am wrote:
> > > > >
> > > > >
> > > > > > True enough for desktop, but M2 exceeds anything currently available in the laptop space.
> > > > > > It’s faster than top of the shelf Alder Lake P while being cheaper and using much less power.
> > > > > > In single core M2 is likely to be the fastest mobile CPU at the current time anyway.
> > > > >
> > > > >
> > > > > That is not true.
> > > > >
> > > > > Apple's own presentation of last week showed that the new M2 has only 87% of the single-thread
> > > > > performance of i7-1260P, which is not supposed to be the top of Alder Lake P (that
> > > > > would be i7-1280P, but it does not appear to be available anywhere).
> > > >
> > > > That chart was for multi thread not single thread.
> > > >
> > > >
> > >
> > > I have looked once more at the presentation, and I have found that I was wrong and you are right.
> > >
> > > Apple did not say what kind of performance was meant, but the end points in the performance
> > > comparison graph were at 14 W for M2 and 56 W for i7-1260P. These values are too large for a
> > > single-thread benchmark, so the results must have been indeed for a multi-thread benchmark.
> > >
> > >
> > > In this case, the results are actually worse for Apple than if the results had been for a single-thread
> > > benchmark, because in the power-limited condition the clock frequencies for Alder Lake are much lower than
> > > when they are limited by the maximum turbo values, and they are similar to the clock frequencies of M2.
> > > It is very likely that in a single-thread benchmark the advantage of Alder Lake P would be larger.
> > >
> > > According to the comparison graph, for most values of power consumption less
> > > than the nominal 28 W TDP, Alder Lake at equal performance with M2 consumes
> > > only a little more than the double of the power consumption of M2.
> > >
> > > At their nominal TDPs of 14 W for M2 and 28 W for i7-1260P, they both have a similar performance,
> > > but then doubling the power consumption of Alder Lake, to reach 4 times the power consumption
> > > of Apple M2, buys another about 15% of extra performance for Alder Lake P.
> > >
> > > It is true that i7-1260P has the advantage of 4 additional small cores. However, in
> > > the best case for the small cores, one can say that comparing i7-1260P (4+8) with M2
> > > (4+4) is somewhat equivalent to comparing 8 big Intel cores with 6 big Apple cores.
> > >
> > > From the Apple comparison graph it seems very likely that at equal clock frequencies 8 Intel cores have
> > > a similar performance with 6 Apple cores, which means about a 30% IPC advantage for Apple M2, which is
> > > consistent with the known IPC ratios between M1 and Tiger Lake and between Alder Lake and Tiger Lake.
> > >
> > >
> > I have no idea what you just said - seems to have no logic in it at all.
> > Eg. why would having more small cores in the intel processor not count in multithreaded
> > especially when we know intels little cores are significantly faster than Apple’s
> >
> >
> If you read what I have written above, you will see that I have accounted for both kinds of little
> cores as providing about half of the throughput of the big cores, so they have been counted.
> This is what Intel claims for theirs and what also appears to be true for the little Apple
> cores when running programs that do not use "Accelerate" library functions (i.e. SIMD
> instructions), where I have seen benchmarks with a speed ratio of 1.9. In programs that
> use SIMD, it seems that a little Apple core can match only 1/3 of a big core.
> Nonetheless, I have seen some benchmarks which suggested that the Apple Swift compiler does a very poor
> job in generating code appropriate for the little Apple cores, resulting in programs 1.5 to 2.5 times
> slower than when the programs were optimized for the little cores, with other tools. Whether Apple might
> have used a benchmark that runs slower than normal on the little cores, I cannot know, and in any case
> if that were true a lower result than possible would have been the fault of their own tools.

Apple's E-cores are designed to maximize performance/joule.
Intel's E-cores are designed to maximize performance/area.
OF COURSE they land up in very different places.

Apple's E-cores AT MAX PERFORMANCE deliver about 1/3 the performance of a P-core. This holds across a range of tasks, though of course it can swing from perhaps as high as 1/2 to as low as 1/4. For example, A15 P specRATE1 is ~7.2 and E specRATE1 is ~2.4.
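
To make the arithmetic concrete, here is a minimal sketch using only the two A15 scores quoted above. The helper names and the "P-core equivalents" bookkeeping are hypothetical, purely for illustration, and the 4P+4E figure describes every core running flat out, which is not how the E-cores normally operate.

```python
# Rough A15 scores as quoted in the post (single-copy SPEC rate style).
P_SCORE = 7.2
E_SCORE = 2.4

def e_to_p_ratio(e_score: float, p_score: float) -> float:
    """Fraction of one P-core's throughput delivered by one E-core."""
    return e_score / p_score

def p_core_equivalents(n_p: int, n_e: int, e_weight: float) -> float:
    """Hypothetical 'P-core equivalents' if every core ran at max performance."""
    return n_p + n_e * e_weight

ratio = e_to_p_ratio(E_SCORE, P_SCORE)
print(f"E/P ratio from the quoted scores: {ratio:.2f}")

# Sweep the range mentioned in the post: as high as 1/2, as low as 1/4.
for weight in (1 / 2, ratio, 1 / 4):
    total = p_core_equivalents(4, 4, weight)
    print(f"4P+4E with E weight {weight:.2f}: {total:.2f} P-core equivalents")
```

Depending on the workload, the same 4+4 part counts as anywhere from 5 to 6 "P-core equivalents" on this bookkeeping, so a single fixed weight hides a lot.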

BUT the E-core is PRIMARILY there as an OS companion core. It executes much of the OS code, utility code and suchlike; it's not there to boost user code. This is not a mild side issue; it's the entire point of the design. And so we have that
- on the battery designs (A class, M class) the E-cores usually run at around 1GHz, even when executing these utility tasks, but run wide (that is, many of the four E-cores are running at once)
- on the desktop designs (M Pro/Max/Ultra) there are only two E-cores. If only one is running, it runs at 1GHz (i.e. very basic OS background stuff); as soon as two run, they boost to 2GHz.
The effective result is that, most of the time, for the purposes they were designed for, the 4 E-cores give the same level of utility/background/OS processing as the 2 E-cores, with the 2-core design using slightly less area but slightly more energy.
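
A back-of-the-envelope sketch of that equivalence, assuming (and this is only an assumption) that background throughput scales linearly with both clock and active core count. The function name and the "core-GHz" unit are made up for illustration.

```python
def aggregate_core_ghz(active_cores: int, freq_ghz: float) -> float:
    """Crude throughput proxy: active cores times clock, in core-GHz.

    Assumes linear scaling with frequency and core count, which real
    workloads only approximate.
    """
    return active_cores * freq_ghz

# Battery designs (A class, M class): four E-cores at around 1 GHz.
battery = aggregate_core_ghz(4, 1.0)

# Desktop designs (M Pro/Max/Ultra): two E-cores boosted to 2 GHz.
desktop = aggregate_core_ghz(2, 2.0)

print(f"4 E-cores @ 1 GHz: {battery:.1f} core-GHz")
print(f"2 E-cores @ 2 GHz: {desktop:.1f} core-GHz")
```

Both configurations come out at the same aggregate figure under this model. A higher clock generally needs a higher voltage, which fits the two-core arrangement costing slightly more energy for the same work.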

I have no idea what your claims about the Swift compiler, or for that matter E-core SIMD, are about. But I suspect whatever tests you ran were confused by this frequency issue; unless you set things up correctly, E-cores (especially on the 4 E-core models) will run at 1GHz, not 2GHz. And none of this is "wrong"; it's the cores doing what they were designed to do.

Meanwhile Intel's perf/area maximizing cores are solving a very different problem. An Alder Lake E-core delivers about 70% of the performance of an Apple or Intel P-core, making it about twice as "fast" as an Apple E-core. BUT it uses ~10W, more than twice what an Apple P-core uses! Compare that with an Apple E-core, which runs at 0.44W under SPEC but usually draws much, much less.
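
Put as performance per watt, using only the rough figures in this post, the gap looks like this. This is a sketch, not a measurement: it normalizes a P-core to 1.0, treats Apple and Intel P-cores as roughly equal (as the 70% figure above does), and takes the Apple E-core ratio from the A15 scores of ~2.4 vs ~7.2.

```python
# Rough per-core figures from the post; performance is normalized to P-core = 1.0.
INTEL_E_PERF = 0.70         # about 70% of a P-core
INTEL_E_WATTS = 10.0        # about 10 W
APPLE_E_PERF = 2.4 / 7.2    # about 1/3 of a P-core (A15 scores)
APPLE_E_WATTS = 0.44        # under SPEC

def perf_per_watt(perf: float, watts: float) -> float:
    """Normalized performance delivered per watt."""
    return perf / watts

intel_eff = perf_per_watt(INTEL_E_PERF, INTEL_E_WATTS)
apple_eff = perf_per_watt(APPLE_E_PERF, APPLE_E_WATTS)

print(f"Speed, Intel E vs Apple E:     {INTEL_E_PERF / APPLE_E_PERF:.1f}x")
print(f"Perf/W, Apple E vs Intel E:    {apple_eff / intel_eff:.1f}x")
```

On these numbers the Intel E-core is about twice as fast, while the Apple E-core delivers roughly ten times the performance per watt. That is what you would expect from one design targeting area and the other targeting energy.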

There is no engineering mystery here. Intel's design point makes perfect sense for creating throughput-optimized large servers to compete with Ampere and Graviton; the only mystery is why Marketing felt it necessary to pretend that these are energy-efficient cores, or why the Intel community seems to feel it necessary to go along with that charade.

So yes, once you're in the business of claiming either that Apple has 8 (rather than 4+4) cores, or that, say, a particular Intel design has 14 (rather than 6+8) cores, or comparing the throughput of Apple E-cores vs Alder Lake E-cores as though they were designed for the same purpose, well, that's a pretty clear sign that you have no idea what you are talking about and deserve whatever contempt is thrown your way.