because per-core perf has stagnated

By: Adrian, May 19, 2019 8:09 pm
Room: Moderated Discussions
chester lam on May 19, 2019 6:32 pm wrote:

> > For floating-point tasks, Haswell has doubled the throughput, then Skylake
> > Server has doubled it again. These are not minor improvements.
> Your code needs to use FMA to get Haswell's doubled FMA throughput. On Skylake server, your code needs
> to use 512-bit vectors to get that advantage. That doesn't apply to a lot of existing programs.

That is why I said that these improvements apply mainly to professional users, not to typical users. I only use programs compiled specifically for the processors on which I run them, but few non-professional users do that.
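To put numbers on "doubled, then doubled again", here is a rough sketch of peak double-precision FLOPs per cycle per core. The pipe counts and vector widths are the commonly cited figures for these cores, not measurements, and the helper name `dp_flops_per_cycle` is made up for this illustration. Code that does not use FMA or 512-bit vectors never reaches these peaks, which is the point of the objection quoted above.

```python
DP_BITS = 64  # width of one double-precision element


def dp_flops_per_cycle(pipes, vector_bits, flops_per_lane):
    """Peak DP FLOPs per cycle for one core (hypothetical helper)."""
    lanes = vector_bits // DP_BITS
    return pipes * lanes * flops_per_lane


peak = {
    # Sandy Bridge: one 256-bit add pipe plus one 256-bit multiply pipe, no FMA.
    "Sandy Bridge": dp_flops_per_cycle(pipes=2, vector_bits=256, flops_per_lane=1),
    # Haswell: two 256-bit FMA pipes; a fused multiply-add counts as 2 FLOPs per lane.
    "Haswell": dp_flops_per_cycle(pipes=2, vector_bits=256, flops_per_lane=2),
    # Skylake Server: assumes a model with two 512-bit FMA units
    # (some lower-end models have only one).
    "Skylake Server": dp_flops_per_cycle(pipes=2, vector_bits=512, flops_per_lane=2),
}

for name, flops in peak.items():
    print(f"{name}: {flops} DP FLOPs/cycle/core")
```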

> > Moreover, integer multi-precision computations have also been accelerated a lot since Broadwell.
> Hmm, I'll need to read up on that. But that sounds like another area that requires
> recompilation/rewriting code, so adoption will also be slow and limited.

Yes, as with AVX2 & FMA, you need to recompile your programs to use the "Large Integer Arithmetic" instructions introduced in Haswell (BMI2) & Broadwell (ADX).
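For readers who have not written multi-precision code, the following is a minimal model of the inner loop these instructions target: a schoolbook multiplication over 64-bit limbs. It is plain Python, not what a compiler or a bignum library actually emits, and the helper names (`to_limbs`, `from_limbs`, `mul_limbs`) are made up for this sketch. The comments mark where MULX and ADCX/ADOX would come into play in hand-tuned x86-64 code.

```python
MASK = (1 << 64) - 1  # one 64-bit limb


def to_limbs(n, count):
    """Split a non-negative integer into `count` little-endian 64-bit limbs."""
    return [(n >> (64 * i)) & MASK for i in range(count)]


def from_limbs(limbs):
    """Reassemble an integer from little-endian 64-bit limbs."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))


def mul_limbs(a, b):
    """Schoolbook product of two little-endian limb arrays."""
    out = [0] * (len(a) + len(b))
    for i, ai in enumerate(a):
        carry = 0
        for j, bj in enumerate(b):
            # ai * bj is a 64x64 -> 128-bit product. MULX (BMI2) computes it
            # without modifying the flags, so a carry chain can stay live.
            t = ai * bj + out[i + j] + carry
            out[i + j] = t & MASK
            # This model keeps a single carry. ADCX and ADOX (ADX) use the CF
            # and OF flags separately, so two such chains can be interleaved.
            carry = t >> 64
        out[i + len(b)] = carry
    return out


x = 0x1234_5678_9ABC_DEF0_0FED_CBA9_8765_4321_DEAD_BEEF_CAFE_F00D
y = 0xFFFF_FFFF_FFFF_FFFF_FFFF_FFFF_FFFF_FFFF
product = from_limbs(mul_limbs(to_limbs(x, 3), to_limbs(y, 2)))
print(product == x * y)
```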

Unfortunately, as with AVX, AVX2 & FMA, Intel has chosen not to implement the BMI1, BMI2, ABM & ADX instruction groups on the Atom series of processors, including the more recent Apollo Lake, Denverton and Gemini Lake. As a result, many programs choose not to use these modern instructions, in order to support more existing computers.
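The usual alternative to dropping these instructions entirely is to check for them at run time and fall back to a generic path. A minimal sketch of that idea, assuming a Linux system where `/proc/cpuinfo` lists feature flags under the names `bmi2` and `adx`; the function names and the two sample flag lines are invented for illustration and are not dumps from real processors.

```python
REQUIRED = {"bmi2", "adx"}  # flag names as spelled in Linux /proc/cpuinfo


def parse_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def pick_bigint_path(flags):
    """Choose a code path name (hypothetical) based on available flags."""
    return "mulx_adx" if REQUIRED <= flags else "generic"


# Illustrative inputs only, shortened and made up for this example.
sample_with = "model name\t: example A\nflags\t\t: fpu sse2 avx2 fma bmi1 bmi2 adx\n"
sample_without = "model name\t: example B\nflags\t\t: fpu sse2 sse4_2 aes\n"

print(pick_bigint_path(parse_flags(sample_with)))
print(pick_bigint_path(parse_flags(sample_without)))
```

On a real Linux machine the input would come from reading `/proc/cpuinfo`; compiled programs would more typically query CPUID directly or rely on the compiler's function multi-versioning.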

> That's good. We got a good perf jump from 2011 to 2019. However, I don't think
> that's comparable to the jump from 2003 to 2011. The difference between a
> 3 GHz Prescott Pentium 4 and a 3.5 GHz Sandy Bridge is a lot bigger.

I completely agree for integer workloads. With a 3 GHz Pentium 4, I needed many days of continuous, round-the-clock compilation to build from source all the programs that I needed to install on a Linux workstation, while with a Sandy Bridge that task required perhaps half a day at most.

Nevertheless, the speed increase of Sandy Bridge, while still large, is much less impressive when compared with a 2.4 GHz Opteron, which was available at the same time as the Pentium 4 (the 2.4 GHz Opteron being much faster for most tasks than a 3 GHz Pentium 4).

Pentium 4 had wild differences in speed depending on the program executed. For example, for floating-point matrix multiplication Pentium 4 was very fast if you were using SSE2 instructions (otherwise, with legacy 8087 instructions, it had only half the IPC of an Athlon), but for large integer multiplication or division a 3.2 GHz Pentium 4 was much slower than a 1.33 GHz Athlon.

The speed of single-thread integer computations increased at an irregular pace. It was determined by increases in clock frequency, which were large from Pentium until Pentium 4 and small before and after that interval, and by occasional large jumps when improved microarchitectures were introduced, e.g. with Pentium Pro, Core 2 and Sandy Bridge.

However, the rate of increase of the total floating-point throughput per socket was much more uniform during the last 25 years. Whenever neither higher clock frequencies nor improved microarchitectures were introduced, hardware was added to allow either more cores per socket or more operations per cycle per core, keeping a decent rate of improvement in FP throughput.
