N2, or V1?

By: Adrian (a.delete@this.acm.org), December 5, 2021 1:07 pm
Room: Moderated Discussions
--- (---.delete@this.redheron.com) on December 5, 2021 10:45 am wrote:
> Adrian (a.delete@this.acm.org) on December 5, 2021 2:54 am wrote:
> > dmcq (dmcq.delete@this.fano.co.uk) on December 5, 2021 1:43 am wrote:
> > > Adrian (a.delete@this.acm.org) on December 5, 2021 12:15 am wrote:
> > > > Rayla (rayla.delete@this.example.com) on December 4, 2021 1:20 pm wrote:
> > > > >
> > > > > What slide shows that it only has one 256b SVE pipe? All they say is that they have
> > > > > 256b SVE - which is in line with what the V1 has. It's not different from Intel saying
> > > > > that ICL and SKL-SP have 512b AVX, despite having multiple 512b datapaths.
> > > >
> > > >
> > > > I agree that the slide is ambiguous, but that area in the diagram matched
> > > > the same area in the ARM diagram, except that the "2x" was removed.
> > > >
> > > > Also, this was not my own interpretation, but of Timothy Prickett Morgan from
> > > > NextPlatform, who asked Amazon about this, but did not receive any reply yet.
> > > >
> > > > If Graviton 3 would have indeed a complete V1, running 2 x 256-bit FMA at 2.6 GHz
> > > > with only a power consumption of 1.0 ... 1.25 W per core, that would be a huge improvement
> > > > in energy efficiency over the existing CPUs, which does not seem likely.
> > >
> > > It has four SIMD units so I think it is practically definitely a V1 2x256 bit
> > > SVE chip. There just wouldn't be any point in doing anything else. The cores
> > > would be run at 2.6GHz and with 5nm that would cut the power down greatly.
> > >
> > >
> >
> >
> > It would be nice if the 5-nm TSMC process would allow such a great reduction in power consumption,
> > because if Graviton 3 uses full V1 cores, that means that a V1 in 5 nm can match the performance
> > of an AMD Milan in 7 nm at less than half of the per core power consumption.
> >
> > When doing SVE/AVX 2 x 256 bit FMA, more than half of the power consumption is just in the
> > FPU (about 60% for Intel/AMD, while for ARM the proportion should be greater, since the rest
> > of the core is simpler), so the architecture of the CPU should matter much less for this limit
> > case than the transistor characteristics determined by the manufacturing process.
>
> Not necessarily.
> G3 is designed to run at 3GHz, and the transistors are appropriately specced.
> AMD and Intel both insist on their cores being able to execute single threaded at much higher GHz.
> (a) obviously the cost of any particular operation is a lot more at higher GHz.

You are right that with a lower target for the maximum frequency, a fraction of the transistors can be made narrower and/or given a higher threshold voltage, and some transistors can be eliminated altogether, because some extra buffer stages or pipeline stages are no longer needed.

This, like the manufacturing process, is not related to the architecture of the CPU. It does not matter whether a core is an ARM V1 or an AMD Zen 3: the one made in 5 nm for 3 GHz will have better energy efficiency than the one made in 7 nm for 5 GHz.

The different target frequencies do indeed increase the advantage due to the better process, but it remains doubtful whether this combined advantage is enough to make Graviton 3 do the same work at less than half the power per core.

> No surprise there. Is that 60% number at 5GHz or at 3GHz? And 60% of "core"
> power (ignoring L2/L3), of "SoC" power (ignoring DRAM) or total power.

I said "power per core", so that did not include DRAM or the uncore.

The 60% was estimated on Intel CPUs from the Skylake-derived generations, at around 3 GHz, i.e. the frequency to which the CPU drops when doing all-core FMA computations.

It would be difficult to make such an estimation at 5 GHz, because to reach those frequencies you must run a program whose performance is limited by the turbo frequency limits, not by the power limits.

The estimates were made by running 64-bit, 128-bit and 256-bit FMA computations, to determine the power increments in the FPU per ALU width, while monitoring both the power reported by the CPUs (as core power and uncore power) and the wall-plug power consumption, to check that the reported power differences were accurate.
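As a rough illustration of that method, the per-width power increment can be obtained from a straight-line fit of core power against FMA width. This is only a sketch: the wattages below are made-up placeholders chosen to give round numbers, not the measured values, and the function name and the assumption that FPU power grows linearly with width are introduced just for this example.

```python
def fit_power_line(widths_bits, core_power_w):
    """Least-squares fit of power = base + slope * width; returns (base, slope)."""
    n = len(widths_bits)
    mean_w = sum(widths_bits) / n
    mean_p = sum(core_power_w) / n
    sxx = sum((w - mean_w) ** 2 for w in widths_bits)
    sxy = sum((w - mean_w) * (p - mean_p)
              for w, p in zip(widths_bits, core_power_w))
    slope = sxy / sxx
    base = mean_p - slope * mean_w
    return base, slope


# PLACEHOLDER inputs (hypothetical, not measurements):
# total FMA bits per cycle for 2 x 64, 2 x 128 and 2 x 256 bit runs,
# and the per-core power in watts reported for each run.
widths = [128, 256, 512]
power = [2.75, 3.5, 5.0]

base, slope = fit_power_line(widths, power)

# The intercept approximates the non-FPU part of the core power;
# the remainder at the widest setting is attributed to the FPU.
fpu_power = slope * widths[-1]
fpu_fraction = fpu_power / power[-1]

print(f"non-FPU power ~ {base:.2f} W, FPU power ~ {fpu_power:.2f} W")
print(f"FPU fraction at 2 x 256-bit FMA ~ {fpu_fraction:.0%}")
```

With real data the points will not lie exactly on a line, so the wall-plug cross-check described above remains necessary.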

The results were also checked against the variations in AVX frequency, TDP and core count across the many Intel server SKUs, which were likewise consistent with about 60% of the core power being consumed in the FPU at 2 x 256-bit FMA and about 80% at 2 x 512-bit FMA.

While those measurements were done on Intel 14-nm CPUs and I have not yet repeated them on AMD CPUs, I do not expect the power proportion to be much different.

If we suppose that the implementation of an ARM core is much more efficient than that of an Intel or AMD core, then the fraction of the core power required by the FPU when doing FMAs can only be significantly larger than for Intel, because the FPU must be equivalent, unlike other parts, such as the instruction decoder, which can be much simpler.

Therefore I would expect that an ARM V1 core, when doing 2 x 256-bit FMAs, would use 65% to 75% of the total core power in the FPU. This FPU power can be smaller than the power needed by an Epyc only due to the newer manufacturing process, and, as you say, due to the lower maximum frequency target, which can permit the use of fewer/smaller/lower-leakage transistors.
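The 65% to 75% range can be sanity-checked with simple arithmetic. The sketch below assumes, purely for illustration, that the FPU needs the same power as on the Intel core and that only the rest of the core shrinks by a factor `rest_scale`; the function name and that scaling model are assumptions of the example, not measurements.

```python
def fpu_share(intel_fpu_share, rest_scale):
    """FPU share of core power when only the non-FPU part is scaled.

    intel_fpu_share: FPU fraction of core power on the reference core (0..1).
    rest_scale: hypothetical factor applied to the non-FPU power
                (1.0 = same as the reference core, 0.5 = half of it).
    """
    fpu = intel_fpu_share
    rest = (1.0 - intel_fpu_share) * rest_scale
    return fpu / (fpu + rest)


# Reference point from the text: about 60% in the FPU at 2 x 256-bit FMA.
for rest_scale in (1.0, 0.8, 0.5):
    share = fpu_share(0.60, rest_scale)
    print(f"rest of core at {rest_scale:.1f}x: FPU share = {share:.1%}")
```

Under these assumptions, a non-FPU part that needs between 0.8 and 0.5 times the Intel power gives an FPU share of roughly 65% to 75%.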

> (b) something that I cannot justify with any simple explanation, but which seems to be a practical
> reality, is that GHz stretch designs land up burning a lot of power even when they're not running
> at that stretch GHz. I assume this is some combination of circuit techniques required to hit
> the frequencies, the particular tuning of the transistors, and the actual digital paths chosen
> (few levels of logic, even if the result is more paths that burn current).
> There's also the fact that AMD and Intel have chosen to have the back-to-back latency of a fair number of
> their SIMD ops be one cycle. ARM (even Apple) have chosen to make that latency a minimum of two cycles. One
> can argue about why each made that decision and the circumstances for which it is optimal, but such a choice
> further exacerbates all the issues I raised about high frequency, and further ties into ARM/Apple being
> able to provide wide SIMD at reasonable power as opposed to x86's choice of a fireball SIMD unit.
