By: Some dude (nope.delete@this.nope.org), May 30, 2022 12:04 pm
Room: Moderated Discussions
Adrian (a.delete@this.acm.org) on May 29, 2022 4:33 am wrote:
> Jan Wassenberg (jan.wassenberg.delete@this.gmail.com) on May 29, 2022 12:51 am wrote:
> > Adrian (a.delete@this.acm.org) on May 24, 2022 2:39 pm wrote:
> > > I have not tried this on more recent Intel CPUs, but in a measurement on Skylake Server CPUs
> > > (with 2 512-bit FMA units) done a few years ago, the ratio between the energies needed to
> > > compute some LINPACK benchmark in AVX-512 and in AVX2 (i.e. with 256-bit FMA/LD/ST) modes
> > > was around 5/6, so a little more than your maximum estimation, but not much more.
> > Interesting, can you share some pointers on how this was measured so I can try it for
> > AVX-512 vs scalar? (I suspect that is a much larger difference than AVX2 vs AVX-512.)
>
>
> The starting point is having a BLAS library that includes optimized variants for the various ISA
> options, e.g. scalar, 128-bit SSE2, 128-bit AVX, 256-bit AVX, 256-bit AVX-512, 512-bit AVX-512.
>
> There are many such BLAS libraries, both open-source and proprietary.
>
> Then one can choose some linear algebra benchmark, e.g. LINPACK or one based on
> DGEMM, and compile and link it for all the ISA variants that must be tested.
>
> The problem size should be large enough that the benchmark runs for a long time,
> e.g. 5 to 10 minutes, so that most of the run is spent at the steady-state
> power consumption and clock frequencies.
>
> Then you can write a script that runs all the ISA-specific executables, preferably running
> each of them with various numbers of active threads (pinning the threads to cores),
> to obtain more data, e.g. how the clock frequency, the power consumption and the
> total energy depend on the number of active cores, not only on the ISA used.
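A minimal Python sketch of such a run script. The executable names you pass in are hypothetical placeholders. It assumes an OpenMP-threaded BLAS, so that the standard `OMP_NUM_THREADS`, `OMP_PLACES` and `OMP_PROC_BIND` variables control the threads. It also assumes that logical CPUs 0..n-1 are n distinct physical cores, which is not true on every machine with SMT enabled.

```python
import os
import subprocess
import time
from dataclasses import dataclass


@dataclass
class Run:
    executable: str
    threads: int
    cmd: list
    env: dict


def build_runs(executables, thread_counts):
    """Build one run per (ISA-specific executable, thread count) pair."""
    runs = []
    for exe in executables:
        for n in thread_counts:
            # Restrict the process to logical CPUs 0..n-1 (assumed here to be
            # n distinct physical cores on the machine under test).
            cpus = "0" if n == 1 else f"0-{n - 1}"
            env = {
                "OMP_NUM_THREADS": str(n),
                "OMP_PLACES": "cores",     # one OpenMP place per core
                "OMP_PROC_BIND": "close",  # pin threads to consecutive places
            }
            runs.append(Run(exe, n, ["taskset", "-c", cpus, exe], env))
    return runs


def execute(run):
    """Run one benchmark to completion and return its wall-clock time in seconds."""
    start = time.monotonic()
    subprocess.run(run.cmd, env={**os.environ, **run.env}, check=True)
    return time.monotonic() - start
```

For example, `build_runs(["./linpack_avx2", "./linpack_avx512"], [1, 2, 4, 8])` yields eight runs. Each one can be passed to `execute` while the power samples are being collected.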
>
>
> Many devices can be inserted between the wall outlet and the PSU cable to measure the power
> and energy of the complete computer system and display them on an LCD screen. You can read the
> total energy, the average power and the maximum power after each test run, and then reset the device
> for the next test. More expensive devices of this kind can be read directly by a computer.
>
> The power consumed by the CPU cores, their clock frequencies and their temperatures can be sampled
> periodically by the test script during the run, by reading the CPU's internal sensors with programs
> like turbostat (a utility included in the Linux kernel source tree) or many similar ones.
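turbostat derives package power from the CPU's RAPL energy counters. On Linux the same counters are also exposed through the powercap sysfs interface, so a script can sample them directly. Here is a minimal sketch under a few assumptions. It assumes an Intel CPU with the `intel_rapl` driver loaded. It also assumes that domain `intel-rapl:0` is package 0 (its `name` file shows this). Reading `energy_uj` usually requires root. The counter is cumulative and wraps around, so the sketch sums per-interval deltas.

```python
import time
from pathlib import Path

# Package 0 RAPL domain exposed by the Linux powercap driver (assumed path;
# recent kernels restrict reading energy_uj to root).
RAPL_PKG0 = Path("/sys/class/powercap/intel-rapl:0")


def read_energy_uj(domain=RAPL_PKG0):
    """Return the current cumulative energy counter in microjoules."""
    return int((domain / "energy_uj").read_text())


def read_max_range_uj(domain=RAPL_PKG0):
    """Return the value at which the energy counter wraps around."""
    return int((domain / "max_energy_range_uj").read_text())


def energy_delta_uj(before, after, max_range_uj):
    """Energy between two counter readings, allowing for at most one wraparound."""
    if after >= before:
        return after - before
    # Counter passed max_range_uj and restarted from 0 (off by at most 1 uJ).
    return (max_range_uj - before) + after


def total_energy_j(samples_uj, max_range_uj):
    """Sum per-interval deltas; frequent sampling keeps each interval under one wrap."""
    total_uj = sum(
        energy_delta_uj(prev, cur, max_range_uj)
        for prev, cur in zip(samples_uj, samples_uj[1:])
    )
    return total_uj / 1e6


def sample_counter(duration_s, interval_s=1.0, domain=RAPL_PKG0):
    """Read the counter every interval_s seconds for about duration_s seconds."""
    samples = [read_energy_uj(domain)]
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        time.sleep(interval_s)
        samples.append(read_energy_uj(domain))
    return samples
```

To get the package energy of one run, start `sample_counter` alongside the benchmark. Then pass the result, together with `read_max_range_uj()`, to `total_energy_j`.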
>
> The power samples can then be integrated over the
> test duration, giving the energy consumed by the CPU alone.
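Here is a minimal sketch of that integration. It assumes the script has logged (timestamp, watts) pairs, e.g. the package power reported by turbostat at each interval. The trapezoidal rule used here is one reasonable choice when samples arrive at roughly fixed intervals. It is not necessarily what any particular tool does.

```python
def integrate_energy_j(times_s, watts):
    """Trapezoidal integral of power samples (W) over their timestamps (s), in joules."""
    if len(times_s) != len(watts):
        raise ValueError("times_s and watts must have the same length")
    energy_j = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        # Mean power over the interval multiplied by the interval length.
        energy_j += 0.5 * (watts[i] + watts[i - 1]) * dt
    return energy_j


def average_power_w(times_s, watts):
    """Mean power over the whole run: total energy divided by elapsed time."""
    return integrate_energy_j(times_s, watts) / (times_s[-1] - times_s[0])
```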
>
>
> The easiest approach is to measure only the energy consumed by the CPU, as that requires only
> free software. To measure the energy consumed by the entire computer system, you also need a
> hardware measurement device, but such devices are inexpensive and useful for many purposes.
>
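Every ISA variant solves the same problem, so the measured energies can be compared directly. A ratio like the roughly 5/6 AVX-512 vs AVX2 figure quoted above is simply E_AVX-512 / E_AVX2. The small sketch below treats the ISA labels as placeholders. For a DGEMM-based benchmark, it also expresses the result as GFLOP per joule using the conventional 2·m·n·k flop count.

```python
def dgemm_flops(m, n, k, calls=1):
    """Conventional flop count for C += A*B with A m-by-k and B k-by-n."""
    return 2 * m * n * k * calls


def gflops_per_joule(flops, energy_j):
    """Energy efficiency of a run: billions of floating-point ops per joule."""
    return flops / energy_j / 1e9


def energy_ratios(energies_j, baseline):
    """Energy of each ISA variant relative to the baseline variant (same work)."""
    base = energies_j[baseline]
    return {isa: e / base for isa, e in energies_j.items()}
```

The same functions apply whether the energies come from the CPU counters or from a wall-plug meter. The wall-plug figure also includes the rest of the system.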
https://elmorlabs.com/product/elmorlabs-pmd-power-measurement-device/
It has 2x EPS inputs to measure CPU voltage, current and power directly.