By: Adrian (a.delete@this.acm.org), May 29, 2022 1:39 pm
Room: Moderated Discussions
Jan Wassenberg (jan.wassenberg.delete@this.gmail.com) on May 29, 2022 11:05 am wrote:
> Adrian (a.delete@this.acm.org) on May 29, 2022 4:33 am wrote:
> > The problem size should be large enough so that the running time of the benchmark
> > would be large enough, e.g. 5 to 10 minutes, so that most of the time would
> > be spent at the steady-state power consumption and clock frequencies. [..]
> > by reading them from the CPU internal sensors, with programs
> > like turbostat (an utility supplied by the Linux kernel) or many other such programs.
> Thanks for the pointers, I was not aware of turbostat and will try that.
>
> The benchmark that's currently top of mind is a Quicksort for which we can
> generate SSE4/AVX2/AVX-512 code, plus emulated 128-bit to test autovectorization,
> and a scalar-only HeapSort fallback. This will be interesting :)
You can find turbostat in any source tree of the Linux kernel at
/usr/src/linux/tools/power/x86/turbostat
From that directory you can do the usual "make" and "make install".
An example of invoking turbostat is
turbostat --interval 1 --show Core,CPU,Avg_MHz,Busy%,Bzy_MHz,TSC_MHz,CoreTmp,PkgWatt,CorWatt
The last two columns of the output show the total power consumption of the CPU package (cores + uncore) and the power consumption of the CPU cores alone.
The "--interval 1" option instructs turbostat to display a new row every second.
If you have power samples in watts at 1-second intervals, summing them over the duration of the benchmark gives the energy in joules, because each sample is the average power over one second.
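To make the arithmetic explicit, here is a minimal Python sketch of that summation. The function name and the sample values are made up for illustration. The interval parameter is there only in case you run turbostat with an interval other than 1 second.

```python
def energy_joules(power_watts, interval_s=1.0):
    """Approximate energy as the sum of (average power * interval length).

    power_watts: one average-power reading per interval, in watts.
    interval_s:  length of each interval in seconds (1.0 for --interval 1).
    """
    return sum(power_watts) * interval_s


# Hypothetical readings, one per second: idle, busy, busy, idle.
samples = [10.0, 50.0, 50.0, 12.0]
print(energy_joules(samples))  # 122.0 joules
```

With 1-second intervals the multiplication is a no-op, which is why plain summing works.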
If you run turbostat in parallel with the test, for a period that overlaps it, and save its output to a file, you can find the start and stop times of the test by scanning the output for the idle-to-busy and busy-to-idle transitions of the tested cores.
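The scan could be sketched in Python roughly as follows. This is only an illustration under several assumptions that you should check against your own log: that the columns are whitespace-separated, that the header line is repeated, that each interval has a summary row with "-" in the CPU column, and that a Busy% threshold of 50 separates idle from busy. The function names, the threshold and the sample log are all invented for the example. For a test pinned to a few cores you would look at the rows of those cores rather than at the summary row.

```python
def parse_summary_rows(text):
    """Return one dict per interval, taken from the summary rows (CPU == '-')."""
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "PkgWatt" in fields:  # header line, possibly repeated every interval
            header = fields
            continue
        if header is None or len(fields) != len(header):
            continue  # skip rows that do not match the header layout
        row = dict(zip(header, fields))
        if row.get("CPU") == "-":
            rows.append(row)
    return rows


def busy_window(rows, threshold=50.0):
    """Return (first, last) indices of the rows with Busy% >= threshold, or None."""
    busy = [i for i, r in enumerate(rows) if float(r["Busy%"]) >= threshold]
    if not busy:
        return None
    return busy[0], busy[-1]


def benchmark_energy_joules(rows, interval_s=1.0, threshold=50.0):
    """Sum PkgWatt over the busy window and convert to joules."""
    window = busy_window(rows, threshold)
    if window is None:
        return 0.0
    first, last = window
    return sum(float(r["PkgWatt"]) for r in rows[first:last + 1]) * interval_s


# Invented sample log: idle, busy, busy, idle (one summary row per second).
SAMPLE_LOG = """\
Core CPU Avg_MHz Busy% Bzy_MHz TSC_MHz CoreTmp PkgWatt CorWatt
-    -   12      0.50  2400    3000    40      5.00    1.00
0    0   12      0.50  2400    3000    40      5.00    1.00
Core CPU Avg_MHz Busy% Bzy_MHz TSC_MHz CoreTmp PkgWatt CorWatt
-    -   3500    99.00 3550    3000    70      60.00   50.00
Core CPU Avg_MHz Busy% Bzy_MHz TSC_MHz CoreTmp PkgWatt CorWatt
-    -   3480    98.50 3540    3000    72      62.00   52.00
Core CPU Avg_MHz Busy% Bzy_MHz TSC_MHz CoreTmp PkgWatt CorWatt
-    -   15      0.60  2400    3000    45      6.00    1.20
"""

rows = parse_summary_rows(SAMPLE_LOG)
print(busy_window(rows))              # (1, 2)
print(benchmark_energy_joules(rows))  # 122.0
```

Taking the first and last busy rows, instead of every busy row, keeps short dips inside the run from being dropped. The price is that unrelated background activity before or after the test can stretch the window, so leave the machine idle around the run.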