By: Jan Wassenberg (jan.wassenberg.delete@this.gmail.com), May 29, 2022 11:05 am
Room: Moderated Discussions
Adrian (a.delete@this.acm.org) on May 29, 2022 4:33 am wrote:
> The problem size should be large enough so that the running time of the benchmark
> would be large enough, e.g. 5 to 10 minutes, so that most of the time would
> be spent at the steady-state power consumption and clock frequencies. [..]
> by reading them from the CPU internal sensors, with programs
> like turbostat (a utility supplied by the Linux kernel) or many other such programs.
Thanks for the pointers, I was not aware of turbostat and will try that.
The benchmark that's currently top of mind is a Quicksort for which we can generate SSE4/AVX2/AVX-512 code, plus emulated 128-bit to test autovectorization, and a scalar-only HeapSort fallback. This will be interesting :)