By: Andrei F (andrei.delete@this.anandtech.com), November 18, 2020 8:44 am
Room: Moderated Discussions
Chester (lamchester.delete@this.gmail.com) on November 18, 2020 7:02 am wrote:
> Sure, Cinebench isn't the best representation of average workloads. But SPEC is far
> worse. No consumer cares about SPEC. The subtests are mostly based off applications
> no one uses, or very specific scientific simulations. It's even useless as a benchmark
> to see whether your system is working properly, because it's so overpriced.
>
> Even when they base a subtest off something a consumer might
> do, the results are hilariously off. For example:
> Encoding a 4K video using ffmpeg libx264 slow preset, on Haswell locked to 2.2 GHz
> - affinity set to 4 threads: 6.6 fps
> - no affinity set: 7.6 fps (1.15x scaling)
>
> 525.x264_r test: 1.0315x scaling
>
> A quick look at the benchmark description shows them using -bitrate 1000.
> If that's a 1 kbps (or 1 Mbps) bitrate, it's hilariously unrealistic.
>
> Now take AT Bench's POV-Ray scores for the i5-6600K (1741, 3.6 GHz all core turbo)
> and i7-6700K (2419, 4.2 GHz all core turbo). Scaling down the 6700K's score to account
> for clock speed difference gives 2073. SMT scaling would be roughly 1.19x
>
> 511.povray_r: 1.0291x scaling.
>
> What's going on?
>
> Also, some SPEC numbers make it seem like negative SMT scaling is common. It's not. I've
> personally never seen an application that can use all available threads do worse when
> SMT is enabled. Can we stop looking at the irrelevant pile of garbage that is SPEC?
>
> And what makes you claim "Cinebench has long dependency chains"? How do you know SMT scaling is from that rather than hiding cache misses better?
Because Maxon developers *literally told us that*. There are apparently a lot of random accesses to the BVH of the scene.
The CB23 score on the M1 goes from 5601 to 7819 when comparing just the big cores against the big and small cores together. Explain to me how the SoC gets 39% more throughput from cores that have 1/4th the performance. It's an extreme showcase of a dependency-bound workload that scales extremely well with threads; again, that's straight from the Maxon devs.
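For reference, the ratios argued over in this thread can be reproduced from the quoted figures alone. This is only a sketch: it assumes POV-Ray throughput scales linearly with clock when normalizing the 6700K to 3.6 GHz, and it takes the "small cores are 1/4th the performance" claim at face value. With the M1's four big and four small cores, that claim implies the small cluster should add roughly 25% on top of the big cluster if the workload were purely execution-bound, rather than the 39% observed.

```python
# All inputs are figures quoted in this thread; nothing here is a new measurement.

def ratio(new: float, old: float) -> float:
    """Throughput ratio of `new` over `old`."""
    return new / old

# ffmpeg libx264 on Haswell at 2.2 GHz: no affinity vs. affinity to 4 threads
x264_smt = ratio(7.6, 6.6)

# AT Bench POV-Ray: scale the i7-6700K score (4.2 GHz) down to the i5-6600K's
# 3.6 GHz, ASSUMING performance scales linearly with clock, then compare
povray_6700k_at_3_6 = 2419 * 3.6 / 4.2
povray_smt = ratio(povray_6700k_at_3_6, 1741)

# M1 Cinebench R23: big cores only vs. big + small cores together
m1_gain = ratio(7819, 5601)
# ASSUMPTION from the post: each small core has 1/4 the performance of a big
# core; with 4 big + 4 small cores, the naive expected gain is 1 + 4 * 0.25 / 4
m1_expected = 1 + (4 * 0.25) / 4

print(f"x264 SMT scaling:     {x264_smt:.3f}x (SPEC 525.x264_r: 1.0315x)")
print(f"POV-Ray at 3.6 GHz:   {povray_6700k_at_3_6:.0f} -> {povray_smt:.3f}x (SPEC 511.povray_r: 1.0291x)")
print(f"M1 big+small vs big:  {m1_gain:.3f}x observed vs. {m1_expected:.2f}x naive expectation")
```

The gap between the observed 1.396x and the naive 1.25x is what the reply attributes to latency-bound BVH traversal, where extra threads hide memory latency rather than add raw execution throughput.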