By: Chester (lamchester.delete@this.gmil.com), November 18, 2020 3:13 pm
Room: Moderated Discussions
Andrei F (andrei.delete@this.anandtech.com) on November 18, 2020 7:44 am wrote:
> Chester (lamchester.delete@this.gmail.com) on November 18, 2020 7:02 am wrote:
>
> > Sure, Cinebench isn't the best representation of average workloads. But SPEC is far
> > worse. No consumer cares about SPEC. The subtests are mostly based off applications
> > no one uses, or very specific scientific simulations. It's even useless as a benchmark
> > to see whether your system is working properly, because it's so overpriced.
> >
> > Even when they base a subtest off something a consumer might
> > do, the results are hilariously off. For example:
> > Encoding a 4K video using ffmpeg libx264 slow preset, on Haswell locked to 2.2 GHz
> > - affinity set to 4 threads: 6.6 fps
> > - no affinity set: 7.6 fps (1.15x scaling)
> >
> > 525.x264_r test: 1.0315x scaling
> >
> > A quick look at the benchmark description shows them using -bitrate 1000.
> > If that's 1 kbps (or 1 Mbps) bitrate, it's hilariously unrealistic.
> >
> > Now take AT Bench's POV-Ray scores for the i5-6600K (1741, 3.6 GHz all core turbo)
> > and i7-6700K (2419, 4.2 GHz all core turbo). Scaling down the 6700K's score to account
> > for clock speed difference gives 2073. SMT scaling would be roughly 1.19x.
> >
> > 511.povray_r: 1.0291x scaling.
> >
> > What's going on?
> >
> > Also, some SPEC numbers make it seem like negative SMT scaling is common. It's not. I've
> > personally never seen an application that can use all available threads do worse when
> > SMT is enabled. Can we stop looking at the irrelevant pile of garbage that is SPEC?
> >
> > And what makes you claim "Cinebench has long dependency chains"? How do you
> > know SMT scaling is from that rather than hiding cache misses better?
>
> Because Maxon developers *literally told us that*. There's apparently
> a lot of random accesses to the BVH of the scene.
So to be more specific, it's memory latency, not execution latency. That I can believe: Zen 2 almost never stalls at rename because the FP scheduler is full. However, the AGSQ does fill up (~14.2% of cycles).
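To make the latency-vs-execution distinction concrete: random accesses into a BVH behave roughly like pointer chasing, where each load's address comes from the result of the previous load, so the core cannot start the next access early. A sibling SMT thread (or another core) helps by running an independent chain while the first one waits on memory. Below is a minimal sketch of that access pattern, assuming a random single-cycle permutation (built with Sattolo's algorithm) as a stand-in for scene data. The function names are hypothetical, and in Python the interpreter overhead would swamp any cache-latency effect, so a real measurement would need C or similar.

```python
import random

def make_chase_table(n, seed=0):
    """Hypothetical helper: random permutation forming ONE cycle over all n slots.

    Following next_idx[i] repeatedly from any start visits every slot once
    before returning to the start (Sattolo's algorithm).
    """
    rng = random.Random(seed)
    perm = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)          # 0 <= j < i, never j == i: guarantees a single cycle
        perm[i], perm[j] = perm[j], perm[i]
    return perm

def chase(next_idx, steps, start=0):
    """Dependent loads: the next index is only known after this load completes."""
    i = start
    for _ in range(steps):
        i = next_idx[i]               # address of the next access depends on this result
    return i

def gather_sum(data, indices):
    """Independent loads: all addresses are known up front, so hardware can overlap them."""
    return sum(data[k] for k in indices)
```

In a compiled version, `chase` runs at roughly one memory latency per step once the table exceeds the caches, while `gather_sum` over the same indices can keep many misses in flight at once. That is the kind of gap SMT can fill in the former case.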
> The CB23 score on the M1 goes from 5601 to 7819 when comparing just the big cores to having
> both the big and small cores together. Explain to me how the SoC gets 39% more
> throughput from cores that have a quarter of the performance. It's an extreme showcase of a dependency
> workload that scales extremely well with threads - again, straight from the Maxon devs.
Is it an 'extreme showcase of a dependency workload'? For starters, it sounds like this applies to all raytracing workloads. And being limited by cache latency isn't exactly rare.
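The scaling figures traded in this thread can be checked with a few lines of arithmetic. This is a minimal sketch. The POV-Ray clock normalization assumes the score scales linearly with all-core clock, which is only an approximation, and the variable names are hypothetical.

```python
# Scaling ratios cited in the thread, computed from the raw numbers quoted.

def scaling(base, improved):
    """Throughput ratio of the improved configuration over the base one."""
    return improved / base

# x264 on Haswell locked to 2.2 GHz: affinity set to 4 threads vs. no affinity
x264_scaling = scaling(6.6, 7.6)

# POV-Ray: i7-6700K (4C/8T, 4.2 GHz) score scaled down to the i5-6600K's 3.6 GHz.
# Assumption: score is proportional to all-core clock.
povray_6700k_normalized = 2419 * 3.6 / 4.2
povray_smt = scaling(1741, povray_6700k_normalized)  # 6600K is 4C/4T, so this is the SMT gain

# M1, Cinebench R23: big cores only vs. big + small cores
m1_gain = scaling(5601, 7819)

# SPEC rate figures quoted for comparison
spec_x264_r = 1.0315
spec_povray_r = 1.0291

print(f"x264 scaling:          {x264_scaling:.2f}x (525.x264_r: {spec_x264_r}x)")
print(f"POV-Ray normalized:    {povray_6700k_normalized:.0f}")
print(f"POV-Ray SMT scaling:   {povray_smt:.2f}x (511.povray_r: {spec_povray_r}x)")
print(f"M1 big+small vs. big:  {m1_gain:.2f}x")
```

This gives about 1.15x for x264, a normalized 6700K score of about 2073 and 1.19x for POV-Ray, and about 1.40x for the M1 once the small cores are added.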