By: Brett (ggtgp.delete@this.yahoo.com), May 17, 2013 7:26 pm
Room: Moderated Discussions
Patrick Chase (patrickjchase.delete@this.gmail.com) on May 17, 2013 9:43 am wrote:
> RichardC (tich.delete@this.pobox.com) on May 14, 2013 10:38 am wrote:
> > Game benchmarks indicate that there's no significant
> > advantage for 4C/8T over 4C/4T with most current games.
>
> True, but completely irrelevant to your argument that gaming workloads can't
> benefit from TLP. There is a big difference between these two statements:
>
> 1. Current applications don't benefit from SMT
>
> 2. Client workloads don't benefit from SMT
>
> Benchmarks of current games only address (1), and basically demonstrate that the current crop
> of game implementations simply don't use all that many threads. That does not however tell us
> whether gaming *workloads* as a class intrinsically lack thread level parallelism (TLP) as you
> claim. There are two scenarios that are equally consistent with the benchmarks you cite:
>
> 1. The workloads intrinsically lack TLP
The primary bottleneck in every console game engine I worked on (past tense) was building the single display list. Otherwise you are just waiting on the GPU to finish drawing.
Everything else runs on separate threads, with only enough work to keep about two more cores busy.
The other tasks finish early, so you are always waiting on the display list task.
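To put rough numbers on that, here is a minimal sketch of a frame where one thread builds the display list and everything else is spread over the remaining cores. The function name, the task timings, and the greedy scheduling are all made up for illustration; they are not taken from any real engine.

```python
def frame_time_ms(display_list_ms, other_tasks_ms, worker_cores):
    """Estimate frame time for a hypothetical engine.

    One thread builds the display list; the other tasks are assigned
    greedily (longest first) to `worker_cores` cores (must be >= 1).
    The frame is done when the slowest of these finishes.
    """
    loads = [0.0] * worker_cores
    for task in sorted(other_tasks_ms, reverse=True):
        # Give the next-longest task to the least loaded worker core.
        least_loaded = loads.index(min(loads))
        loads[least_loaded] += task
    return max(display_list_ms, max(loads))


# Hypothetical timings: 14 ms of display list work, 18 ms of everything else.
tasks = [6, 5, 4, 3]
for cores in (1, 2, 7):
    print(cores, "worker cores ->", frame_time_ms(14, tasks, cores), "ms")
```

With these invented timings the frame is limited by the other tasks only when they share a single core. From two worker cores upward the display list thread sets the frame time, and adding more cores changes nothing.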
The PlayStation 4 GPU is supposed to support ~8 display lists, if I correctly read one of the articles linked here in the past few days.
This will not help as much as you think. For bandwidth reasons you need a Z-buffer pass to run first, to reject hidden polys and thus save texture read bandwidth. Opaque polys are easy; the real problem is all the semi-transparent polys that you have to render from back to front, so again you are stuck with single display list type limitations. Then you have all the full-screen effects, lens flare, etc. These also have to be done in a specific order to look right, blending against the previous surface colors, so once again you are stuck with single display list type limitations.
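The ordering constraint can be sketched roughly like this. The `Draw` type, the pass names, and the example scene are hypothetical and only show which parts of the list are order-sensitive; a real engine's command stream looks nothing like this.

```python
from dataclasses import dataclass


@dataclass
class Draw:
    name: str
    depth: float   # distance from the camera; larger means farther away
    opaque: bool


def build_display_list(draws):
    """Return draw commands in an order that blends correctly."""
    opaque = [d for d in draws if d.opaque]
    # Semi-transparent polys must go back to front (farthest first).
    transparent = sorted(
        (d for d in draws if not d.opaque),
        key=lambda d: d.depth,
        reverse=True,
    )
    commands = [("z_prepass", d.name) for d in opaque]   # depth only, runs first
    commands += [("color", d.name) for d in opaque]      # any order is fine here
    commands += [("blend", d.name) for d in transparent] # strictly ordered
    commands.append(("post", "fullscreen_effects"))      # must come last
    return commands


scene = [
    Draw("wall", 10.0, True),
    Draw("glass", 5.0, False),
    Draw("smoke", 8.0, False),
    Draw("floor", 2.0, True),
]
for command in build_display_list(scene):
    print(command)
```

Only the opaque color pass could be split across several lists without caring about order. The blend and post entries depend on what was drawn before them, which is the serial part.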
> 2. The people who coded the current crop of game engines didn't bother to expose TLP
> (probably because current cores are more than fast enough to keep up with gameplay at
> reasonable frame rates. There is consequently little incentive to multi-thread).
>
> I tend to believe (2) more than (1). As others have pointed out, there is known to be a decent amount
> of TLP in physics, AI, etc.
All those tasks are already separate; the bottleneck is the GPU, or getting work to the GPU.
Game designers have just plain run out of useful non-GPU work to do.
Someone on the internet has posted CPU load numbers for games; it's not unusual for CPU load to drop to 1 or below.
Even two years from now, when a useful number of PCs have PS4-class GPUs, I do not expect any games to use 8 cores. 4 is plenty; any extra would go to frills most users would not notice.
Games will not save Intel.
In a competitive market there would be a race to the bottom going on, with the market collapsing down to ARM-level pricing, like PC pricing minus the CPU. Not happening.