By: zzyzx (zzyzx.delete@this.zzyzx.sh), May 23, 2022 11:02 pm
Room: Moderated Discussions
Jukka Larja (roskakori2006.delete@this.gmail.com) on May 23, 2022 12:59 am wrote:
> PC games have rather large scale in CPU needs these days. Some people will want to game on their 160
> Hz (or even more) monitor, while recommended system should likely run at 60 and minimum around 30 FPS.
> If you start by presuming 30 FPS is OK and someone who just wants to run the game regardless of minimum
> specs will be happy with 20-25 FPS, dropping from four cores to two isn't all that crazy anymore.
>
> A typical target for modern game would be PS4, which is about 6 Jaguar cores at 1.6 Ghz. That's not
> many when translated to 4.2 GHz Comet Lake cores. A typical gamer may want to have a browser, some
> chat programs, maybe some VoIP running. Low-end gamer knows to close down everything but the game.
For sure, the total ability in those 2C isn't bad at all. If you could map all the work onto them perfectly, it'd be enough; the work just doesn't always map anywhere close to perfectly.
Take Vermintide 2 as an example (Bitsquid/Stingray has very straightforward and well-documented yet reasonably typical threading, and Vermintide 2 lets you adjust both the worker count and DX11 vs DX12). Most of the game's per-frame work happens on main, render main, a single worker pool, and, depending on the setup, user-mode driver (UMD) threads. There's one very active UMD thread with an AMD GPU and DX11, and zero with an AMD GPU and DX12. With the default settings of 2 workers and DX11, the contention just makes a terrible mess of stutter and latency, even if average framerates are decent. 1 worker and DX11 simply isn't fast enough. 2 workers and DX12 is pretty good, since it gets the UMD thread out of the way.
If the total performance available is already marginal, it doesn't take much of this trouble to make it unplayable, and I'd really hate to try to dial all that in for a game with multiple worker pools, and for every graphics driver.
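To put rough numbers on the Vermintide 2 configurations above, here's a quick tally of how many busy threads each one asks a 2C part to juggle. It's only a sketch: `busy_threads` and `CORES` are names made up for illustration, every thread is treated as equally busy, and SMT is ignored.

```python
def busy_threads(workers: int, umd_threads: int) -> int:
    """Threads doing per-frame work: main + render main + workers + UMD threads."""
    return 2 + workers + umd_threads

CORES = 2  # the 2C part under discussion; SMT is ignored in this sketch

# Thread counts taken from the post (AMD GPU: one busy UMD thread on DX11, none on DX12).
configs = {
    "2 workers, DX11": busy_threads(workers=2, umd_threads=1),
    "1 worker, DX11": busy_threads(workers=1, umd_threads=1),
    "2 workers, DX12": busy_threads(workers=2, umd_threads=0),
}

for name, n in configs.items():
    print(f"{name}: {n} busy threads on {CORES} cores ({n / CORES:.1f} per core)")
```

The count alone doesn't separate the last two cases (both come out at four threads); the difference is that with DX12 all four are doing the game's own work, while with 1 worker and DX11 one of the four is the driver thread.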
Possibly the most "fun" thing I've run into along these lines is a priority inversion in Windows' input handling. csrss.exe's IO_DT thread has to run on time for each mouse poll or input gets lost. IO_DT itself is appropriately prio 16, but regularly waits on prio 8 threads of other programs interested in input (overlays / PTT for VoIP / etc, including graphics driver stuff you can't simply turn off). If a game does threaded work at boosted prio, it can block those prio 8 threads and therefore IO_DT, making a mess of its own input.
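The inversion can be reproduced in a toy strict-priority scheduler. This is a minimal sketch, not a model of the real Windows scheduler: it ignores quanta and any priority boosting, the `Thread` class and `finish_tick` function are hypothetical, and the game worker priorities (6 and 10) and tick counts are arbitrary values picked to sit below and above the prio 8 thread. Only the 16 and 8 come from the description above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Thread:
    name: str
    prio: int
    work: int                        # ticks of CPU time still needed
    waits_on: Optional[str] = None   # name of a thread that must finish first

def finish_tick(threads, target, cores=2):
    """Return the tick on which `target` finishes under strict priority scheduling."""
    by_name = {t.name: t for t in threads}
    tick = 0
    while by_name[target].work > 0:
        tick += 1
        # A thread is runnable if it has work left and whatever it waits on is done.
        runnable = [t for t in threads if t.work > 0
                    and (t.waits_on is None or by_name[t.waits_on].work == 0)]
        # Highest priority first; the top `cores` threads each get one tick.
        for t in sorted(runnable, key=lambda t: -t.prio)[:cores]:
            t.work -= 1
    return tick

def scenario(game_prio):
    return [
        Thread("IO_DT", 16, 1, waits_on="overlay"),  # prio 16, but blocked on a prio 8 thread
        Thread("overlay", 8, 1),                     # some other program interested in input
        Thread("game_worker_0", game_prio, 10),      # hypothetical game workers
        Thread("game_worker_1", game_prio, 10),
    ]

print(finish_tick(scenario(game_prio=6), "IO_DT"))   # workers below prio 8: prints 2
print(finish_tick(scenario(game_prio=10), "IO_DT"))  # workers above prio 8: prints 12
```

With the workers above prio 8, IO_DT can't run until both workers have finished and the overlay thread finally gets a tick, even though IO_DT outranks everything else in the system.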