By: zzyzx (zzyzx.delete@this.zzyzx.sh), May 21, 2022 7:44 pm
Room: Moderated Discussions
Jukka Larja (roskakori2006.delete@this.gmail.com) on May 20, 2022 9:48 pm wrote:
> Also, we may have different opinions of what counts as "good use". The Battlefield 3 presentation is
> clearly about data-oriented programming. They obviously don't have a separate non-SIMD case to compare
> with, but considering the CPUs in the PS3 and Xbox 360, I'm pretty sure better data layout and generally
> avoiding branches play a bigger part than any SIMD. Insomniac in their presentation reported a 20-100
> times speedup. That's obviously mostly something other than 4-wide SIMD talking.
Certainly; I don't expect that SIMD itself feels essential anywhere in most games (versus a similarly well-crafted scalar path, not necessarily whatever fallback path might actually exist). My guess is only that it sees enough use to be important in CPU design for games, and relatively modest speedups in a handful of the hottest loops are enough for that.
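To make "a handful of the hottest loops" concrete, here's the flavor of thing I have in mind (a minimal sketch of my own, not taken from the Battlefield 3 or Insomniac slides; the function and its names are made up): sphere-vs-plane culling over SoA data with SSE, four spheres per iteration. The SoA layout and branchless structure do most of the work; the 4-wide SIMD is just the last multiplier on top.

#include <xmmintrin.h>  // SSE intrinsics
#include <cstddef>

// Hypothetical culling kernel: SoA sphere centers/radii, count padded to a multiple of 4.
// Plane is (px, py, pz, pd) with an inward-facing normal; outside[i] = 1 means fully behind it.
void CullSpheresSSE(const float* cx, const float* cy, const float* cz,
                    const float* radius, std::size_t count,
                    float px, float py, float pz, float pd,
                    unsigned char* outside)
{
    const __m128 plx = _mm_set1_ps(px), ply = _mm_set1_ps(py);
    const __m128 plz = _mm_set1_ps(pz), pld = _mm_set1_ps(pd);
    for (std::size_t i = 0; i < count; i += 4) {
        // Signed distance of four sphere centers to the plane.
        __m128 d = _mm_add_ps(
            _mm_add_ps(_mm_mul_ps(_mm_loadu_ps(cx + i), plx),
                       _mm_mul_ps(_mm_loadu_ps(cy + i), ply)),
            _mm_add_ps(_mm_mul_ps(_mm_loadu_ps(cz + i), plz), pld));
        // A sphere is outside when distance < -radius; grab the four results as a bitmask.
        __m128 neg_r = _mm_sub_ps(_mm_setzero_ps(), _mm_loadu_ps(radius + i));
        int mask = _mm_movemask_ps(_mm_cmplt_ps(d, neg_r));
        outside[i + 0] = (mask >> 0) & 1;
        outside[i + 1] = (mask >> 1) & 1;
        outside[i + 2] = (mask >> 2) & 1;
        outside[i + 3] = (mask >> 3) & 1;
    }
}

Against a branchy AoS scalar version, the win from a loop like this is modest, and that's exactly the scale I mean: enough to matter for CPU design, not enough to headline a slide deck.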
It doesn't touch on performance details, but one story that comes to mind is how Assassin's Creed Odyssey required AVX on release. It initially sounded like they didn't plan to add a fallback path, but they did after it made the news.
> So yeah, maybe SIMD can be _meaningful_[1] in games, but being able to target a twice-as-fast GPU is much,
> much more meaningful. On the purely CPU side, choosing better data layouts and algorithms, and in general
> targeting more specific cases (instead of always using the same culling algorithm, use separate algorithms
> for a first-person game and a top-down game), is more meaningful. Practically speaking, it's also easier
> to target multiple cores[2] than SIMD. I think all of our programmers can do that (I presume so, but
> may be optimistic), whereas only a couple would be somewhat at home with SSE.
>
> [1] We do have some SIMD code in our engine. One would have
> to be really desperate to call it "good use" though.
>
> [2] It's also often much more meaningful. If you have an unused core and move a task blocking the main thread
> to run in the background, you can get a near-infinite speedup for that task. As a rule, game engines don't run at
> 100 % CPU utilization. They are soft real-time, trying to get things done on time, then idling while waiting for the GPU.
> Shaving 1 ms off the main thread at the expense of 10 ms on each of two background cores could be a good trade.
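Footnote [2]'s trade is roughly the pattern below (a rough sketch of my own, with hypothetical SolvePaths/SimulateFrame/ApplyPaths standing in for real engine work): kick the blocking task onto a worker thread early in the frame, overlap it with the rest of the main-thread work, and only pay whatever wait remains at the point where the result is needed.

#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-ins for real engine work.
static std::vector<int> SolvePaths()       // expensive task that used to block the main thread
{
    std::vector<int> waypoints(1 << 20);
    std::iota(waypoints.begin(), waypoints.end(), 0);
    return waypoints;
}
static void SimulateFrame() { /* the rest of the frame's main-thread work */ }
static void ApplyPaths(const std::vector<int>&) { /* consume the result */ }

void RunFrame()
{
    // Kick the blocking task onto a worker thread early in the frame...
    std::future<std::vector<int>> paths = std::async(std::launch::async, SolvePaths);

    SimulateFrame();          // ...overlap it with the rest of the main-thread work...

    ApplyPaths(paths.get());  // ...and only pay whatever wait is left when the result is consumed.
}

int main() { RunFrame(); }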
I get all this, but as a latency-sensitive gamer, and as the guy who ends up trying to figure things out when friends' games don't perform as expected, I'm regularly annoyed by the side effects of technical decisions like these (physics on the GPU, and inefficient threading that's only good enough to be a win in the ideal case). Inefficient SIMD shares some of inefficient threading's hazards but not others. As a programmer, of course I'd rather work with threading than SIMD, but as a gamer, I'd much rather deal with something over-vectorized than over-threaded.
Back to the top:
> Also: Because practically[1] no-one has come up with a good way for games to use (CPU) SIMD.
Maybe a big difference in our thinking is that I'm bearish about putting a lot of this stuff on the GPU, even looking a decade out. So far, GPU physics in games has only ever been a thorn in my side (it doesn't even let devs do things they otherwise couldn't, because it falls back to CPU physics on AMD cards), and it's increasingly unclear to me that GPU-driven rendering (DOOM Eternal, Halo Infinite, Assassin's Creed since Unity) delivers results to match all that extra complexity. With the gen9 consoles' strong CPUs and the PC CPU space heating up again, there no longer seems to be such a shortage of CPU performance to drive a shift of work over to the GPU either.