By: Jukka Larja (roskakori2006.delete@this.gmail.com), May 21, 2022 9:48 pm
Room: Moderated Discussions
zzyzx (zzyzx.delete@this.zzyzx.sh) on May 21, 2022 7:44 pm wrote:
> Jukka Larja (roskakori2006.delete@this.gmail.com) on May 20, 2022 9:48 pm wrote:
> > Also, we may have different opinion of what counts as "good use". Battlefield 3 presentation is
> > clearly about data-oriented programming. They obviously don't have separate non-SIMD case to compare
> > with, but considering the CPUs in PS3 and Xbox 360, I'm pretty sure better data layout and generally
> > avoiding branches plays a bigger part than any SIMD. Insomniac in their presentation reported 20-100
> > times speedup. That's obviously mostly something else than 4-wide SIMD talking.
>
> Certainly; I don't expect that SIMD itself feels essential anywhere in most games (versus
> a similarly well-crafted scalar path, not necessarily whatever fallback path might actually
> exist). My guess is only that it sees enough use to be important in CPU design for games,
> and relatively modest speedups in a handful of the hottest loops are enough for that.
>
> It doesn't touch on performance details, but one story that comes to mind is how
> Assassin's Creed Odyssey required AVX on release. It initially sounded like they
> didn't plan to add a fallback path, but they did after it made the news.
Some time ago there was a "scandal" about a game not running on some (mostly old AMD) CPUs. It turned out the game was using POPCNT, which according to the Steam Hardware Survey was missing from about 1-2 % of Steam users' CPUs at the time. It's rather surprising that Assassin's Creed Odyssey even tried to require AVX. It would be interesting to know how they actually fixed it. Did they just drop AVX altogether, ship two binaries, or make a runtime choice?
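
For what it's worth, the runtime-choice option doesn't have to be complicated. Below is a minimal sketch (purely my guess, not anything Ubisoft has described) of picking between an AVX path and a baseline path once at startup with the GCC/Clang __builtin_cpu_supports() builtin. The function names are made up for illustration.

#include <cstddef>
#include <cstdio>

// Placeholder bodies; in a real engine these would be the AVX-compiled and
// baseline-compiled versions of the same hot loop.
static void update_avx(float* p, std::size_t n)  { for (std::size_t i = 0; i < n; ++i) p[i] *= 2.0f; }
static void update_sse2(float* p, std::size_t n) { for (std::size_t i = 0; i < n; ++i) p[i] *= 2.0f; }

int main()
{
    // Check the CPU once at startup and cache the choice in a function pointer.
    void (*update)(float*, std::size_t) =
        __builtin_cpu_supports("avx") ? update_avx : update_sse2;

    float data[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    update(data, 4);
    std::printf("%f\n", data[0]);
    return 0;
}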
> > So yeah, maybe SIMD can be _meaningful_[1] in games, but being able to target twice as fast GPU is much,
> > much more meaningful. On purely CPU side, choosing a better data layouts and algorithms, and in general
> > targeting more specific cases (instead of always using
> > the same culling algorithm, use a separate algorithms
> > for a first-person game and a top-down game) is more meaningful. Practically speaking, it's also easier
> > to target multiple cores[2] than SIMD. I think all of our programmers can do that (I presume so, but
> > may be optimistic), whereas only couple would be somewhat at home with SSE.
> >
> > [1] We do have some SIMD code in our engine. One would have
> > to be really desperate to call it "good use" though.
> >
> > [2] It's also often much more meaningful. If you have an unused core and move a task blocking main thread
> > to run in background, you can get near infinite speedup for that task. As a rule, game engines don't run at
> > 100 % CPU utilization. They are soft real-time, trying to
> > get things done on time, then idle waiting for GPU.
> > Shaving 1 ms off main thread at expense of 10 ms on two background cores each could be a good trade.
>
> I get all this, but as a latency-sensitive gamer and the guy who ends up trying to figure things
> out when friends' games don't perform as expected, I'm regularly annoyed by the side-effects
> of many technical decisions like these (physics on the GPU and inefficient threading only good
> enough to be a win in the ideal case). Inefficient SIMD has some of inefficient threading's hazards
> but not others. As a programmer, of course I'd rather work with threading than SIMD, but as a
> gamer, I'd much rather deal with something over-vectorized than over-threaded.
I very much agree with you that GPU physics and multi-threading may cause bad side effects, while it's hard to see CPU SIMD being a problem (at least SSE2, which doesn't require a separate compatibility path on x64). However, SSE only provides a 4x speedup in the very best case, and getting anywhere near that is completely unrealistic in most cases.
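
To put a number on that: an SSE register is 128 bits, i.e. four floats, so even a perfectly vectorized loop only replaces four scalar operations per instruction. A toy sketch of the best case (not code from our engine, just to show where the 4x ceiling comes from):

#include <xmmintrin.h>  // SSE intrinsics
#include <cstddef>

void add_scalar(const float* a, const float* b, float* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];            // one add per iteration
}

void add_sse(const float* a, const float* b, float* out, std::size_t n)
{
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i); // four floats per 128-bit register
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
    for (; i < n; ++i)                   // scalar tail for the leftovers
        out[i] = a[i] + b[i];
}

In practice the loads, stores and whatever shuffling your data layout forces on you eat into that long before you get anywhere near 4x.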
> Back to the top:
>
> > Also: Because practically[1] no-one has come up with a good way for games to use (CPU) SIMD.
>
> Maybe a big difference in our thinking is that I'm bearish about putting a lot of this stuff on the
> GPU, even looking a decade out. My experience with GPU physics in games so far is only as a thorn
> in my side (it doesn't even let the devs do things they otherwise couldn't because it falls back to
> CPU physics on AMD cards), and it's increasingly unclear to me that GPU-driven rendering (DOOM Eternal,
> Halo Infinite, Assassin's Creed since Unity) delivers results to match all that extra complexity.
> With gen9 consoles' strong CPUs and the PC CPU space heating up again, there doesn't seem to be such
> a shortage of CPU performance anymore to drive a shift of work over to the GPU either.
I admit that the newest generation of consoles is different, but the last time there was a big jump in console CPU performance (PS3/Xbox 360 to PS4/Xbox One), SIMD usage dropped. My understanding is that the reason was that GPU performance and programmability increased even more (also: CPU SIMD performance per core dropped, while core count and per-core non-SIMD performance went up), which is again the case with the newest generation (though this time CPU SIMD performance per core has also gone up).
As for moving stuff to the GPU in general, I'm not really an expert on that. I just trust that the people who are supposed to be experts know what they are doing. That said, I've noticed there have been cases of "you should do this on the CPU, as there are likely plenty of free cycles there" in the rendering world recently. The things offloaded back to the CPU don't seem to be the ones that are nice to SIMDify, though, which is likely the reason they are such good candidates to go back to the CPU in the first place. They could (and likely will) benefit from SIMD, but as the whole point is that there is an overabundance of CPU cycles available, it's unlikely to be an important issue anytime soon.
-JLarja