By: Jukka Larja (roskakori2006.delete@this.gmail.com), May 20, 2022 8:48 pm
Room: Moderated Discussions
zzyzx (zzyzx.delete@this.zzyzx.sh) on May 20, 2022 1:48 pm wrote:
> Jukka Larja (roskakori2006.delete@this.gmail.com) on May 20, 2022 10:04 am wrote:
> > Jan Wassenberg (jan.wassenberg.delete@this.gmail.com) on May 19, 2022 12:19 pm wrote:
> > > Jukka Larja (roskakori2006.delete@this.gmail.com) on May 19, 2022 11:14 am wrote:
> > > > Yeah, one man game studio. You can pick up pretty much any tech and make a game out of it.
> > >
> > > Fair point, it's not the most impressive example.
> > > Here's a talk, mostly techniques rather than results, but
> > > it does appear Insomniac Games used SIMD extensively.
> > >
> > > https://www.gdcvault.com/play/1022249/SIMD-at-Insomniac-Games-How
> >
> > Yeah, I've read those slides (they were maybe linked here previously, or I may even have read them sometime
> > in 2015). Would be really interesting to know how they have used SIMD in their games (I'm not sure if
> > the door example is real. I don't understand why they would need to test every door every time), though
> > from comment 'Solves "death by a thousand cuts" problems', I think they have just very unusual amount
> > of SIMD talent in-house. How else can you expect to SIMDify 1000 different places in code :D .
> >
> > -JLarja
>
> That they're even trying it on "death by a thousand cuts" problems itself implies that the hot
> loops with any appropriate parallelism got this treatment long ago. Collision detection and culling
> are the ones I'm most familiar with, but at least animation has probably got some as well.
>
> Here's one on culling: https://www.ea.com/frostbite/news/culling-the-battlefield-data-oriented-design-in-practice
>
> As a gameplay programmer, you may never see this stuff; even if you're working on an engine,
> all of the most critical parts may be hidden from you in middleware like PhysX and Umbra.
Unfortunately I've seen way too much of PhysX's code :D . But if you need high performance with PhysX, you run it (or whichever parts you choose) on the GPU, not on the CPU. That's actually a big problem with PhysX these days. Anything fancy requires CUDA or a DirectX or Vulkan GPU to run, which makes life hard for multi-platform titles (I'm not sure if Nvidia is more reasonable with the "big boys" though).
Data-oriented design is often a good idea (though personally I think people who don't actually participate in the game design process tend to overestimate its usefulness), but whether to use SIMD with it is an afterthought. That Battlefield 3 presentation is even older than the Insomniac one. PS3 and Xbox 360 were likely the golden age of CPU SIMD (in games), as the CPUs were crap in general-purpose terms and GPUs weren't all that good either. Since then most of the heavy lifting has moved to the GPU.
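To be concrete about what I mean by data-oriented design paying off regardless of SIMD, here's a toy sketch (names and numbers made up, not from either presentation). The hot loop over a struct-of-arrays layout streams only the bytes it needs, and it happens to be trivially vectorizable later if you bother:

#include <cstddef>
#include <vector>

// Array-of-structs: each entity drags its cold fields through the cache.
struct EntityAoS {
    float x, y, z;   // position (hot: read every frame)
    float health;    // cold: rarely touched here
    char  name[32];  // cold: padding out the cache line
};

float sumSquaredDistancesAoS(const std::vector<EntityAoS>& es) {
    float sum = 0.0f;
    for (const auto& e : es)
        sum += e.x * e.x + e.y * e.y + e.z * e.z;  // pulls ~48 bytes for the 12 it uses
    return sum;
}

// Struct-of-arrays: the hot loop touches only hot data, and a compiler
// (or intrinsics) can vectorize it without any heroics.
struct EntitiesSoA {
    std::vector<float> x, y, z;
    std::vector<float> health;
};

float sumSquaredDistancesSoA(const EntitiesSoA& es) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < es.x.size(); ++i)
        sum += es.x[i] * es.x[i] + es.y[i] * es.y[i] + es.z[i] * es.z[i];
    return sum;
}

The layout change alone is where most of the win comes from on cache-starved CPUs like the PS3's and 360's; whether you then SIMDify the loop is a separate decision.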
When I wrote in my first message that I can't name a single game that makes good use of SIMD, I meant exactly that: I can't name one. That doesn't mean I think there aren't any. It's just far enough from mainstream development that I don't hear about it (in stories like "the day I accidentally disabled SIMD and spent 17 hours wondering why everything was suddenly so slow", or the opposite).
Also, we may have different opinions of what counts as "good use". The Battlefield 3 presentation is clearly about data-oriented programming. They obviously don't have a separate non-SIMD build to compare against, but considering the CPUs in the PS3 and Xbox 360, I'm pretty sure the better data layout and general avoidance of branches play a bigger part than any SIMD. Insomniac reported a 20-100x speedup in their presentation. That's obviously mostly something other than 4-wide SIMD talking.
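Just to spell out the ceiling: here's roughly what 4-wide SSE looks like on a trivial loop (a sketch, assuming 16-byte-aligned data and a count that's a multiple of 4, to keep it short). Even perfectly vectorized, the arithmetic tops out around 4x over scalar, so a 20-100x win has to come mostly from layout, branches and cache behavior:

#include <xmmintrin.h>  // SSE
#include <cstddef>

// Scalar reference version.
float sumSquaresScalar(const float* v, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        sum += v[i] * v[i];
    return sum;
}

// SSE version: four floats per iteration, so at best ~4x the scalar loop.
float sumSquaresSSE(const float* v, std::size_t n) {
    __m128 acc = _mm_setzero_ps();
    for (std::size_t i = 0; i < n; i += 4) {
        __m128 x = _mm_load_ps(v + i);            // load 4 floats (aligned)
        acc = _mm_add_ps(acc, _mm_mul_ps(x, x));  // 4 lanes squared and accumulated
    }
    // Horizontal sum of the 4 lanes.
    alignas(16) float lanes[4];
    _mm_store_ps(lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}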
So yeah, maybe SIMD can be _meaningful_[1] in games, but being able to target a twice-as-fast GPU is much, much more meaningful. On the pure CPU side, choosing better data layouts and algorithms, and in general targeting more specific cases (instead of always using the same culling algorithm, use separate algorithms for a first-person game and a top-down game), is more meaningful. Practically speaking, it's also easier to target multiple cores[2] than SIMD. I think all of our programmers can do that (I presume so, but may be optimistic), whereas only a couple would be somewhat at home with SSE.
[1] We do have some SIMD code in our engine. One would have to be really desperate to call it "good use" though.
[2] It's also often much more meaningful. If you have an unused core and move a task that blocks the main thread to run in the background, you can get a near-infinite speedup for that task. As a rule, game engines don't run at 100% CPU utilization. They are soft real-time, trying to get things done on time, then idling while waiting for the GPU. Shaving 1 ms off the main thread at the expense of 10 ms on each of two background cores could be a good trade.
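As a sketch of what that looks like (all names hypothetical, using plain std::async rather than a real engine's job system):

#include <chrono>
#include <future>

struct NavMesh { /* vertex/edge data elided */ };

// Hypothetical stand-in for ~10 ms of work that used to block the main thread.
NavMesh buildNavMesh() { return NavMesh{}; }

std::future<NavMesh> pending;

void beginFrame() {
    // Kick the work to a background core; the main thread doesn't wait.
    pending = std::async(std::launch::async, buildNavMesh);
}

void tryConsumeResult() {
    // Poll without blocking; take the result only once it's ready.
    if (pending.valid() &&
        pending.wait_for(std::chrono::seconds(0)) == std::future_status::ready) {
        NavMesh mesh = pending.get();
        (void)mesh;  // use the mesh here
    }
}

From the main thread's point of view the task now costs roughly nothing, which no amount of SIMDifying the same code could achieve.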
-JLarja