Realistically, Nvidia could use packed, single precision SSE for PhysX, if they wanted to take advantage of the CPU. Each packed SSE instruction performs up to four single precision operations, rather than just one scalar operation. In theory, this could quadruple the performance of PhysX on a CPU, but the reality is that the gains are probably in the neighborhood of 2X on the current Nehalem and Westmere generation of CPUs. That is still a hefty boost and could easily move some games from the unplayable <24 FPS zone to >30 FPS territory when using CPU-based PhysX. To put that into context, here’s a quote from Nvidia’s marketing:
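To illustrate the difference, here is a minimal sketch of the same loop written both ways: a scalar version, which is roughly what x87 (or scalar SSE) code generation produces, and a packed SSE version using compiler intrinsics that processes four floats per instruction. The function names and the simple multiply-add kernel are illustrative, not taken from PhysX.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Scalar version: one single precision multiply-add per iteration,
 * comparable to what scalar x87 code would do. */
static void saxpy_scalar(float a, const float *x, float *y, int n) {
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Packed SSE version: four single precision lanes per instruction. */
static void saxpy_sse(float a, const float *x, float *y, int n) {
    __m128 va = _mm_set1_ps(a);   /* broadcast a into all four lanes */
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 vy = _mm_loadu_ps(y + i);
        _mm_storeu_ps(y + i, _mm_add_ps(_mm_mul_ps(va, vx), vy));
    }
    for (; i < n; i++)            /* scalar remainder */
        y[i] = a * x[i] + y[i];
}
```

The packed version issues a quarter as many arithmetic instructions, which is where the theoretical 4X comes from; memory bandwidth and non-vectorizable code are what pull the realized gain down toward 2X.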
[In Cryostasis], with fine grained simulation of water, icicle destruction, and particle effects, the CPU shows itself as woefully inadequate for delivering playable framerates. GPUs that lack PhysX support become bottlenecked as a result, delivering the same level of performance irrespective of the hardware’s graphics capability. GeForce GPUs with hardware physics support show a 2-4x performance gain, delivering great scalability across the GPU lineup.
That 2-4X performance gain sounds respectable on paper. In reality though, if the CPU could run 2X faster by using properly vectorized SSE code, the performance difference would drop substantially and in some cases disappear entirely. Unfortunately, it is hard to determine how much performance x87 costs. Without access to the source code for PhysX, we cannot do an apples-to-apples comparison that pits PhysX using x87 against PhysX using vectorized SSE. The closest comparison would be to compare the three leading physics packages (Havok from Intel, PhysX from Nvidia and the open source Bullet) on a given problem, running on the CPU. Havok is almost certain to be highly tuned for SSE vectors, given Intel’s internal resources and also their emphasis on using instruction set extensions like SSE and the upcoming AVX. Bullet is probably not quite as highly optimized as Havok, but it is available in source form, so a true x87 vs. vectorized SSE experiment is possible.
Not only would this physics solver comparison reveal the differences due to x87 vs. vectorized SSE, it would also show the impact of multi-threading. A review at the Tech Report already demonstrated that in some cases (e.g. Sacred II), PhysX will only use one of several available cores in a multi-core processor. Nvidia has clarified that CPU PhysX is by default single threaded and multi-threading is left to the developer. Nvidia has demonstrated that PhysX can be multi-threaded using CUDA on top of their GPUs. Clearly, with the proper coding and infrastructure, PhysX could take advantage of several cores in a modern CPU. For example, Westmere sports 6 cores, and using two cores for physics could easily yield a 2X performance gain. Combined with the benefits of vectorized SSE over x87, it is easy to see how a proper multi-core implementation using 2-3 cores could match the gains of PhysX on a GPU.
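Rigid body and particle updates are largely independent, which is what makes dedicating two or three cores to physics straightforward. A minimal sketch of the approach, using POSIX threads to partition a body array across two worker threads (the `Body` struct and Euler integration step are hypothetical placeholders for a real solver):

```c
#include <pthread.h>

#define NUM_THREADS 2   /* e.g. dedicate two of Westmere's six cores */

/* Hypothetical body state; a real solver tracks far more per body. */
typedef struct { float pos, vel; } Body;

typedef struct { Body *bodies; int begin, end; float dt; } Slice;

/* Each thread integrates its own contiguous slice of the body array;
 * slices are disjoint, so no locking is needed for this step. */
static void *integrate_slice(void *arg) {
    Slice *s = (Slice *)arg;
    for (int i = s->begin; i < s->end; i++)
        s->bodies[i].pos += s->bodies[i].vel * s->dt;  /* Euler step */
    return NULL;
}

/* Partition the bodies and run each slice on its own thread. */
static void step_world(Body *bodies, int n, float dt) {
    pthread_t tids[NUM_THREADS];
    Slice slices[NUM_THREADS];
    int chunk = (n + NUM_THREADS - 1) / NUM_THREADS;
    for (int t = 0; t < NUM_THREADS; t++) {
        slices[t].bodies = bodies;
        slices[t].begin  = t * chunk;
        slices[t].end    = (t + 1) * chunk < n ? (t + 1) * chunk : n;
        slices[t].dt     = dt;
        pthread_create(&tids[t], NULL, integrate_slice, &slices[t]);
    }
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(tids[t], NULL);
}
```

Real solvers have sequential phases (broad-phase collision, constraint solving) that limit scaling, but the data-parallel integration and particle work that dominates effects like those in Cryostasis splits across cores exactly like this.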
While as a buyer it may be frustrating to see PhysX hobbled on the CPU, it should not be surprising. Nvidia has no obligation to optimize for their competitor’s products. PhysX does not run on top of AMD GPUs, and nobody reasonably expects that it will; not only because of the extra development and support costs, but also because AMD would never want to give Nvidia early access to developer versions of its products. Nvidia wants PhysX to be an exclusive, and it will likely stay that way. In the case of PhysX on the CPU, however, there are no significant extra costs (and frankly supporting SSE is easier than x87 anyway). For Nvidia, decreasing the baseline CPU performance by using x87 instructions and a single thread makes GPUs look better. This tactic calls into question the CPU vs. GPU comparisons made using PhysX; but the name of the game at Nvidia is making the GPU look good, and PhysX certainly fits the bill in its current incarnation.
The bottom line is that Nvidia is free to hobble PhysX on the CPU by using single threaded x87 code if they wish. That choice, however, does not benefit developers or consumers, and it casts substantial doubt on the purported performance advantages of running PhysX on a GPU rather than a CPU. There is already a large and contentious debate concerning the advantages of GPUs over CPUs, and PhysX is another piece of that puzzle, but one that seems to create questions rather than answers.