A bit off base

Article: PhysX87: Software Deficiency
By: David Kanter (dkanter.delete@this.realworldtech.com), July 8, 2010 11:27 am
Room: Moderated Discussions
John Mann (xman52373@aol.com) on 7/8/10 wrote:
>David Kanter (dkanter@realworldtech.com) on 7/7/10 wrote:
>>John Mann (xman52373@aol.com) on 7/7/10 wrote:
>>>While your article has merit, there are a few things you have overlooked. There is a water
>>>simulation out using a 980X i7 CPU where all 12 cores (6 real and 6 virtual) are pegged
>>>at 100% and it still gets trounced by a 9600GT running the same simulation.
>>That's quite possible, and I don't see how that is relevant. My point was primarily
>>about x87 vs. SSE usage and how that impacts the performance comparison between CPUs and GPUs.
>>Multi-threading is a somewhat orthogonal issue, and as I pointed out - that is
>>left up to the developer for PhysX. Some developers are good and know what they
>>are doing. Some developers are not interested in (or not capable of) writing high performance code.
>>I don't doubt that there are cases where the GPU is much faster than the CPU.
>>I think Nvidia's done a good job of presenting some of those cases to the public.
>>>A performance
>>>difference of around 10x if I recall correctly. And the code run on the i7 was changed
>>>to run SSE code paths. So the boost could be seen, but the GPU is and will always
>>>be magnitudes faster than CPU regardless of the code paths.
>>There are definitely workloads where the GPU is dramatically faster than a CPU.
>>If you simply look at raw execution power for single precision, GPUs are often faster in theory and on paper.
>>However, it really does depend on the algorithm you use, and how it's coded. Currently
>>vectorized SSE can provide up to a 4X boost over scalar SSE and probably a bit more
>>over x87. With AVX that will be an 8X boost.
>>Multithreading can probably hit up to 4-8X, depending on the code being executed.
>>There are a couple of papers I've seen that show at least a 25X variation in the
>>performance of various HPC workloads on a Nehalem due to tuning options.
>>So I agree that for some workloads, GPUs will be faster, but it totally depends
>>on the workload, the algorithms and the coding style. I mean, nobody is trying
>>to execute a C compiler on a GPU, and there's a good reason for that.
>While that is all well and good, your article does its best to point out that PhysX
>can be done just as fast on the CPU as it is on the GPU.

No, I didn't. I said that it is possible to improve the performance of CPU PhysX, and the gains might be enough to make a difference.

>Problem here is Nvidia acknowledges
>that there are a few physics based situations where a GPU can not be used and the
>physics work must be done by the CPU. But those that are able to be run on the GPU,
>no matter the code optimizations made, will always be a magnitude of 10 or more faster on the GPU vs the CPU.

That's simply not true. I've already made it clear that if we compare a high-end GPU to a high-end CPU, the GPU has at best an 8X advantage in FLOP/s and a 5.5X advantage in memory bandwidth. 5.5 < 8 < 10.
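
To make that arithmetic concrete, here is a rough sketch in Python. It assumes a deliberately simple, hypothetical model that is not from the article: a kernel's CPU run time splits into a compute-bound fraction and a bandwidth-bound fraction, the two do not overlap, and the GPU speeds up each part by exactly its peak ratio. The function name and the `compute_fraction` parameter are made up for illustration; only the 8X and 5.5X figures come from the comparison above.

```python
# Peak-rate ratios quoted in this post (high-end GPU vs. high-end CPU).
FLOPS_RATIO = 8.0       # best-case single-precision FLOP/s advantage
BANDWIDTH_RATIO = 5.5   # memory bandwidth advantage


def peak_speedup_bound(compute_fraction,
                       flops_ratio=FLOPS_RATIO,
                       bandwidth_ratio=BANDWIDTH_RATIO):
    """Upper bound on GPU-over-CPU speedup under the toy model.

    compute_fraction is the (hypothetical) share of CPU run time that is
    limited by arithmetic; the rest is assumed to be limited by memory
    bandwidth. CPU run time is normalized to 1.0.
    """
    if not 0.0 <= compute_fraction <= 1.0:
        raise ValueError("compute_fraction must be between 0 and 1")
    memory_fraction = 1.0 - compute_fraction
    # Each part of the run shrinks by its own peak ratio on the GPU.
    gpu_time = compute_fraction / flops_ratio + memory_fraction / bandwidth_ratio
    return 1.0 / gpu_time


if __name__ == "__main__":
    for f in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"compute fraction {f:.2f}: bound {peak_speedup_bound(f):.2f}X")
```

Under these assumptions the bound slides between 5.5X and 8X depending on the mix, and never reaches 10X. Real code on either side can fall well short of peak, so measured ratios will differ.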

Moreover, it depends on how many cores in the CPU you can use, and how many cores in the GPU you can use. Some PhysX code may only use 1-2 cores (on Fermi).

>And softbodies is I believe but one of 2 or 3 that are better suited for CPU use
>and not GPU. And interestingly enough, that is what you used in your paper. So it
>is as if you are going out of your way to make Nvidia look very bad when there is no need to.

I grabbed softbodies and Cryostasis because they were convenient and easy, not out of any malice.

