By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), July 2, 2013 11:18 am
Room: Moderated Discussions
Patrick Chase (patrickjchase.delete@this.gmail.com) on July 2, 2013 10:35 am wrote:
>
> > and the arguably more common kind of real FPU use (which follows pointers and has
> > fairly sparse arrays rather than being some unrealistic pure linpack load) is
> > generally better off with the effort spent on integer and memory units.
>
> Examples? I know of many such loads (they arise all over the place in HPC and some
> areas of imaging) but none that I'd describe as "common" for desktop/mobile use.
I have admittedly not done a lot of games programming, and most of what I did was in the Quake/Doom timeframe (I had private access to the sources for an alpha port). But from that little exposure, I can definitely say that the FP there wasn't the traditional "large array access" kind. The closest thing to that was the actual software rendering, which obviously nobody really does anymore.
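To make the distinction concrete, here is a minimal sketch contrasting the two kinds of FP code. It assumes a hypothetical `struct entity` linked list standing in for game-style data; it is not taken from the Quake/Doom sources. The first loop streams through dense arrays with independent elements, which a compiler can usually vectorize. The second does the same arithmetic, but each iteration has to load `e->next` before it knows where the next element lives, so its speed is mostly set by load latency and the integer/memory pipeline rather than by FP width.

```c
#include <stddef.h>

/* Dense, array-style FP: unit-stride loads, no dependence between
   iterations, a good fit for SSE/AVX-style vector units. */
void scale_positions(float *x, const float *v, size_t n, float dt)
{
    for (size_t i = 0; i < n; i++)
        x[i] += v[i] * dt;
}

/* Pointer-following FP (hypothetical entity list): the address of the
   next element is only known after loading e->next, so the loop is a
   serial chain of dependent loads with a little FP work in between. */
struct entity {
    float x, vx;
    struct entity *next;
};

void update_entities(struct entity *e, float dt)
{
    for (; e != NULL; e = e->next)
        e->x += e->vx * dt;
}
```

Both functions do one multiply-add per element; the difference that matters for hardware design is how the operands are found, not how much FP math there is.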
> > Integer vector units are often more useful, although the bulk of their use seems
> > to be for things like crypto and memory copies, which are really just specialized
> > engines that need some register space.
>
> Also imaging. 16- or 32-bit integer math is often sufficient
Agreed, but a lot of that seems to have been moved over either to specialized engines (i.e., the whole mobile world seems to use things like dedicated JPEG engines), or in some cases to the GPU for image effects (and for things like HDR you may actually want FP, but you really don't want to do it on the CPU anyway).
So I think that a fair chunk of the traditional imaging code has moved to the GPU. There are probably many filters that still exist and use vector units, but a lot of the fancy Photoshop effects are done on the GPU (both OpenGL and OpenCL).
> This depends on the level of vector parallelism in the workload. For something
> like AVX you need tens of operations that can be performed in parallel to
> efficiently utilize the CPU. For a GPU you need tens of thousands.
Agreed. But many of the users of the vector units tended to be things like pictures (or video), where you really do have millions of elements.
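As an illustration of both points (integer math being sufficient for much imaging work, and pixels offering huge amounts of independent work), here is a minimal sketch of blending two 8-bit grayscale images with a constant alpha. The function name `blend_u8` and the constant-alpha formulation are assumptions chosen for the example. Every intermediate fits in 16 bits, and each pixel is computed independently, so a loop like this maps naturally onto integer vector units.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dst = round((src*alpha + dst*(255-alpha)) / 255),
   with alpha in 0..255. */
void blend_u8(uint8_t *dst, const uint8_t *src, size_t n, uint8_t alpha)
{
    uint16_t a = alpha;
    uint16_t ia = (uint16_t)(255 - alpha);

    for (size_t i = 0; i < n; i++) {
        /* Worst case is 255*a + 255*ia = 255*255 = 65025, so the
           weighted sum always fits in 16 bits: no FP required. */
        uint16_t t = (uint16_t)(src[i] * a + dst[i] * ia);

        /* Rounded division by 255; compilers typically turn a divide
           by a constant into a multiply and shift. */
        dst[i] = (uint8_t)((t + 127) / 255);
    }
}
```

Since no iteration depends on another, an image with millions of pixels gives the hardware (or a GPU) far more parallel work than it can use.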
And again, I do argue that you want a certain baseline of FPU performance, so I wouldn't want an FPU that is completely broken. I just don't think people will necessarily notice (outside of benchmarks) whether it has pipelined execution units and an OoO execution queue for the FPU itself. (I do want OoO for the memory and integer pipelines, and the FP unit's execution queue should obviously not hold those up.)
Linus