By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), July 1, 2013 1:40 pm
Room: Moderated Discussions
rwessel (robertwessel.delete@this.yahoo.com) on June 30, 2013 10:11 pm wrote:
>
> That being said, the CPU is intended to be general purpose, and I certainly don't begrudge hardware
> dedicated to other people's applications, as the economies of scale that have been demonstrated by the
> general purpose approach, and the resulting performance *and* price/performance, particularly with x86,
> are a *huge* win for almost every user.
So I'm a big fan of having an FPU, and making it standard (I dislike how ARM completely messed up that part). Emulation is slow enough that it can completely screw over people who need some floating point performance, and not having a guaranteed standard FP unit causes its own set of insanities (function calls with crazy calling conventions for simple operations, or just binaries that work on some microarchitectures but not others).
So I wouldn't argue for dropping FP support. Having a certain usable baseline is important.
But I do argue that very few people actually care very much about the actual performance of an FPU once you have something that is at least reasonable. And many people who think that they do care are likely wrong. Many FP loads aren't even that FP-intensive in the end, and an in-order (and not particularly aggressive) FP unit is generally more than sufficient. You probably want it out-of-order wrt the other units, though.
Actual traditional array-based high-intensity FP is often fairly easy to schedule by the compiler (and doing cacheline blocking etc is more important than the FPU scheduling), and the arguably more common kind of real FPU use (which follows pointers and has fairly sparse arrays rather than being some unrealistic pure linpack load) is generally better off with the effort spent on integer and memory units.
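To make the "cacheline blocking" point concrete, here is a minimal sketch of a tiled matrix multiply in C. The function name `matmul_blocked`, the helper `min_sz` and the tile size `BLOCK` are all made up for illustration, and the tile size in particular is an assumption that would need tuning per machine. Nothing here schedules FP operations cleverly; the only change from the naive triple loop is that the work is reordered so each tile stays in cache while it is being reused.

```c
#include <stddef.h>

/* Assumed tile edge: 32x32 doubles is 8 KiB per tile. This is a guess,
 * not a measured value; tune it for the cache sizes of the real target. */
#define BLOCK 32

static size_t min_sz(size_t x, size_t y) { return x < y ? x : y; }

/* Hypothetical helper: C += A * B for n x n row-major matrices. */
void matmul_blocked(size_t n, const double *a, const double *b, double *c)
{
    for (size_t ii = 0; ii < n; ii += BLOCK) {
        size_t i_end = min_sz(ii + BLOCK, n);
        for (size_t kk = 0; kk < n; kk += BLOCK) {
            size_t k_end = min_sz(kk + BLOCK, n);
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                size_t j_end = min_sz(jj + BLOCK, n);
                /* Work on one tile of each matrix while it is cache-hot. */
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k];
                        /* Unit-stride inner loop over a row of B and C. */
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
                }
            }
        }
    }
}
```

The inner loop is a plain multiply-add over contiguous memory, which is the kind of code a compiler can schedule (and usually vectorize) without help from an aggressive out-of-order FPU. The caller is expected to zero `c` first if a plain product rather than an accumulation is wanted.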
So my argument is that spending the power and effort on a high-end FP unit for a mobile part (or even a server part - very little FP code there) is generally a waste of time.
It can make marketing sense, though. But it should be recognized as being about marketing and numbers games, not about actual real use. Very very few loads are truly about the FP unit.
Integer vector units are often more useful, although the bulk of their use seems to be for things like crypto and memory copies, which are really just specialized engines that need some register space. Most of the things that used to use vector units for actual vectors seem to be happier using the GPU (ie video decoding and encoding or things like Photoshop effects may well use a vector unit, but if you can, you're generally even better off just using the GPU entirely and skipping the vector unit).
Of course, if you do a high-end powerful chip (ie the big Intel cores), then by all means go full out on the FPU. No reason to skimp: if you have the resources to do the best you can, by all means do it. Even there it shouldn't be the primary goal, but once you've done as much as you can on the integer side and the memory pipeline and have nothing better to do, then make your FPU the best you can.
Linus