Article: PhysX87: Software Deficiency
By: anon (anon.delete@this.anon.com), July 19, 2010 11:45 pm
Room: Moderated Discussions
Anon (no@email.com) on 7/19/10 wrote:
---------------------------
>Linus Torvalds (torvalds@linux-foundation.org) on 7/19/10 wrote:
>---------------------------
>>Anon (no@email.com) on 7/19/10 wrote:
>>>
>Not to mention media codecs, which I would suggest consume
>more of the world's CPU cycles than many other more 'common'
>codes these days (if only because they don't just idle the
>CPU most of the time).
>>
>>Well, the well-behaved parts of media decoding are often
>>done largely by accelerators, because it's much more power-
>>efficient that way. And yes, those accelerators are often
>>fixed-function and largely in-order.
>>
>>But it does mean that there's much less nice core codec
>>kernels for the CPU, when things like scaling and YUV-RGB
>>conversion is done by the graphics pipeline. The higher
>>level decoding etc tends to be written in C, and is not
>>necessarily hand-tuned to the same degree.
>>
>>And because it runs on different micro-architectures, I
>>bet it all ends up preferring OoO execution cores. It's not
>>like things like the Atom made a good name for itself in
>>media decoding - quite the reverse.
>>
>>There's no question that you can't try to optimize media
>>codecs to hell and back on an in-order core. But it's a lot
>>of work, and the end result will definitely not run better
>>than on an OoO one.
>>
>>So I don't think media codecs are a good argument against
>>OoO. Except perhaps in the sense that you do want that
>>specialized graphics accelerator core (which tends to be
>>in-order and often have some fixed functions too).
>
>
>Sorry, I should have pointed out...
>I don't at all agree with 'non-OoO is better for tight optimisation'; I think that's rubbish.
>
>I was pointing at media kernels as places where very tight and careful optimisation
>is often done, and also where a good amount of real-world CPU cycles end up being spent.
>
>I suspect that without OoO hiding a lot of latencies, writing such kernels would be a
>lot harder (and require quite a few more magic pre-fetch and cache access instructions..).
The point is probably that with regular or quite well-known memory access patterns, a known cache structure, minimal invalidations from outside sources, at least a few instructions to do some prefetching, and a lot of time, it _is_ possible to schedule instructions and fetches on an in-order core such that OOOE would not help much.
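To make that concrete, here is a minimal C sketch (not from the post; the function name and the PREFETCH_AHEAD distance are invented for illustration) of the kind of hand-placed prefetching such a static schedule relies on, using the GCC/Clang __builtin_prefetch builtin:

#include <stddef.h>

#define PREFETCH_AHEAD 16  /* tuning knob: picked for an assumed, fixed load-to-use latency */

float sum_regular(const float *a, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) {
        /* Issue the fetch well before the data is needed; on an in-order
           core this distance has to be tuned against memory latency by hand. */
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 3);  /* read, high temporal locality */
        s += a[i];
    }
    return s;
}

With a fixed stride, a known cache line size and no outside interference, that distance can be chosen once and the schedule just works.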
As soon as you start changing any of these parameters, the house of cards can get wobbly.
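As a hypothetical counter-example (again not from the post), the same reduction over an index array shows why: the address a[idx[i]] is only known after idx[i] has been loaded, so no single compile-time prefetch distance fits, while an OoO core can simply run ahead past the misses:

#include <stddef.h>

float sum_indirect(const float *a, const int *idx, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) {
        /* The idx[] stream itself is prefetchable, but the target of
           a[idx[...]] is data-dependent, so the static schedule above
           no longer hides the load latency. */
        s += a[idx[i]];
    }
    return s;
}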
Profile-driven optimizations and really good compilers can probably get a lot of the way there too, but they still fall down when things change dynamically. Dynamic optimization is really interesting and could get closer again, but I think at the low level, when behaviour changes dynamically, you can't beat OOOE.
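For the profile-driven route, a rough GCC sketch (the file name and the classify() function are invented for illustration; Clang has equivalent flags) looks like the following, and it also shows where the approach breaks, because the branch layout is baked in from the training run:

/* Build roughly like:
 *   gcc -O2 -fprofile-generate hot.c -o hot && ./hot < training_input
 *   gcc -O2 -fprofile-use      hot.c -o hot
 * The second build lays out classify() according to the recorded branch
 * frequencies; if the real input later behaves differently, that static
 * layout is wrong, and only runtime mechanisms (dynamic recompilation,
 * hardware branch prediction) can adapt.
 */
#include <stdio.h>

static int classify(int x)
{
    if (x < 0)      /* branch whose bias PGO bakes into the code layout */
        return -1;
    return 1;
}

int main(void)
{
    int x, acc = 0;
    while (scanf("%d", &x) == 1)
        acc += classify(x);
    printf("%d\n", acc);
    return 0;
}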
Dynamic optimization could probably be quite a big gain on OOOE CPUs as well, though, by reducing icache footprint, function call overhead, branch mispredicts and so on.
>
>I've written enough DSP code to know that SSE et al. are a lot 'easier' for a target
>performance level, if not a lot easier to understand.
>
>I would say GPU code is easier than either, but ONLY because you can afford to
>make things VERY wide and leave quite a lot of cycles on the floor - and only if
>your application actually allows you to go that wide.... the good old horses for courses..
>
>I would say, though, that a LOT of people these days confuse low-overhead (OS) and
>IO-bound (database, a lot of webserver) code for compute code... it's not EASY to
>saturate a modern CPU in a useful and efficient way.. it used to be a lot easier ;)
>
>Of course interpreted languages and deeply layered software stacks help with the
>job of soaking up CPU cycles nicely these days ;)
>