Article: PhysX87: Software Deficiency
By: rwessel (robertwessel.delete@this.yahoo.com), July 23, 2010 12:03 am
Room: Moderated Discussions
? (0xe2.0x9a.0x9b@gmail.com) on 7/22/10 wrote:
---------------------------
>David Kanter (dkanter@realworldtech.com) on 7/21/10 wrote:
>---------------------------
>>>Well, but you cannot go back to the 1970s and fix the 8086 so that it knows about the concept of imprecise exceptions.
>>
>>What pipeline management instructions would you want?
>
>Maybe this isn't going to exactly answer the question you are asking, but it is fairly close in general:
>
>Well, for example, it seems clear to me that the ISA of a pipelined CPU should have
>two flags registers: one for computations that feed their results into branch
>instructions, and the other for computations unrelated to branching. Take this code for example:
>
>
>for (int i = 0; i < 1000; i++)
>    a += array[i];
>
>
>The corresponding x86 code (one flags-register) is:
>
>
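>; EAX = address of array (4-byte ints assumed), EBX = a
>
>MOV ECX,0             ; i = 0
>NEXT:
>ADD EBX,[EAX+ECX*4]   ; a += array[i]  (writes FLAGS)
>INC ECX               ; i++            (writes FLAGS, except CF)
>CMP ECX,1000          ; i < 1000?      (writes FLAGS)
>JL NEXT               ; loop while ECX < 1000 (reads FLAGS)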
>
>
>Notice that you cannot move the ADD below the CMP, because both write the one and only
>FLAGS register and the ADD would clobber the result the jump needs. (If you force the reordering and save the FLAGS into memory, it will do more harm than good.)
>
>If you have two flag registers (F1,F2):
>
>
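>MOV ECX,0
>NEXT:
>INC[F1] ECX                 ; i++, flags go to F1
>CMP[F1] ECX,1000            ; loop test, flags go to F1
>ADD[F2] EBX,[EAX+ECX*4-4]   ; a += array[i-1]: ECX was already incremented; flags go to F2
>JL[F1] NEXT                 ; branch reads F1 only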
>
>
>The distinction is that in the latter case CMP happens one cycle sooner than in
>the former case. This may not sound like much, but in a pipelined CPU it means that
>the jump's target will be resolved (as in: known for a fact) one cycle sooner.
>
>Intel's Core2 with CMP+JMP fusion? That seems like a bad joke. Well, it speeds
>up x86 code that was compiled for the model with only one FLAGS register, no doubt
>about that. But as can be seen in the latter asm code, CMP+JMP fusion does not make
>much sense once you have multiple FLAGS registers.
>
>What about AMD's Bulldozer? It seems they are going to just blindly copy the CMP+JMP
>fusion concept (to speed up existing code). But are they going to e.g. introduce
>a new instruction prefix for selecting the FLAGS register, bring some innovation into this field and beat Intel? Nooo.
>
>When Intel was designing the 386, they most likely already knew it was going to
>be a pipelined CPU. They could have put two FLAGS registers in there. Call me a
>skeptic, but I think the reason why the ISA has only one such register was that
>the ISA designers had no idea what they were doing (at least in this particular case).
>
>One of the main consequences is that having two flags registers lowers the pressure
>put on the branch prediction engine. There would even exist cases in which you can
>keep the pipeline fully utilized in the presence of conditional branch instructions
>in the code - without resorting to any speculation about the most likely branch targets ...
>
Well, PPC already does that (it has eight condition-register fields), and it does help code in some cases, but it doesn't seem like a major win, and no one is really rushing to copy it. Of course, the approach some RISC ISAs take (MIPS, for instance) of not having a flags register at all is another solution.
Of more general use than a duplicate flags register would be more instructions that don't set the flags. And within the existing scope of x86, you can do that yourself (by replacing the add with a mov/lea sequence, since neither MOV nor LEA modifies the flags), or minimize the effect by unrolling. Of course those techniques might not apply in every case, but they do apply to this one, which goes to show that short code snippets are not worth much when it comes to making architectural decisions.
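As a rough illustration of the unrolling point, here is a minimal C sketch. It is not taken from the thread: the function names sum_simple and sum_unrolled4 and the unroll factor of four are assumptions made for the example. The unrolled version executes one loop compare and branch per four additions, so whatever the flags dependency costs, it is paid a quarter as often. A compiler may well do this transformation on its own.

```c
#include <stddef.h>

/* Straightforward loop: one compare-and-branch per element. */
int sum_simple(const int *array, size_t n)
{
    int a = 0;
    for (size_t i = 0; i < n; i++)
        a += array[i];
    return a;
}

/* Unrolled by four (an arbitrary factor chosen for illustration):
   one compare-and-branch per four additions. */
int sum_unrolled4(const int *array, size_t n)
{
    int a = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        a += array[i];
        a += array[i + 1];
        a += array[i + 2];
        a += array[i + 3];
    }

    /* Leftover elements when n is not a multiple of four. */
    for (; i < n; i++)
        a += array[i];

    return a;
}
```

For the 1000-element loop in the quoted example the leftover loop never runs, since 1000 is a multiple of four.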