Article: PhysX87: Software Deficiency
By: David Kanter (dkanter.delete@this.realworldtech.com), July 21, 2010 10:32 am
Room: Moderated Discussions
>From a historical perspective, highly pipelined x86 CPUs appeared after the non-pipelined
>ones. The idea you are advocating here contradicts this, because the
>instruction set of the older CPUs had (=past) no idea about pipelining. Of course,
>now (=present) that we can see "the bigger picture", anyone going to design a non-pipelined
>CPU would be wise to add some pipeline-management instructions to the instruction
>set, just in case a pipelined version ever appears. But in the case of old CPU designs
>and old code, you cannot travel back in time to the 1970s.
>
>>Exceptions are another one, but again it could have imprecise exceptions and not require speculative execution.
>
>Well, but you cannot go back to 1970 and fix the 8086 so that it knows about the concept of imprecise exceptions.
What pipeline management instructions would you want?
>>> Writes to registers are OK from this point of view, since the CPU
>>>never fetches an instruction from there. In CPUs which are able to do data speculation,
>>>even writing a register might cause partial pipeline stalls. (I don't know why I
>>>am writing this here, because it seems obvious.)
>>>
>>>If you think pipelining in a universal-computation CPU has nothing to do with speculation,
>>>you are simply wrong.
>>
>>On the contrary, I think your assertion that pipelining requires speculative execution is wrong.
>
>I am not saying that. I am saying this: Pipelining, the way it is implemented in
>current CPUs (e.g. Core 2), involves certain assumptions about what patterns there
>will be when the code is actually executed. Those assumptions are made without
>any prior attempt to check whether the code actually matches those patterns. The
>conversation between the CPU and the code looks like:
>
>Code: Hi there, CPU, my friend. I want you to execute me.
>
>CPU: No problem. Hand me the first couple of your instructions.
>
>Code: Only a couple of them? You mean you don't want to see all of them?
>
>CPU: That's right.
>
>Code: But but, then, you aren't going to know what I am going to do.
>
>CPU: Don't worry. I will manage.
>
>Code: How?
>
>CPU: Look pal, I know your kind. You will tell me to execute a bunch of moderately
>long sequences of adjacent instructions. Plus some branches here and there.
>
>Code: Maybe, but what if you are wrong about me? Your assumptions seem to me like
>some kind of speculation about my nature.
>
>CPU: Don't be so shy and hand me the initial couple of instructions!
>
>Code: What if I don't ...
>
>CPU: You don't have a choice ...
OK - that was pretty funny : )
I'd say the assumptions, though, are much looser (typically you assume no more than one branch per cycle and roughly 2-3 instructions per branch).
That being said, what would you think of as an alternative?
There's no way that the CPU can examine all the instructions in a program (and you probably wouldn't want it to, since many static instructions are never executed dynamically).
And even the program itself may not have a good idea of what is going on (e.g. with dynamic linking to libraries).
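To make that concrete, here's a toy sketch of a 2-bit saturating-counter branch predictor in C (my own illustration, not how any particular core implements it; the table size and PC indexing are arbitrary). The point is just that the hardware bets on patterns it has seen recently and never tries to inspect, let alone prove anything about, the whole program:

#include <stdio.h>

/* Toy 2-bit saturating-counter branch predictor.  Only meant to show
 * the kind of loose assumption a pipelined front end makes: it guesses
 * from recent history, it never examines the whole program.  Table
 * size and PC indexing are arbitrary choices for the example. */

#define TABLE_SIZE 1024

static unsigned char counters[TABLE_SIZE];   /* 0..3, all start at 0 (strongly not-taken) */

static int predict(unsigned long pc)
{
    return counters[pc % TABLE_SIZE] >= 2;   /* 2 or 3 => predict taken */
}

static void update(unsigned long pc, int taken)
{
    unsigned char *c = &counters[pc % TABLE_SIZE];
    if (taken && *c < 3)
        (*c)++;
    else if (!taken && *c > 0)
        (*c)--;
}

int main(void)
{
    /* A loop back-edge: taken 99 times, then not taken once.  The
     * predictor is wrong only on the first two iterations and on
     * the final exit. */
    int correct = 0;
    for (int i = 0; i < 100; i++) {
        int taken = (i < 99);
        correct += (predict(0x1234) == taken);
        update(0x1234, taken);
    }
    printf("predicted %d of 100 branches correctly\n", correct);
    return 0;
}

For a loop back-edge like that, the guess pays off almost every time, which is exactly the "moderately long sequences plus some branches here and there" pattern your dialogue describes.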
>>> On the other hand, non-speculative pipelining *is* possible,
>>>but only if the CPU is able to mathematically prove that a particular piece of code
>>>is never violating any assumptions made by the pipelined architecture. But how many
>>>existing CPUs are able to do such proofs?
>>
>>Anecdotal evidence does not make your claim correct.
>>
>>>Similarly, L1/L2 caches without any traces of speculative-ness whatsoever are also
>>>possible - provided the CPU is able to actually prove that the memory access patterns
>>>in a particular piece of code are fully known in advance. But how many existing
>>>CPUs are able to do such proofs? (Considering the design of the x86 ISA, I cannot
>>>say I blame them for this inability.)
>>
>>Caches have nothing to do with the speculative execution that you were talking about.
>
>What are you saying? That if a contemporary CPU (Core 2 or whatever) decides to
>allocate a cache line for data at address 0x1230, it is not making any speculations
>about future uses of that piece of data?
>
>(Note: I am *not* against caches)
I agree that caches definitely are speculative optimizations. Rather helpful ones.
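To put a rough number on the cache case (a toy model, sizes made up, nothing here resembles a real L1): allocating a line for 0x1230 is a bet that 0x1230 or its neighbors will be touched again before the line is evicted. A tiny direct-mapped model shows the bet paying off for a plain sequential walk:

#include <stdio.h>

/* Toy direct-mapped cache: 64-byte lines, 256 sets.  The point is that
 * filling a line on a miss is a speculation: the hardware bets that
 * data near the missing address will be used again soon. */

#define LINE_SIZE 64
#define NUM_SETS  256

static unsigned long tags[NUM_SETS];
static int           valid[NUM_SETS];

static int cache_access(unsigned long addr)
{
    unsigned long line = addr / LINE_SIZE;
    unsigned long set  = line % NUM_SETS;
    if (valid[set] && tags[set] == line)
        return 1;                    /* hit: the earlier bet paid off */
    valid[set] = 1;                  /* miss: allocate the line, i.e. speculate */
    tags[set]  = line;
    return 0;
}

int main(void)
{
    /* Walk 1 KB sequentially starting at 0x1230: the first touch of
     * each line misses, the remaining touches of that line hit. */
    int hits = 0, accesses = 0;
    for (unsigned long a = 0x1230; a < 0x1230 + 1024; a += 4) {
        hits += cache_access(a);
        accesses++;
    }
    printf("%d hits out of %d accesses\n", hits, accesses);
    return 0;
}

For that walk, only the first 4-byte touch of each 64-byte line misses, so the "speculation" is right roughly 15 times out of 16.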
David