rwessel (robertwessel@yahoo.com) on 7/23/10 wrote:
>? (0xe2.0x9a.0x9b@gmail.com) on 7/22/10 wrote:
>>David Kanter (dkanter@realworldtech.com) on 7/21/10 wrote:
>>>>Well, but you cannot go back to the 1970s and fix the 8086 so that it knows about the concept of imprecise exceptions.
>>>What pipeline management instructions would you want?
>>Maybe this isn't going to exactly answer the question you are asking, but it is fairly close in general:
>>Well, for example it seems clear to me that the ISA of a pipelined CPU should have
>>two flags registers. One for doing computations which are feeding results into branch
>>instructions and the other one for doing computations unrelated to branching. Take this code for example:

>>for(int i=0; i<1000; i++)
>>    a += array[i];

>>The corresponding x86 code (one flags-register) is:

>>EAX := array
>>EBX := a
>>ECX := 0
>>NEXT:
>>ADD EBX,[EAX+ECX*4]
>>INC ECX
>>CMP ECX,1000
>>JMP_smaller NEXT

>>Notice that you cannot reorder the ADD and CMP because both write the one and only
>>FLAGS register. (If you force the reordering and save the FLAGS to memory, it will do more harm than good.)
>>If you have two flag registers (F1,F2):

>>NEXT: INC ECX
>>CMP[F1] ECX,1000
>>ADD[F2] EBX,[EAX+ECX*4-4]
>>JMP_smaller[F1] NEXT

>>The distinction is that in the latter case CMP happens one cycle sooner than in
>>the former case. This may not sound like much, but in a pipelined CPU it means that
>>the jump's target will be resolved (as in: known for a fact) one cycle sooner.
>>Intel's Core2 with CMP+JMP fusion? That seems like a bad joke. Well, it speeds
>>up x86 code that was compiled for the model with only one FLAGS register, no doubt
>>about that. But as can be seen in the latter asm code, CMP+JMP fusion does not make
>>much sense once you have multiple FLAGS registers.
>>What about AMD's Bulldozer? It seems they are going to just blindly copy the CMP+JMP
>>fusion concept (to speed up existing code). But are they going to e.g. introduce
>>a new instruction prefix for selecting the FLAGS register, bring some innovation into this field and beat Intel? Nooo.
>>When Intel was designing the 386, they most likely already knew it was going to
>>be a pipelined CPU. They could have put two FLAGS registers in there. Call me a
>>skeptic, but I think the reason why the ISA has only one such register was that
>>the ISA designers had no idea what they were doing (at least in this particular case).
>>One of the main consequences is that having two flags registers lowers the pressure
>>put on the branch prediction engine. There would even exist cases in which you can
>>keep the pipeline fully utilized in the presence of conditional branch instructions
>>in the code - without resorting to any speculation about the most likely branch targets ...
>Well, PPC already does that, and it does help code in some cases, but it doesn't
>seem like a major win, and no one is really rushing to copy that. Of course the
>RISC approach of not having a flags register at all is another solution.
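
For what it's worth, the flagless variant can be sketched in C. This is only a rough illustration, assuming a machine that compares into an ordinary register (MIPS and Alpha work that way); the function name sum_early_compare and the variable more are made up for this post. The loop condition is computed before the add and consumed after it, so the add has nothing to clobber. A C compiler is of course free to reorder all of this, the point is just the dependency structure.

```c
#include <stddef.h>

/* Hypothetical helper: sums n ints, computing the loop condition into a
   plain variable before the add, the way a compare-into-register ISA would. */
int sum_early_compare(const int *array, size_t n)
{
    int a = 0;
    size_t i = 0;

    if (n == 0)
        return 0;

    for (;;) {
        size_t next = i + 1;
        int more = next < n;   /* the "CMP" result lives in a plain variable */
        a += array[i];         /* the add cannot overwrite that result */
        i = next;
        if (!more)             /* the branch consumes the earlier compare */
            break;
    }
    return a;
}
```
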
>Of more general use than a duplicate flags register would be more instructions
>that don't set the flags. And within the existing scope of x86, you can do that
>yourself (by replacing the add with a mov/lea sequence), or minimize the effect
>by unrolling. Of course those techniques might not apply to any particular case,
>but they do to this one, which goes to show that short code snippets are not worth
>too much when it comes to making architectural decisions.
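
To make the unrolling point concrete, here is a minimal sketch in C, assuming a plain int array. The name sum_unrolled4 and the unroll factor of four are arbitrary choices for illustration, not anything from the posts above. The main loop executes one compare-and-branch per four additions, so whatever the flags dependency costs, it is paid a quarter as often.

```c
#include <stddef.h>

/* Hypothetical helper: sums n ints with the loop body unrolled by four. */
int sum_unrolled4(const int *array, size_t n)
{
    int a = 0;
    size_t i = 0;

    /* Main body: one compare-and-branch per four additions. */
    for (; i + 4 <= n; i += 4) {
        a += array[i];
        a += array[i + 1];
        a += array[i + 2];
        a += array[i + 3];
    }

    /* Remainder loop for when n is not a multiple of four. */
    for (; i < n; i++)
        a += array[i];

    return a;
}
```

For the 1000-element loop in the example the remainder loop never runs, since 1000 is a multiple of four.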
