By: Maynard Handley (name99.delete@this.name99.org), August 16, 2014 6:35 pm
Room: Moderated Discussions
Ricardo B (ricardo.b.delete@this.xxxxx.xx) on August 16, 2014 5:43 pm wrote:
> There's no support for 286 mode. 286 protected mode, fortunately, did not carry on to the 80386.
>
So are you saying that I could not just run OS/2 on a modern Intel CPU? When did that become true? Your answer suggests that it was true even with the 386, but that surely can't be right. Didn't IBM have OS/2 running on the 386-based PS/2s?
> In it's use of destructive operations, requiring extra mov reg,reg operations (some
> times, lots of them), which take energy and execution resources on every x86 CPU.
> Again, that penalty can be greatly reduced by eliminating them at
> rename (Ivy Bridge onwards, next AMD high end cores too IIRC).
I'm surprised it took until Ivy Bridge to do this. I'd have thought it would have come in at Nehalem, if not sooner.
FWIW Cyclone also does this, and since there's a register that's kinda/sorta dedicated to being zero, it can also zero a register at rename. (Obviously x86 has its preferred idiom for zeroing, xor reg,reg, which is recognized by the decoder, but I don't know if it's also handled at rename.)
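To make "eliminating moves at rename" concrete, here is a toy sketch in Python. It is not a description of how Ivy Bridge or Cyclone actually implement it; the `Renamer` class, the register names, and the tuple format for uops are all made up for illustration. The point is only that a mov reg,reg can be handled by pointing the destination's rename-table entry at the source's physical register, so no uop reaches the execution units.

```python
from dataclasses import dataclass, field


@dataclass
class Renamer:
    """Toy register renamer (hypothetical, for illustration only)."""
    num_arch: int = 4
    rat: dict = field(default_factory=dict)   # architectural reg -> physical reg
    next_phys: int = 0
    uops: list = field(default_factory=list)  # uops actually sent to execution

    def __post_init__(self):
        # Give every architectural register an initial physical register.
        for r in range(self.num_arch):
            self.rat[f"r{r}"] = self._alloc()

    def _alloc(self):
        p = self.next_phys
        self.next_phys += 1
        return p

    def rename(self, op, dst, *srcs):
        if op == "mov" and len(srcs) == 1:
            # Move elimination: alias dst to the source's physical register.
            # Nothing is issued. (Real hardware must also reference-count
            # shared physical registers before freeing them; omitted here.)
            self.rat[dst] = self.rat[srcs[0]]
            return
        phys_srcs = tuple(self.rat[s] for s in srcs)
        self.rat[dst] = self._alloc()
        self.uops.append((op, self.rat[dst], phys_srcs))


r = Renamer()
r.rename("mov", "r1", "r0")        # eliminated: r1 now shares r0's physical reg
r.rename("add", "r1", "r1", "r2")  # destructive add gets a fresh physical reg
print(r.uops)                      # only the add was issued
```

The destructive add that follows the mov still gets a fresh physical register, which is why the copy of r0 survives without the mov ever executing.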
Intel have, to some extent, worked around the register problem with op fusion. I've mentioned that IBM have made use of the same idea (in a slightly different context) with POWER8. There was a very interesting thesis recently that discussed mini-graphs (a sort of generalized op fusion) in the context of a generic RISC ISA. The idea was to fuse together up to three successive instructions that fed a value from one op to the next, so that the entire chain was basically represented as a single instruction with, say, two inputs and one output, and the intermediate results as invisible temporaries. (Obvious sorts of pairs to fuse are the sorts of things that Intel already DOES fuse: cmp+branch, load+calculate, or calculate+store.)
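A rough sketch of the fusion rule described above, in Python. This is my own simplification, not the algorithm from the thesis: the instruction format `(op, dst, srcs)`, the function names, and the conservative "intermediate is never read again in this block" test are all assumptions, and it also assumes temporaries are dead at the end of the block. It greedily groups up to three successive dependent instructions as long as the group needs at most two external inputs.

```python
def external_inputs(chain):
    """Sources a chain reads that it did not produce itself."""
    produced, inputs = set(), []
    for _, dst, srcs in chain:
        for s in srcs:
            if s not in produced and s not in inputs:
                inputs.append(s)
        produced.add(dst)
    return inputs


def fuse(block, max_len=3, max_inputs=2):
    """Greedily group successive dependent instructions (hypothetical rule)."""
    groups, i = [], 0
    while i < len(block):
        chain, j = [block[i]], i + 1
        while j < len(block) and len(chain) < max_len:
            prev_dst = chain[-1][1]
            cand = block[j]
            # The candidate must consume the previous result.
            if prev_dst not in cand[2]:
                break
            # The intermediate must become an invisible temporary: either the
            # candidate overwrites it, or nothing later in the block reads it.
            if cand[1] != prev_dst and any(prev_dst in ins[2] for ins in block[j + 1:]):
                break
            # The fused group may take at most max_inputs external inputs.
            if len(external_inputs(chain + [cand])) > max_inputs:
                break
            chain.append(cand)
            j += 1
        groups.append(chain)
        i += len(chain)
    return groups


block = [
    ("load",  "t0",    ("p",)),
    ("add",   "t1",    ("t0", "x")),   # load+calculate
    ("store", None,    ("t1", "q")),   # would need a third input, so not fused
    ("cmp",   "flags", ("x", "y")),
    ("jne",   None,    ("flags",)),    # cmp+branch
]
print([[ins[0] for ins in g] for g in fuse(block)])
# [['load', 'add'], ['store'], ['cmp', 'jne']]
```

With the two-input limit, load+add and cmp+jne fuse while the store stays separate; relaxing `max_inputs` to 3 would let all of load+add+store collapse into one group.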
The idea is cute and gives nice speedups: up to around 30% if you go down the whole path suggested, which allows the compiler to essentially define on the fly the set of mini-graphs to fuse. But that scheme, while source code backward and forward compatible, requires more infrastructure in the core than I'd expect on the first iteration. In the Intel case, where a few special-purpose pairs are handled, op fusion is worth about 10%.
Effectively it gives you a few more physical registers and a few more ROB slots, which is nice, but if that's what you're after you're probably better off going down one of the many KIP variants and attacking the problem of physical registers and ROB slots more directly. I'm very curious to see what the various ARM vendors do in this respect once they've fully explored the basic set of tricks that Intel is currently using (which I expect will be in three or four years).