By: Michael S (already5chosen.delete@this.yahoo.com), August 21, 2013 5:15 am
Room: Moderated Discussions
anon (anon.delete@this.anon.com) on August 20, 2013 6:07 pm wrote:
> Linus Torvalds (torvalds.delete@this.linux-foundation.org) on August 20, 2013 10:58 am wrote:
> > anon (anon.delete@this.anon.com) on August 20, 2013 1:58 am wrote:
> > >
> > > I always wondered why AMD didn't try this with their 64-bit move. I can see Intel of that era wanting
> > > to keep the cost and complexity of implementing x86 high, but it made less sense for AMD.
> >
> > No it didn't.
> >
> > The reason x86-64 was so successful was *exactly* that AMD did things right, and legacy x86 wasn't some
> > separate thing, but is very much baked into x86-64. There is no "either or": x86-64 was designed pretty
> > much from the ground up to be an extension, not a "separate mode". It's pretty much seamless.
>
> I did not say to drop legacy support, or leave it to some microcode or slow path;
> I suggested making a simpler and saner encoding for 64-bit mode.
>
> 32-bit mode absolutely would have still required first class support. Although there would have been a distant
> window into the future where it could have been slowly deprecated to slow paths. With x32 gaining ground, and
> enterprise software beginning to hit support limits, that window might not have been too far off today.
>
> (Ignore the stupid shit Intel is doing, like still releasing 32-bit only processors).
>
>
> > Yes, yes, "long mode" is a new mode bit, but at the same time you can see how it's
> > really using the same instruction decode logic, the same execution units, etc etc.
> > It's not two different front-ends, it's clearly one unified architecture.
>
> I did not suggest trying to duplicate execution units, or anything else but the decoder.
What are you going to gain by duplicating decoders?
To me it sounds like pure loss - more gates for nothing.
> And
> it would not be duplicating the decoder, so much as just another branch into a different
> encoding space (which is nothing unusual for an x86 and its modes and prefixes).
>
> Just like ARM is doing.
>
> > And I really think that the whole "seamless integration" was the right thing to do. It
>
> I just don't get that, though. It wasn't seamless -- 32-bit code cannot run in 64-bit mode, and vice versa. It
> is a different ISA, which just happens to have some of its encoding space map to equivalent instructions.
>
> > was never even a whiff of the ia64 "legacy engine" kind of engineering tradeoff where one
>
> That's just a false dichotomy. There wouldn't be vast duplications, or completely separate engines. Look at how
> anybody sane does it: it would be a sub-region of the encoding space in the decoder, and sharing back end.
>
> No, the new ISA could not be implemented in complete isolation of x86. I don't suggest that either. For
> example, suppose that 3 operand instructions are superior, but the 32-bit OOOE engine would have to be compromised
> to support that. Then the 64-bit ISA might have to compromise and use 2 operand instructions.
>
> I'm not suggesting they should have sent another design team out to make a completely new processor,
> then somehow mash in a K8 into one corner of it, or anything retarded like that.
>
> What I wondered is why they reused the horrible encoding. Please do not take
> this as an opportunity to defend x86 encoding
But the i386 choice of addressing modes and their encoding is objectively quite good; see the ModRM/SIB sketch below.
> (I also did not specify they
> would have to use a fixed length encoding, or a larger sized encoding).
>
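To be concrete about why I call it good: the ModRM/SIB scheme is three fixed bit fields per byte, so the shape of any addressing mode falls out of a few shifts and masks, with no lookup tables needed. Here is a minimal C sketch of my own (the field layout is from the architecture manuals; the example instruction is mine):

    /* Decode the i386 ModRM byte: fixed 2/3/3 bit fields. */
    #include <stdio.h>

    static void decode_modrm(unsigned char modrm)
    {
        unsigned mod = (modrm >> 6) & 3; /* 00/01/10 = memory form, 11 = register  */
        unsigned reg = (modrm >> 3) & 7; /* register operand (or opcode extension) */
        unsigned rm  =  modrm       & 7; /* base register, or 100b = SIB follows   */

        printf("mod=%u reg=%u rm=%u", mod, reg, rm);
        if (mod != 3 && rm == 4)
            printf(", SIB byte follows (2-bit scale, 3-bit index, 3-bit base)");
        if (mod == 1)
            printf(", then disp8");
        else if (mod == 2)
            printf(", then disp32");
        printf("\n");
    }

    int main(void)
    {
        /* 0x44 is the ModRM byte of "mov eax, [ebx+esi*4+8]" = 8B 44 B3 08 */
        decode_modrm(0x44);
        return 0;
    }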
So what is your suggestion?
"Nicer" encoding with the same code density? As long as both legacy and new decoders in your chip are 3-instruction wide, it makes absolutely no engineering sense.
Make legacy decoders 2-wide? How much would it cost them in IA-32 performance? Something between 10% and 20%? That's way too much for an underdog in very competitive market. Besides, I'd guess that 2-legacy+3-new structure would still be bigger than what the y did.
Make new decoders 4-wide? Without major rework of the back end it will add very little to performance in the "new" mode. Now multiply the smallness of improvement by relatively lower importance of the "new" mode in total figure of merit of the processor, and 4-way "new" decoding with the same back end does not look like a winning proposition :(
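To put rough numbers on that: a back-of-envelope figure of merit, where the 70/30 workload weights and the percentage guesses are purely my own illustrative assumptions, nothing AMD ever published.

    /* Weighted figure of merit for the "narrow legacy, wide new" idea.
       All numbers below are illustrative assumptions only. */
    #include <stdio.h>

    int main(void)
    {
        double w_legacy = 0.70, w_new = 0.30; /* assumed workload mix circa K8          */
        double legacy_loss = 0.15;            /* 2-wide legacy decode: -15% IA-32 perf  */
        double new_gain    = 0.05;            /* 4-wide new decode, same back end: +5%  */
        double fom = w_legacy * (1.0 - legacy_loss) + w_new * (1.0 + new_gain);
        printf("relative figure of merit: %.3f\n", fom); /* prints 0.910: a net loss */
        return 0;
    }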
So the only selling point for improved decoding that does not look hopeless to me is improved code density. But that is a hard problem, especially if, at the same time, you want to keep the encoding much simpler than the legacy one.
I mean, assuming minimal or no changes to instruction semantics, improving AMD64 density by 5% is not that hard, but also not worth the trouble. Improving it by 20% sounds very hard, if possible at all. Except, of course, for the FP/SIMD part.
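For a feel of where the GPR-side density goes, a few encodings (the byte sequences are per the AMD64 manual; the selection is mine). The REX prefix is the obvious one-byte tax on 64-bit and extended-register operations:

    /* Byte-for-byte cost of the REX prefix. */
    #include <stdio.h>

    int main(void)
    {
        unsigned char mov32[] = { 0x89, 0xD8 };       /* mov eax, ebx           */
        unsigned char mov64[] = { 0x48, 0x89, 0xD8 }; /* mov rax, rbx (REX.W)   */
        unsigned char movr8[] = { 0x4D, 0x89, 0xC1 }; /* mov r9, r8   (REX.WRB) */
        printf("mov eax,ebx: %zu bytes\n", sizeof mov32); /* 2 */
        printf("mov rax,rbx: %zu bytes\n", sizeof mov64); /* 3 */
        printf("mov r9,r8:   %zu bytes\n", sizeof movr8); /* 3 */
        return 0;
    }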
On the FP/SIMD side I agree that a significantly denser encoding was certainly possible. Also, on that side, non-destructive ops are a relatively bigger gain than on the GPR side, so maybe it was an opportunity to improve semantics together with the encoding.
So, yes, they missed an opportunity to drop SSE1/2 in 64-bit mode and replace it with something similar to scalar/128-bit AVX, but, due to reuse of the SSE opcode space, more dense.
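A hindsight illustration of my own (VEX obviously did not exist in 2003) of what a compact prefix over a reused SSE-like opcode map can buy: the non-destructive third operand at the same instruction length. Bytes are per the architecture manuals:

    /* Destructive SSE vs. non-destructive VEX, same 4-byte length. */
    #include <stdio.h>

    int main(void)
    {
        unsigned char sse[] = { 0xF2, 0x0F, 0x58, 0xC1 }; /* addsd  xmm0, xmm1       */
        unsigned char vex[] = { 0xC5, 0xFB, 0x58, 0xC1 }; /* vaddsd xmm0, xmm0, xmm1 */
        printf("addsd:  %zu bytes (destructive)\n", sizeof sse); /* 4 */
        printf("vaddsd: %zu bytes (3-operand)\n",   sizeof vex); /* 4 */
        return 0;
    }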
But in the grand scheme of things, the code density, and even the semantics, of FP/SIMD are not such a big deal.
> I wonder what Intel's 64-bit x86 ISA looked like.
>
> > It was also the right thing to do from a software perspective, because it meant that any traditional
> > x86 knowledge really translated pretty much directly into knowing the new mode. Sure, there are differences
> > (new registers, natch) and things that are no longer relevant
> > in 64-bit mode (the segments kind of survived,
> > but in a much weaker form, and vm86 mode is gone). But it's still overwhelmingly familiar.
>
> The instruction semantics actually could have been largely the same, and
> obviously not many people write code using a hex editor these days.
>
> > I'd say that anybody who claims that x86-64 could have been done better is just completely clueless.
> > Because it is about *so* much more than just introducing a new and improved execution mode.
>
> Don't get me wrong, I don't presume to know better than the stupidest engineer
> at AMD at the time. I don't doubt they thought carefully about it, and I don't
> doubt there *are* good reasons. They just haven't seemed very convincing to me.
>
> My line of questioning is not rhetorical, nor a claim that I could easily
> do better. It is just my train of thought and questions.
>
Then try to propose something concrete. Otherwise it's not even particularly skillful hand-waving.