Article: AMD's Mobile Strategy
By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), December 18, 2011 1:49 pm
Room: Moderated Discussions
Ricardo B (ricardo.b@xxxxx.xx) on 12/18/11 wrote:
>
>There's a difference between allowing 0 or N prefixes.
Sure. There's a difference. But on the whole, it's an
engineering issue, and it's also rather complicated to
handle the case where you have "more prefixes than you
can easily handle, so now we have yet another special
case".
So it may well be simpler to just handle an arbitrary run of
up to 14 prefixes by throwing duplicate hardware at it, than it
would be to handle the common case with simpler hardware
and then having some fallback logic for the "many prefix"
case.
You commonly have two or three prefixes, so you
have to handle that reasonably well anyway.
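To make the "no special case" point concrete, here is a minimal software sketch, not a description of how any real decoder is built. Hardware would look at the bytes in parallel rather than loop, and the names split_prefixes, LEGACY_PREFIXES and MAX_INSN_LEN are made up for illustration. It only knows the legacy prefix bytes (no REX/VEX) and relies on the 15-byte instruction length limit, which is where the 14 comes from.

```python
# Legacy x86 prefix bytes (REX/VEX are deliberately left out of this sketch).
LEGACY_PREFIXES = frozenset({
    0xF0,              # LOCK
    0xF2, 0xF3,        # REPNE / REP
    0x2E, 0x36, 0x3E,  # CS / SS / DS segment overrides
    0x26, 0x64, 0x65,  # ES / FS / GS segment overrides
    0x66,              # operand-size override
    0x67,              # address-size override
})

MAX_INSN_LEN = 15  # architectural limit on x86 instruction length


def split_prefixes(code):
    """Split leading legacy prefix bytes from the rest of the byte stream.

    Returns (prefixes, rest). Illustrative helper, not a real decoder.
    """
    n = 0
    # At most 14 prefixes can precede a one-byte opcode within 15 bytes.
    while n < len(code) and n < MAX_INSN_LEN - 1 and code[n] in LEGACY_PREFIXES:
        n += 1
    return code[:n], code[n:]
```

The same path covers zero prefixes, the usual two or three, and the pathological fourteen, with no separate fallback for the "many prefix" case.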
>Actually, Intel resorts to microcode the nasty cases.
Not in the decoding itself.
When Intel resorts to microcode it's not because the
decode is complicated, it's because the semantics
of the instruction itself are complex.
The repeat string instructions would be the obvious
example of this: they are trivial to decode (just a single
byte, no arguments, no immediate, no nothing. Yes, they
obviously do have the prefixes) but they basically
implement something close to "memcpy".
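As a rough illustration of that gap between encoding and semantics: "rep movsb" is just the two bytes F3 A4 (the REP prefix plus the one-byte opcode), but what it does looks like the loop below. This is a simplified model that assumes a flat bytearray for memory and ignores faults, segments and address-size effects; the function and parameter names are made up for the sketch.

```python
def rep_movsb(mem, src, dst, count, direction_flag=False):
    """Model REP MOVSB: copy `count` bytes one at a time inside `mem`.

    Returns the final (src, dst, count), mirroring how the real
    instruction leaves its index and count registers updated.
    """
    # The direction flag selects whether the pointers walk up or down.
    step = -1 if direction_flag else 1
    while count != 0:
        mem[dst] = mem[src]   # one byte load plus one byte store
        src += step
        dst += step
        count -= 1
    return src, dst, count
```

For example, with buf = bytearray(b"hello....."), calling rep_movsb(buf, 0, 5, 5) leaves buf holding b"hellohello". Each trip around the loop is work the execution engine has to do, which is why the instruction expands into many uops even though decoding it is trivial.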
So the decode itself may be trivial, but feeding the result
to the execution engine may then involve lots of uops, and
Intel has traditionally let only one of their decoders do
that kind of "uop rom lookup" (or even the "more than one
uop" case, never mind the case where you have to look up
the uops separately).
Linus