Article: AMD's Mobile Strategy
By: Ricardo B (ricardo.b.delete@this.xxxxx.xx), December 17, 2011 1:21 pm
Room: Moderated Discussions
Linus Torvalds (torvalds@linux-foundation.org) on 12/16/11 wrote:
>is "odd" - but it's odd for humans, not so much for some
>digital logic. It's the usual kind of "wires to move bits
>around, with certain bits meaning certain things".
>
>So scalar x86 decoding really isn't that hard. You would
>definitely not need to bother with predecode for that. Just
>look at the 8086 - it was 30k transistors total, and
>while it did everything serially (ie one prefix byte a
>cycle), you still have to realize how little that is. The
>basic decoding really isn't hard.
Yeah, but there's an exponential increase in complexity from the 8086 to Sandy Bridge.
As you move from decoding one instruction in multiple cycles to decoding one instruction per cycle to decoding multiple instructions per cycle, things get complicated because the number of possibilities explodes.
Those "wires" aren't actually wires, they're muxes.
IIRC, an x86 instruction can have 0 to 4 prefix bytes and 1 to 16 bytes total.
In a straightforward approach, that's 12 x 8 x 5:1 muxes (the opcode can start at any of 5 byte positions, so each bit of the 12 bytes after the prefixes needs a 5:1 mux), plus the logic that decodes the prefix bytes, just to get the opcode etc. into the right place.
Then you have a 1-3 byte opcode, which means another array of 3:1 muxes to get the operands etc. into the right place.
And so on...
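To make the mux picture a bit more concrete, here's a rough C analogy of what the first decoder's alignment stage has to do. The prefix set is cut down to a toy subset, and the loop stands in for what hardware does with shifted byte lanes:

#include <stdint.h>
#include <string.h>

/* Toy check for a cut-down set of legacy prefix bytes. */
static int is_prefix(uint8_t b)
{
    switch (b) {
    case 0x66: case 0x67:               /* operand/address size */
    case 0xF0: case 0xF2: case 0xF3:    /* LOCK, REPNE, REP/REPE */
    case 0x26: case 0x2E: case 0x36:    /* segment overrides */
    case 0x3E: case 0x64: case 0x65:
        return 1;
    default:
        return 0;
    }
}

/* Align the opcode of the instruction starting at buf[0] to out[0].
 * In hardware there is no loop: each possible prefix count (0..4) is a
 * shifted copy of the byte lanes, and a 5:1 mux per bit of the remaining
 * 12 bytes picks the right copy; that's the 12 x 8 x 5:1 array above. */
int align_opcode(const uint8_t buf[16], uint8_t out[12])
{
    int skip = 0;
    while (skip < 4 && is_prefix(buf[skip]))
        skip++;                          /* 0 to 4 prefix bytes */
    memcpy(out, buf + skip, 12);
    return skip;                         /* where the opcode was found */
}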
On top of that, a 2nd decoder would need an array of 16 x 8 x 16:1 muxes (and signals from the 1st decoder) just to get its instruction right.
The 3rd decoder? Something like an array of 16 x 8 x 32:1 muxes.
(that's for a pessimistic case in which you want to be able to decode 3 instructions per cycle in all cases).
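As a hypothetical sketch of why the later decoders need those wider muxes: the bytes feeding decoder 2 depend on the length decoder 1 finds, and decoder 3 depends on both. The toy insn_length() below is a stand-in for the real length computation (which is itself most of the work):

#include <stdint.h>
#include <stdio.h>

/* Toy stand-in for the real length computation; prefixes, opcode map,
 * ModRM, SIB, displacement and immediate are all ignored. */
static int insn_length(const uint8_t *p)
{
    return (*p == 0x66) ? 2 : 1;
}

/* In software the dependence is a trivial loop.  In hardware, "p += len"
 * is a wide per-bit mux in front of each decoder, selecting among every
 * start offset that decoder might see, and the fan-in grows with each
 * additional parallel decoder. */
static void pick_starts(const uint8_t window[16], int start[3])
{
    int p = 0;
    for (int i = 0; i < 3; i++) {
        start[i] = p;
        p += insn_length(window + p);
    }
}

int main(void)
{
    uint8_t window[16] = { 0x66, 0x90, 0x90, 0x66, 0x90 };  /* toy bytes */
    int start[3];
    pick_starts(window, start);
    printf("starts: %d %d %d\n", start[0], start[1], start[2]);
    return 0;
}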
But an even bigger source of exponential complexity is speed.
The approach described above is fairly cheap in terms of transistors, but the logic depth (delay) of the various stages adds up quickly.
To operate at higher frequencies, one wants a more parallel/speculative approach that reduces logic depth but greatly increases total transistor count (and area and power).
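Purely as an illustration of that trade-off (again with a toy insn_length(), not the real thing): instead of waiting for each decoder to tell the next one where to start, you can speculatively compute a length at every byte offset of the fetch window and then do a cheap selection pass afterwards.

#include <stdint.h>

/* Same kind of toy length stand-in as before. */
static int insn_length(const uint8_t *p)
{
    return (*p == 0x66) ? 2 : 1;
}

/* Speculative version: all 16 length computations are independent, so in
 * hardware they can run in parallel (lots of extra transistors), and the
 * remaining serial work is only the shallow "follow the lengths" pass
 * (little logic depth).  That's the transistors-for-delay trade. */
void decode_speculative(const uint8_t window[16], int start[3])
{
    int len_at[16];
    for (int off = 0; off < 16; off++)   /* parallel in hardware */
        len_at[off] = insn_length(window + off);

    int p = 0;
    for (int i = 0; i < 3; i++) {        /* shallow selection pass */
        start[i] = p;
        p += len_at[p];
    }
}

This is also roughly what predecode/marker bits in the instruction cache buy you: the length work gets done once, off the critical per-cycle decode path.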
>is "odd" - but it's odd for humans, not so much for some
>digital logic. It's the usual kind of "wires to move bits
>around, with certain bits meaning certain things".
>
>So scalar x86 decoding really isn't that hard. You would
>definitely not need to bother with predecode for that. Just
>look at the 8086 - it was 30k transistors total, and
>while it did everything serially (ie one prefix byte a
>cycle), you still have to realize how little that is. The
>basic decoding really isn't hard.
Yeah but there's an exponential increase in complexity from the 8086 and SandyBridge.
As you move from decoding one instruction in multiple cycles to decoding one instruction per cycle to decoding multiple instructions per cycle, things get complicated because the number of possibilities explodes.
Those "wires" aren't actually wires, they're muxes.
IIRC, a x86 instruction can have 0 to 4 prefix bytes and 1 to 16 bytes total.
In a straight forward approach, that's 12 x 8 x 5:1 muxes, plus the logic that decodes the prefix bytes just to get the opcodes + etc in the right place.
Then you have a 1-3 byte opcode, which means another array of 3:1 muxes to get the operands and etc into the right place.
So on...
On top of that, a 2nd decoder would need an array of 16 x 8 x 16:1 muxes (and signals from the 1st decoder) just to get it's instruction right.
The 3rd decoder? Something like an array of 16 x 8 x 32:1 muxes.
(that's for a pessimistic case in which you want to be able to decode 3 instructions per cycle in all cases).
But an even bigger source of exponential complexity is speed.
The approach is described above is fairly cheap in terms of transistors but the logic depth (delay) of the various stages adds up quickly.
To operate at higher frequencies, one wants a more parallel/speculative approach that reduces logic depth but greatly increases total transistor count (and area and power).