By: ⚛ (0xe2.0x9a.0x9b.delete@this.gmail.com), June 4, 2022 11:39 am
Room: Moderated Discussions
Peter Lewis (peter.delete@this.notyahoo.com) on June 1, 2022 3:55 pm wrote:
> >> I think x86 will eventually be killed by variable length instruction decode, Moore’s law slowing
> >> down, availability of software binary translation from x86 to something else and most low-performance
> >> software running on top of JavaScript. The x86 instruction sets will eventually have the same market
> >> significance as the IBM 360 instruction set. I own Intel stock and I’m not selling because I think
> >> it will take more than 20 years for x86 to be displaced from the dominant position it has today.
> >
> > Why? What are the market forces that you believe will displace
> > x86? What do you think will replace it, RISC-V?
>
> My guess is the higher complexity and higher power consumption of x86 will eventually allow ARM implementations
> to outperform x86 implementations. Apple’s M1 P-cores currently decode 8 instructions per clock, while Intel’s
> Golden Cove cores in Alder Lake and Sapphire Rapids decode 6 instructions per clock. When ARM implementations
> are decoding 32 instructions per clock, it will be very difficult for x86 implementations to keep up.
(emphasis in the above text was added by me)
In most CPU architectures, instruction boundaries and offsets are close to time-invariant during program execution, irrespective of whether the architecture is CISC (variable-length instructions) or RISC (fixed-length instructions). The two major differences between a fixed-length and a variable-length encoding are that (1) a variable-length encoding takes a small amount of time up front to determine those time-invariant boundaries and offsets, and (2) a fixed-length encoding is slightly less space-efficient and therefore takes slightly longer to load from memory on a cache miss. In practice, (1) and (2) roughly cancel each other out.
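To make point (1) concrete, here is a toy C sketch of boundary pre-decoding for a hypothetical variable-length ISA in which the low 2 bits of an instruction's first byte give its length (1..4 bytes). This is not x86 length decoding, which is far messier, but the principle is the same: the boundaries are computed once per cache-line fill and reused on every subsequent fetch.

#include <stdio.h>
#include <stdint.h>

#define LINE_BYTES 64

/* One pre-decode pass per cache-line fill: mark instruction starts.
 * The boundaries are time-invariant, so this cost is amortized over
 * every later fetch from this line. (Toy model: ignores instructions
 * that straddle a line boundary.) */
static void predecode(const uint8_t line[LINE_BYTES],
                      uint8_t starts[LINE_BYTES])
{
    for (int i = 0; i < LINE_BYTES; ) {
        starts[i] = 1;
        i += (line[i] & 3) + 1;   /* hypothetical length field */
    }
}

int main(void)
{
    uint8_t line[LINE_BYTES] = { 0 };   /* all "length 1" opcodes   */
    line[0] = 3;                        /* first insn is 4 bytes    */
    uint8_t starts[LINE_BYTES] = { 0 };

    predecode(line, starts);            /* paid once, then reused   */

    /* Every later fetch can pick out N boundaries per clock in
     * parallel, just like a fixed-length decoder would. */
    int count = 0;
    for (int i = 0; i < LINE_BYTES; i++)
        count += starts[i];
    printf("%d instructions in this line\n", count);
    return 0;
}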
x86 implementations will be able to decode 32 instructions per clock if those instructions have already been pre-decoded (i.e. are served from a µop cache).
If a RISC CPU has a µop cache, the space efficiency of the µop cache can be better than the space efficiency of the RISC instructions emitted by a compiler. RISC can therefore benefit from a µop cache as well, and might not even need an L1I cache.
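One assumed mechanism by which a µop cache can pack more program per entry than the architectural encoding suggests is macro-op fusion. A toy C sketch (the instruction stream and the fusion rule, adjacent cmp+branch pairs collapsing into one entry, are entirely hypothetical):

#include <stdio.h>

enum op { ADD, CMP, BRANCH, LOAD };

int main(void)
{
    enum op prog[] = { LOAD, ADD, CMP, BRANCH, ADD, CMP, BRANCH, LOAD };
    int n = sizeof prog / sizeof prog[0];

    int entries = 0;
    for (int i = 0; i < n; i++) {
        if (prog[i] == CMP && i + 1 < n && prog[i + 1] == BRANCH)
            i++;              /* fuse cmp+branch into a single entry */
        entries++;
    }
    printf("%d architectural insns -> %d uop-cache entries\n",
           n, entries);      /* here: 8 -> 6 */
    return 0;
}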
It is possible that in the near future it will become clear that CPU performance is determined by the number of [conditional] branches successfully predicted per cycle, and not by whether the ISA visible to the compiler and the user uses a fixed-length or a variable-length encoding.
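A back-of-the-envelope C calculation of why prediction throughput would be the limiter (the 1-in-8 taken-branch density and the one-redirect-per-cycle front end are assumptions for illustration, not measurements):

#include <stdio.h>

int main(void)
{
    /* Assume the front end can follow at most one predicted-taken
     * branch per cycle, and one instruction in 8 is a taken branch. */
    const double insns_per_taken_branch = 8.0;
    const int    decode_width           = 32;  /* hypothetical */

    double cap = insns_per_taken_branch < decode_width
               ? insns_per_taken_branch : decode_width;

    /* A 32-wide decoder cannot be kept full if fetch is redirected
     * only once per cycle: sustained throughput is capped near 8. */
    printf("fetch cap: ~%.1f insns/clock (decoder is %d wide)\n",
           cap, decode_width);
    return 0;
}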
VLIW CPUs have even simpler instruction decoding than RISC CPUs, but one problem is that L1I cache capacity is in general very limited, so the number of VLIW instructions that fit in the L1I cache is small, which has negative performance implications (see the back-of-the-envelope calculation below). A second problem is that it is hard to evolve a VLIW ISA over time (i.e. over decades): every new VLIW CPU tends to introduce a new instruction encoding, which is highly problematic when the life-cycle of a compiler is much longer than the interval between VLIW ISA updates.
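The L1I capacity point, as a C calculation using an Itanium-style 128-bit bundle holding 3 operations versus 4-byte fixed-length RISC instructions (the 32 KiB L1I size is just a typical value, and real VLIW code also wastes slots on nops, so this is a best case for VLIW):

#include <stdio.h>

int main(void)
{
    const int    l1i_bytes = 32 * 1024;  /* common L1I size          */
    const int    risc_insn = 4;          /* bytes per RISC insn      */
    const double vliw_op   = 16.0 / 3.0; /* bytes/op, 128-bit bundle */

    printf("RISC insns in L1I: %d\n",   l1i_bytes / risc_insn);
    printf("VLIW ops   in L1I: %.0f\n", l1i_bytes / vliw_op);
    return 0;  /* prints 8192 vs 6144 */
}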
The encoding used by a µop cache is bound to sit somewhere between a RISC encoding and a VLIW encoding. Counterintuitively, it can still achieve a better compression ratio than a RISC encoding: a µop cache is usually much smaller than the RAM installed in the machine and mirrors only a small, hot subset of the RISC instructions held there, so its encoding can be specialized for that subset.
Because, in terms of instruction encoding, RISC sits between CISC and VLIW, and a VLIW design has the shortest lifespan, it can be argued that the lifespan of a RISC design must be shorter than the lifespan of a CISC design. Thus, the fact that i386/amd64 has survived for such a relatively long time might not be a coincidence.
-atom