By: Peter Lund (peterfirefly.delete@this.gmail.com), December 1, 2014 5:06 am
Room: Moderated Discussions
Ricardo B (ricardo.b.delete@this.xxxxx.xx) on November 27, 2014 4:04 pm wrote:
> Peter Lund (peterfirefly.delete@this.gmail.com) on November 27, 2014 12:48 pm wrote:
> >>
> > Lots of people assumed (and kept assuming even after the P6) that you could either
> > handle simple and strict RISC-formats directly /or/ you could use microcode.
> >
> > My impression after arguing with him in comp.arch was that
> > he still assumed that, even after Opteron and Pentium M...
> >
> > Btw. you don't actually need OOO to make a simple VAX pipeline
> > with µops and "transactional" register renaming.
>
> Yet, you don't really make a strong case on how that would have been done and how e
>
> >
> > Moreover, many of the problems with the VAX could have been fixed:
> >
> > o the calling conventions could have improved. They did improve in practice during
> > the life of the VAX, because the CALL* instructions were used less and the BSR instruction
> > was used more. They could have gone further in that direction.
> > o they could have added a separate set of registers for floating-point (and added IEEE at the same time)
> > -- they sort of did that, but they added a *VECTOR* floating-point instruction set that was asynchronous
> > with the rest of the machine (you needed to add synch instructions...). That was probably a dumb move.
> > o they could have added better addressing modes, i.e. the same kind of addressing modes
> > that IA32 and AMD64 have. The compilers and the systems code could have been gradually
> > transitioned. Dual-versions of systems code and libraries could have coexisted.
> > o they could have added 64-bit addressing, registers, and operations.
>
> Yes, but:
> 1: Requires new ABI.
> 2: Requires new ABI, new instructions.
> 3: Requires new instructions.
>
> New instructions only provide an advantage when software adopts them, which isn't immediate.
> And new ABIs are much, much slower to be adopted.
Still far better than switching to a brand new architecture that requires a new ABI, new compiler, new assembler, new linker, new loader, porting of the OS, etc.
And it was indeed possible to transition gradually to a better calling convention (as they did in practice, though it could have gone further). It was certainly also possible to use the new instructions in the OS right from the start, because DEC controlled both the hardware and the software.
>
> And none of these do what P6 did for x86: it ran existing software really fast.
Doing the decoder => µops thing in a clean way could have been the VAX equivalent of the 486 or the 68040. Instead, they made the VAX 9000/NVAX (same microarchitecture), which also achieved single-cycle execution of most of the commonly occurring instructions (in the best case), but did it with a weird box on the side that handled the memory I/O and communicated with the rest of the execution engine in a complicated way. That would have been hard to expand into something Pentium-like, and then further into something P6-like.
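To make the decoder => µops idea concrete, here is a toy C sketch (my own illustration, not anything DEC or Intel actually built): a two-operand VAX-style instruction with a memory source gets cracked into a load µop into a temporary register plus a register-to-register ALU µop, so the execution core only ever sees simple, RISC-like operations. All names and encodings here are made up for the example.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical µop kinds -- just an illustration, not DEC's encoding. */
    typedef enum { UOP_LOAD, UOP_ADD } uop_kind;

    typedef struct {
        uop_kind kind;
        int      dst;          /* architectural or temporary register */
        int      src1, src2;
        int32_t  disp;         /* displacement for memory µops */
    } uop;

    #define TEMP0 16           /* pretend temporaries live above R0..R15 */

    static uop buf[8];
    static int n_uops;

    static void emit(uop u) { buf[n_uops++] = u; }

    /* Crack "ADDL2 disp(Rb), Rd" (memory source, register destination)
       into two simple µops:  t0 <- load [Rb + disp];  Rd <- Rd + t0  */
    static void crack_addl2_mem_reg(int rb, int32_t disp, int rd)
    {
        emit((uop){ UOP_LOAD, TEMP0, rb, -1,    disp });
        emit((uop){ UOP_ADD,  rd,    rd, TEMP0, 0    });
    }

    int main(void)
    {
        crack_addl2_mem_reg(3, 8, 5);      /* ADDL2 8(R3), R5 */
        for (int i = 0; i < n_uops; i++)
            printf("uop %d: kind=%d dst=%d src1=%d src2=%d disp=%d\n",
                   i, (int)buf[i].kind, buf[i].dst, buf[i].src1,
                   buf[i].src2, (int)buf[i].disp);
        return 0;
    }

The point is only that the decoder absorbs the operand-specifier complexity up front; the pipeline behind it stays simple.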
Just register renaming + extra physical registers, with checkpoint/restore of the renaming, would have been a nice improvement. If you read the published microcode for the VLSI VAX CPUs, you'll note that they used a stack of register changes, where each item had a register number, an operand size, and a flag indicating increment/decrement. Any time they had to abort a partially executed instruction, they would roll the stack back one item at a time. Imagine how much fun the interaction between the memory box and the rest of the CPU is on the VAX 9000/NVAX with this stack. It makes branch prediction with speculative execution complicated, and it makes mispredicted branches so expensive that it's barely worth it. If you have renaming and checkpoint/restore, handling mispredicts becomes simple and cheap. It would be rather strange if there weren't some performance to be had on existing code from that.
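What I mean by checkpoint/restore of the renaming, as a toy C sketch (again just an illustration with made-up register counts, not a description of any real VAX implementation): the rename map is a small array, taking a checkpoint at a conditional branch is a plain structure copy, and recovering from a mispredict is another structure copy. The recovery cost is constant, instead of popping an undo stack of (register, operand size, increment/decrement) entries one at a time.

    #include <stdio.h>

    #define NARCH 16            /* R0..R15 */
    #define NPHYS 64            /* hypothetical physical register count */

    /* Rename map: architectural register number -> physical register number. */
    typedef struct { int map[NARCH]; int next_free; } rename_map;

    static void init(rename_map *m)
    {
        for (int i = 0; i < NARCH; i++) m->map[i] = i;
        m->next_free = NARCH;
    }

    /* Allocate a fresh physical register for a new value of Rd
       (no free-list or reclamation in this toy). */
    static void rename_dest(rename_map *m, int rd)
    {
        m->map[rd] = m->next_free++;
    }

    int main(void)
    {
        rename_map m;
        init(&m);

        rename_map cp = m;       /* conditional branch: snapshot the map */

        rename_dest(&m, 5);      /* speculative "MOVL ..., R5"  */
        rename_dest(&m, 5);      /* speculative "ADDL2 ..., R5" */

        m = cp;                  /* mispredict: one copy restores the map,
                                    no matter how many writes were in flight */

        printf("R5 maps to physical register %d after recovery\n", m.map[5]);
        return 0;
    }

With something like that in place, letting the branch predictor run ahead stops being scary, which is exactly the property the rollback-stack scheme lacks.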
-Peter