By: Adrian (a.delete@this.acm.org), August 3, 2022 11:33 am
Room: Moderated Discussions
NoSpammer (no.delete@this.spam.com) on August 3, 2022 7:55 am wrote:
> Adrian (a.delete@this.acm.org) on August 3, 2022 2:19 am wrote:
> > Mark Roulo (nothanks.delete@this.xxx.com) on August 2, 2022 6:48 pm wrote:
> > >Stack or memory based architecture (you really want registers)
> >
> >
> > Nobody has proven yet whether encoding the operands implicitly with operations defined
> > on a stack is better or worse than encoding the operands using register numbers.
>
> So mainstream architectures that used to use stack-like structures like intel x87
> and archs using register windows being obsoleted are not a practical proof?
I have already said that the problem of 8087 and of all other now obsolete ISAs is using only 1 operand stack, not unsing operand stacks.
So what you claim to be a practical proof is certainly no proof at all.
>
> > The problem with the early stack-based instruction encodings, like Intel
> > 8087, was that they used ONLY ONE STACK, not that they were stack based.
>
> How would N+1 stacks improve performance for general purpose
> code and compilers? How about spillovers and function calls?
The architectures with only 1 stack have been made obsolete by the CPUs with multiple pipelined execution units.
Any such CPU needs the interleaving of multiple chains of dependent instructions to keep busy the execution units. An instruction stream using operand stacks cannot encode more separate chains of dependent instructions than the number of operand stacks.
That is why any revival of a stack-based ISA would need to use multiple operand stacks, as many as the number of distinct chains of dependent instructions that are needed to keep all the pipelined execution units filled.
>
> > In modern OoO CPUs, the register numbers used in the instruction encodings are no longer
> > addresses, but only means to specify the data dependencies between instructions. The same
> > data dependencies can be specified implicitly when the operand belong to a stack.
>
> I would correct you that the register numbers are used to specify EXPLICIT dependencies between instructions,
> which is very important if you want to parse dependencies quickly without additional indirection.
Here you might touch a real problem of a stack-based ISA. I have not tried to implement such a CPU, so I do not know how much complexity is added by the need to generate the equivalent register numbers by keeping track of a stack position.
It is possible that a stack-based ISA might need too much extra area and/or it might have a lower clock frequency limit.
However, I have already said this in my previous post, maybe a stack-based ISA is not worthwhile, but in any case nobody has proven this yet.
The ISAs with only 1 stack have been rightly abandoned, but the reason why that has been done is not applicable for an ISA with multiple operand stacks.
>
> > Allowing each instruction to specify 1 of 4 operand stacks might be enough
> > to remove most false dependencies, while specifying 1 of 8 operand stacks
> > is pretty much certain to prevent the apparition of false dependencies.
>
> Sure if 8 registers only works kinda ok for most code, 8 stacks will do. Maybe you actually
> keep the top of the stack in the register and spill over the stack, oh look, x86.
No, the number of operand stacks which are needed for a given throughput is much less than the number of addressable registers that are needed for the same throughput.
When compared with an ISA with 3 register addresses per instruction, an ISA with 8 operand stacks is more like having at least 24 addressable registers, as they can have similar limits for the number of distinct chains of dependent instructions that can be encoded. The exact equivalence would vary from program to program. It is likely that are many cases when encoding 8 distinct chains of dependent instructions would require much more than 32 addressable registers.
The register renaming could be implemented in such a way so that only the sum of the stack lengths of all stacks is limited by the number of registers in the register file, without lower limits for the length of any individual stack.
>
> I think it's quite clear that is not a compiler friendly target for current compiler tech.
> It's also quite clear that you will be burning power resolving a layer of indirection.
> It's also quite clear that you will be wasting time shuffling stack order at many block edges.
>
I do not believe that converting code generated in SSA form using an infinite number of registers to code using a stack per chain of dependent instructions is any more difficult than the conversion to code using a fixed number of registers, including the register allocation problem.
The Mill project also included as a major ISA change the use of a FIFO instead of stacks or addressable registers, and compiling for that would need even more changes in compiler technology.
Of course, taking into account that the Mill project didn't go anywhere, that is not a flattering comparison.
Like I have said, I have no idea whether using multiple stacks would be better or worse than using addressable registers.
What I know for sure is that nobody, including you, has produced any valid argument that could settle the answer to this question.
> Adrian (a.delete@this.acm.org) on August 3, 2022 2:19 am wrote:
> > Mark Roulo (nothanks.delete@this.xxx.com) on August 2, 2022 6:48 pm wrote:
> > >
> >
> >
> > Nobody has proven yet whether encoding the operands implicitly with operations defined
> > on a stack is better or worse than encoding the operands using register numbers.
>
> So mainstream architectures that used to use stack-like structures like intel x87
> and archs using register windows being obsoleted are not a practical proof?
I have already said that the problem of 8087 and of all other now obsolete ISAs is using only 1 operand stack, not unsing operand stacks.
So what you claim to be a practical proof is certainly no proof at all.
>
> > The problem with the early stack-based instruction encodings, like Intel
> > 8087, was that they used ONLY ONE STACK, not that they were stack based.
>
> How would N+1 stacks improve performance for general purpose
> code and compilers? How about spillovers and function calls?
The architectures with only 1 stack have been made obsolete by the CPUs with multiple pipelined execution units.
Any such CPU needs the interleaving of multiple chains of dependent instructions to keep busy the execution units. An instruction stream using operand stacks cannot encode more separate chains of dependent instructions than the number of operand stacks.
That is why any revival of a stack-based ISA would need to use multiple operand stacks, as many as the number of distinct chains of dependent instructions that are needed to keep all the pipelined execution units filled.
>
> > In modern OoO CPUs, the register numbers used in the instruction encodings are no longer
> > addresses, but only means to specify the data dependencies between instructions. The same
> > data dependencies can be specified implicitly when the operand belong to a stack.
>
> I would correct you that the register numbers are used to specify EXPLICIT dependencies between instructions,
> which is very important if you want to parse dependencies quickly without additional indirection.
Here you might touch a real problem of a stack-based ISA. I have not tried to implement such a CPU, so I do not know how much complexity is added by the need to generate the equivalent register numbers by keeping track of a stack position.
It is possible that a stack-based ISA might need too much extra area and/or it might have a lower clock frequency limit.
However, I have already said this in my previous post, maybe a stack-based ISA is not worthwhile, but in any case nobody has proven this yet.
The ISAs with only 1 stack have been rightly abandoned, but the reason why that has been done is not applicable for an ISA with multiple operand stacks.
>
> > Allowing each instruction to specify 1 of 4 operand stacks might be enough
> > to remove most false dependencies, while specifying 1 of 8 operand stacks
> > is pretty much certain to prevent the apparition of false dependencies.
>
> Sure if 8 registers only works kinda ok for most code, 8 stacks will do. Maybe you actually
> keep the top of the stack in the register and spill over the stack, oh look, x86.
No, the number of operand stacks which are needed for a given throughput is much less than the number of addressable registers that are needed for the same throughput.
When compared with an ISA with 3 register addresses per instruction, an ISA with 8 operand stacks is more like having at least 24 addressable registers, as they can have similar limits for the number of distinct chains of dependent instructions that can be encoded. The exact equivalence would vary from program to program. It is likely that are many cases when encoding 8 distinct chains of dependent instructions would require much more than 32 addressable registers.
The register renaming could be implemented in such a way so that only the sum of the stack lengths of all stacks is limited by the number of registers in the register file, without lower limits for the length of any individual stack.
>
> I think it's quite clear that is not a compiler friendly target for current compiler tech.
> It's also quite clear that you will be burning power resolving a layer of indirection.
> It's also quite clear that you will be wasting time shuffling stack order at many block edges.
>
I do not believe that converting code generated in SSA form using an infinite number of registers to code using a stack per chain of dependent instructions is any more difficult than the conversion to code using a fixed number of registers, including the register allocation problem.
The Mill project also included as a major ISA change the use of a FIFO instead of stacks or addressable registers, and compiling for that would need even more changes in compiler technology.
Of course, taking into account that the Mill project didn't go anywhere, that is not a flattering comparison.
Like I have said, I have no idea whether using multiple stacks would be better or worse than using addressable registers.
What I know for sure is that nobody, including you, has produced any valid argument that could settle the answer to this question.