By: Brett (ggtgp.delete@this.yahoo.com), August 2, 2022 10:41 pm
Room: Moderated Discussions
Mark Roulo (nothanks.delete@this.xxx.com) on August 2, 2022 6:48 pm wrote:
> NvaxPlus (spam.delete@this.spam.com) on August 2, 2022 8:45 am wrote:
> > I'm curious if anyone here has any good resources (papers, or things of that sort) on empirical
> > data about what works and what doesn't in an ISA. It seems to me that a lot of arguments about
> > instruction set design rely on intuition and received wisdom. Sometimes that's helpful and sometimes
> > it isn't. I've been doing a (admittedly, cursory) literature review on this subject and was
> > surprised just how seemingly little publicly available information there is.
>
> I'll be interested to see what you find, but my (non-rigorous!) belief is that the most important thing for an
> ISA is to avoid a number of screwups. For ISAs intended for high performance implementations these include
>
> - Register windows. These mostly seem to be a bad idea.
> - Stack or memory based architecture (you really want registers)
> - Instruction encoding that makes parallel decode difficult (e.g. 680x0). Fixed
>   length is one way to make parallel decode straightforward, but being able to easily/rapidly
>   determine the length of a given instruction seems to be sufficient
> - Instructions with many indirect memory accesses
>
> There are probably a few more, but once you've done this I think the actual IMPLEMENTATION starts to dominate.
> So x86-64, ARMv8 and POWER I expect to all be dominated by implementation choices for various actual chips.
> Implementation choices such as more physical registers for an OoO engine to use. Or bundling (or not) instructions
> as they go through the OoO engine (e.g. POWER4). Implementation design choices for which instructions can go
> to which execution units (and the various tradeoffs for more vs. less general instruction units). How many
> transistors are spent on branch prediction and how good the branch predictors and pre-fetchers (I and D)
> are. Better caches. Pipeline and transistor design choices that allow for higher frequencies.
>
> And a number of these will be better for some loads and worse for others. Which just adds another dimension
> to any (externally published) analysis. I'm sure Apple has some answers for the loads IT cares about on
> ARM but I don't expect to find much published information about this. Same for Intel. And IBM.
>
> Another "catch" with what you want is that once you've avoided screwing up too much at the ISA level you'll
> need to have specific implementations (even in a simulator) to play with things such as "how large should
> constants in an instruction be?" and "how many ISA visible registers should I have?". The specific choices
> are going to matter and THEN how well a given compiler can USE these will matter, too.
>
> I hope you find something, but I'm not optimistic.
A full set of integer instructions with three sources is all that’s left to do on the high end. One such instruction shortens the critical path and saves a write port and the associated tracking compared to doing the same work in two instructions.
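To put rough numbers on the three-source point, here is a back-of-envelope sketch. It assumes (my assumption, not a measured fact) that a three-source integer op has the same latency as a two-source one, and it counts each instruction as one result write. `chain_cost` is a made-up helper name for illustration only.

```python
def chain_cost(num_operands: int, sources_per_op: int) -> int:
    """Instructions (and result writes) needed to fold num_operands values
    into one result along a single serial dependency chain.

    Assumption: every op has the same latency regardless of source count,
    so this count is also the critical path length in ops.
    """
    if num_operands < 2:
        return 0
    # The first op consumes sources_per_op operands; each later op takes the
    # running result plus (sources_per_op - 1) new operands.
    new_per_op = sources_per_op - 1
    return -(-(num_operands - 1) // new_per_op)  # ceiling division


# a + b + c: two dependent 2-source adds vs. one 3-source add
print(chain_cost(3, 2), chain_cost(3, 3))
# a longer chain of 9 operands
print(chain_cost(9, 2), chain_cost(9, 3))
```

Under those assumptions the dependent chain and the number of intermediate results written both roughly halve. Whether the latency assumption holds is an implementation question.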
Things like two writes in one instruction (Mul Hi/Lo, double-register shifts) do not help much, because instruction combining, or simply being wide, gives the same effect.
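A small sketch of why the two-write forms buy little: the same results fall out of two independent single-write ops, which a wide machine can issue side by side or a combiner can fuse. The function names are hypothetical, 64-bit unsigned registers are assumed, and Python integers stand in for the hardware.

```python
MASK64 = (1 << 64) - 1


def mul_wide(a: int, b: int) -> tuple[int, int]:
    """One instruction, two writes: (low, high) halves of the 128-bit product."""
    p = a * b
    return p & MASK64, p >> 64


def mul_lo(a: int, b: int) -> int:
    """Single-write op: low 64 bits of the unsigned product."""
    return (a * b) & MASK64


def mul_hi(a: int, b: int) -> int:
    """Single-write op: high 64 bits of the unsigned product."""
    return (a * b) >> 64


def shift_pair_left(hi: int, lo: int, n: int) -> tuple[int, int]:
    """Two-write form: shift the 128-bit value hi:lo left by n (1 <= n <= 63)."""
    v = ((hi << 64) | lo) << n
    return (v >> 64) & MASK64, v & MASK64


def shl(lo: int, n: int) -> int:
    """Single-write op: plain shift of the low register."""
    return (lo << n) & MASK64


def shl_funnel(hi: int, lo: int, n: int) -> int:
    """Single-write, three-source op: new high half takes bits from lo."""
    return ((hi << n) | (lo >> (64 - n))) & MASK64


a, b = MASK64, 0x1234_5678_9ABC_DEF0
# mul_lo and mul_hi do not depend on each other, so they can run in parallel.
print(mul_wide(a, b) == (mul_lo(a, b), mul_hi(a, b)))
print(shift_pair_left(a, b, 5) == (shl_funnel(a, b, 5), shl(b, 5)))
```

Note that the split version of the double-register shift needs a three-source op for the high half, which ties back to the point about three-source integer instructions.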
You could pack an if-then-else into one instruction, saving a short jump, but that just saves code space and makes downstream handling a hassle. Bottom-of-the-barrel stuff.
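For illustration, one reading of "if-then-else in one instruction" is a conditional select in the style of AArch64 CSEL or x86 CMOV. That reading is an assumption on my part, and the helper names below are made up. The sketch only shows that the branchy and branchless forms compute the same value, so the gain is the saved short jump rather than any new capability.

```python
def select_branchy(cond: bool, a: int, b: int) -> int:
    """if-then-else with a short forward jump around one move."""
    result = b
    if cond:
        result = a
    return result


def select_branchless(cond: bool, a: int, b: int) -> int:
    """Same result with no control flow: build an all-ones or all-zeros
    mask from cond, then use it to pick a or b."""
    mask = -int(cond)  # -1 (all ones) if cond is true, else 0
    return b ^ ((a ^ b) & mask)


for cond in (False, True):
    print(cond, select_branchy(cond, 7, 42), select_branchless(cond, 7, 42))
```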