By: Mark Roulo (nothanks.delete@this.xxx.com), August 2, 2022 6:48 pm
Room: Moderated Discussions
NvaxPlus (spam.delete@this.spam.com) on August 2, 2022 8:45 am wrote:
> I'm curious if anyone here has any good resources (papers, or things of that sort) on empirical
> data about what works and what doesn't in an ISA. It seems to me that a lot of arguments about
> instruction set design rely on intuition and received wisdom. Sometimes that's helpful and sometimes
> it isn't. I've been doing a (admittedly, cursory) literature review on this subject and was
> surprised just how seemingly little publicly available information there is.
I'll be interested to see what you find, but my (non-rigorous!) belief is that the most important thing for an ISA is to avoid a number of screwups. For ISAs intended for high performance implementations these include:
- Register windows. These mostly seem to be a bad idea.
- Stack or memory based architecture (you really want registers)
- Instruction encoding that makes parallel decode difficult (e.g. 680x0). Fixed length is one way to make parallel decode straightforward, but being able to easily/rapidly determine the length of a given instruction seems to be sufficient
- Instructions with many indirect memory accesses
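To make the parallel decode point a bit more concrete, here is a minimal sketch in Python. The encoding is entirely made up for illustration (the low two bits of the first byte give the instruction length); it is not 680x0, x86 or any real ISA, and the function names are hypothetical. With fixed-length instructions every start address is known up front. With variable length, each start depends on the length of the previous instruction, so how cheaply that length can be determined is what decides how painful a wide decoder is.

```python
def fixed_length_starts(code: bytes, width: int = 4) -> list[int]:
    """Start offsets for a fixed-width encoding.

    Each start is just i * width, so every decoder slot can compute
    its own start without looking at any other instruction.
    """
    return list(range(0, len(code) - width + 1, width))


def toy_length(code: bytes, pc: int) -> int:
    """Hypothetical encoding: low 2 bits of the first byte give length 1..4.

    This is the 'easy' case, where the length comes from one byte.
    """
    return (code[pc] & 0b11) + 1


def variable_length_starts(code: bytes, length_of=toy_length) -> list[int]:
    """Start offsets for a variable-length encoding.

    Note the serial dependency: the next start is only known once the
    current instruction's length has been determined.
    """
    starts = []
    pc = 0
    while pc < len(code):
        starts.append(pc)
        pc += length_of(code, pc)
    return starts


if __name__ == "__main__":
    # Made-up byte stream: lengths 1, 2, 4, 3 under the toy encoding.
    stream = bytes([0x00, 0x01, 0xAA, 0x03, 1, 2, 3, 0x02, 9, 9])
    print(variable_length_starts(stream))   # [0, 1, 3, 7]
    print(fixed_length_starts(bytes(8)))    # [0, 4]
```

In hardware the loop in variable_length_starts becomes a chain of dependent length computations. If the length needs only a glance at the first byte the chain is short; if it means walking prefixes, extension words or addressing-mode bytes, it gets much harder to decode several instructions per cycle.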
There are probably a few more, but once you've done this I think the actual IMPLEMENTATION starts to dominate. So x86-64, ARMv8 and POWER I expect to all be dominated by implementation choices for various actual chips. Implementation choices such as more physical registers for an OoO engine to use. Or bundling (or not) instructions as they go through the OoO engine (e.g. POWER4). Implementation design choices for which instructions can go to which execution units (and the various tradeoffs for more vs. less general execution units). How many transistors are spent on branch prediction, and how good the branch predictors and pre-fetchers (I and D) are. Better caches. Pipeline and transistor design choices that allow for higher frequencies.
And a number of these will be better for some workloads and worse for others. Which just adds another dimension to any (externally published) analysis. I'm sure Apple has some answers for the workloads IT cares about on ARM but I don't expect to find much published information about this. Same for Intel. And IBM.
Another "catch" with what you want is that once you've avoided screwing up too much at the ISA level, you'll need to have specific implementations (even in a simulator) to play with things such as "how large should constants in an instruction be?" and "how many ISA-visible registers should I have?". The specific choices are going to matter and THEN how well a given compiler can USE these will matter, too.
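As a rough illustration of the kind of experiment the "how large should constants be?" question implies, here is a small Python sketch. It assumes you already have a list of the constants a compiler emitted for some workload; the sample list below is invented purely to exercise the code and is NOT measured data, and the function names are hypothetical.

```python
def fits_signed(value: int, bits: int) -> bool:
    """True if value is representable as a signed two's-complement field of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def coverage(constants: list[int], bits: int) -> float:
    """Fraction of constants that fit in a signed immediate field of the given width."""
    if not constants:
        return 0.0
    return sum(fits_signed(c, bits) for c in constants) / len(constants)


if __name__ == "__main__":
    # Invented sample, not from any real program or compiler.
    sample = [0, 1, -1, 127, 128, 4096, -2048, 70000]
    for bits in (8, 12, 16):
        print(bits, coverage(sample, bits))
    # 8 0.5
    # 12 0.75
    # 16 0.875
```

Even with real traces, a number like this only says how often a constant fits. What the misses actually cost depends on the specific implementation and on what the compiler does when a constant doesn't fit, which is exactly the catch.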
I hope you find something, but I'm not optimistic.