By: --- (---.delete@this.redheron.com), August 2, 2022 10:25 am
Room: Moderated Discussions
NvaxPlus (spam.delete@this.spam.com) on August 2, 2022 8:45 am wrote:
> I'm curious if anyone here has any good resources (papers, or things of that sort) on empirical
> data about what works and what doesn't in an ISA. It seems to me that a lot of arguments about
> instruction set design rely on intuition and received wisdom. Sometimes that's helpful and sometimes
> it isn't. I've been doing a (admittedly, cursory) literature review on this subject and was
> surprised just how seemingly little publicly available information there is.
>
> When I say "empirical data" I mean specifically things like, how large immediates you
> actually need, what kind branching ops actually get used (i.e. do we need the full
> gamut of integer comparisons, or are equals/does not equal zero enough?), &c.
>
> The question re: immediate size is especially interesting to me. The only real data I could find on
> this is from H&P and as far as I could see they don't really delve into their methodology. What they
> do show suggests that the log2 of immediates is bimodal which kind of intuitively makes sense to me.
Recent data on, e.g., branch prediction shows what sort of lengths are *common*. See, for example, fig. 3 of https://arxiv.org/pdf/2106.04205.pdf
Of course you need some sort of out for the rare cases that go beyond this, but at that point the usual "aesthetic" considerations kick in, with some people preferring load-type solutions (GOT and so on) and others preferring to build the value in multiple steps across several instructions.
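As a rough illustration of the multi-step approach, here is a sketch modeled on AArch64's MOVZ/MOVK pair, which builds a 64-bit constant 16 bits at a time. The function names and the tuple format are my own invention for illustration; the sketch also ignores shortcuts a real compiler would take (MOVN, logical immediates), so it overstates the cost of values like all-ones.

```python
MASK64 = (1 << 64) - 1

def movz_movk_sequence(value):
    """Return a list of (op, imm16, shift) steps that build `value`.

    Hypothetical helper: the first non-zero 16-bit chunk is set with
    "movz" (which zeroes the rest of the register), later non-zero chunks
    are patched in with "movk" (which keeps the other bits).
    """
    value &= MASK64
    chunks = [(shift, (value >> shift) & 0xFFFF) for shift in (0, 16, 32, 48)]
    nonzero = [(shift, imm16) for shift, imm16 in chunks if imm16 != 0]
    if not nonzero:
        return [("movz", 0, 0)]  # zero still needs one instruction
    steps = []
    for i, (shift, imm16) in enumerate(nonzero):
        steps.append(("movz" if i == 0 else "movk", imm16, shift))
    return steps

def run_sequence(steps):
    """Replay the steps to check that they rebuild the constant."""
    reg = 0
    for op, imm16, shift in steps:
        if op == "movz":
            reg = imm16 << shift
        else:  # movk: replace only the addressed 16-bit field
            reg = (reg & ~(0xFFFF << shift) & MASK64) | (imm16 << shift)
    return reg
```

The instruction count is just the number of non-zero 16-bit chunks, which is why the distribution of constant sizes matters: small constants cost one instruction, and only the rare wide ones pay for three or four.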
A second issue that is frequently overlooked is that constants are used for different things, with different ranges.
There are at least
- branch offsets
- logical constants
- arithmetic immediates (e.g. for adding/subtracting loop counters)
- FP/vector constants.
ARMv8, as the most recent carefully designed ISA (yes, RISC-V is more recent...), seems to have spent some time thinking about all of these. For example, it has special bit-pattern encodings for the logical immediates, and ways to construct both the common (short) branch offsets and a two-step longer branch offset.
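To make the "special patterns for the logicals" concrete: in AArch64 a logical immediate must be a repeating element of 2, 4, 8, 16, 32 or 64 bits, where each element is a rotated run of contiguous ones, and all-zeros and all-ones are not encodable. The checker below is a brute-force sketch of that rule for the 64-bit case only; the function names are mine, and it tests encodability rather than producing the actual N:immr:imms encoding.

```python
MASK64 = (1 << 64) - 1

def _ror(x, r, width):
    """Rotate the low `width` bits of x right by r."""
    r %= width
    mask = (1 << width) - 1
    return ((x >> r) | (x << (width - r))) & mask

def is_logical_immediate(value):
    """True if `value` fits the AArch64 64-bit logical-immediate pattern."""
    value &= MASK64
    if value == 0 or value == MASK64:
        return False  # neither is encodable
    for size in (2, 4, 8, 16, 32, 64):
        elt = value & ((1 << size) - 1)
        # The whole value must be this element repeated across 64 bits.
        replicated = 0
        for pos in range(0, 64, size):
            replicated |= elt << pos
        if replicated != value:
            continue
        # Some rotation of the element must be a plain run of low ones,
        # i.e. of the form 2**k - 1.
        for r in range(size):
            rot = _ror(elt, r, size)
            if rot & (rot + 1) == 0:
                return True
    return False
```

Typical masks such as 0xFF, 0x5555555555555555 or 0x00FF00FF00FF00FF pass, while an arbitrary value like 0x1234 does not and has to be built some other way.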
One interesting point is that ARMv8 has a way to load a limited set of FP constants as immediates (FMOV with an 8-bit immediate field, so 256 values), but M1 does not accelerate these the way it does integer immediates. Does this mean that they are rarely used? Or does it mean that Apple sees integer latency as an important issue, whereas their FP is optimized for throughput, so doing this would just be wasted energy/area? I'm not sure.
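For reference, the AArch64 FMOV (immediate) form carries an 8-bit field: 1 sign bit, 3 exponent bits and 4 fraction bits, so the set of encodable constants is 256 values. Here is a small sketch that decodes the field and enumerates the set; the function name is mine, and the decoding is written arithmetically rather than by expanding to IEEE bit patterns as the architecture pseudocode does.

```python
def decode_fmov_imm8(imm8):
    """Decode an AArch64 FMOV 8-bit FP immediate to a Python float.

    Layout assumed here: bit 7 = sign, bits 6:4 = exponent selector,
    bits 3:0 = fraction. The value is +/- (16 + frac)/16 * 2**exp with
    exp in -3..4.
    """
    sign = -1.0 if imm8 & 0x80 else 1.0
    b6 = (imm8 >> 6) & 1
    low = (imm8 >> 4) & 0x3
    # Bit 6 set selects exponents -3..0, bit 6 clear selects 1..4.
    exponent = (low - 3) if b6 else (low + 1)
    mantissa = (16 + (imm8 & 0xF)) / 16.0
    return sign * mantissa * 2.0 ** exponent

# Every encodable constant, sorted.
ALL_FMOV_IMMEDIATES = sorted(decode_fmov_imm8(i) for i in range(256))
```

The set covers magnitudes from 0.125 to 31.0 and includes things like 0.5, 1.0, 2.0 and 10.0, but not 0.0, which is usually materialized some other way (e.g. from the zero register).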