RISC-V inferior to ARMv8

By: Michael S (already5chosen.delete@this.yahoo.com), December 21, 2018 2:27 am
Room: Moderated Discussions
Adrian (a.delete@this.acm.org) on December 20, 2018 8:51 pm wrote:
> Michael S (already5chosen.delete@this.yahoo.com) on December 20, 2018 3:24 pm wrote:
> >
> > I agree with everything you said except pre-indexed/post-indexed addressing modes.
> > Those, IMHO, are misfeatures, esp. for integer load instructions.
> I do not understand why you believe that these are misfeatures.
> The auto-indexed addressing modes introduced by IBM 801 and copied by ARM, PA-RISC, POWER and others
> are the only way of coding any kind of loop without any extra instructions for address computations.
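As a rough illustration, assume a C compiler targeting AArch64. A pointer-walking loop like the one below is the kind of code that auto-indexed modes serve. Each `*p++` can in principle become a single post-indexed load, for example `ldr x2, [x0], #8`, which both loads the element and advances the pointer. Whether a given compiler actually emits that form depends on the compiler and its optimization settings. The function name `sum_postinc` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum n 64-bit values by walking a pointer (hypothetical helper).
   The load and the pointer advance in "*p++" are the pair that a
   post-indexed load can combine into one instruction. */
int64_t sum_postinc(const int64_t *p, size_t n)
{
    int64_t sum = 0;
    const int64_t *end = p + n;
    while (p != end) {
        sum += *p++;   /* load the element, then advance p by one element */
    }
    return sum;
}
```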

Are we talking about zero overhead, or about low overhead in unrolled loops?
For the latter, I don't see how it is true. For the former, it could be true, but it has little practical consequence, because a non-unrolled loop with a small body typically has other, more serious performance problems that make it impractical when performance *really* matters.
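To make the unrolled case concrete, here is a sketch with a hypothetical function `sum_unrolled4`. The four loads use the same base pointer with constant offsets, which plain [base+immediate] addressing already covers. As a result the pointer update costs one add per four elements whether or not auto-indexing is available. A short remainder loop handles an `n` that is not a multiple of 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Four-way unrolled sum (hypothetical helper). p[0]..p[3] are
   base+constant-offset loads; the only address arithmetic in the
   main loop is one pointer add per four elements. */
int64_t sum_unrolled4(const int64_t *p, size_t n)
{
    int64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    const int64_t *end4 = p + (n - n % 4);   /* end of the unrolled part */
    const int64_t *end  = p + n;
    while (p != end4) {
        s0 += p[0];
        s1 += p[1];
        s2 += p[2];
        s3 += p[3];
        p += 4;            /* single address update per 4 loads */
    }
    while (p != end) {     /* remaining 0..3 elements */
        s0 += *p++;
    }
    return s0 + s1 + s2 + s3;
}
```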

> The previous kinds of auto-indexed addressing modes allowed only a small set of increments
> or decrements, so they were suitable only for certain kinds of loops, not for any loop.
> The only other addressing mode that can be chosen as an alternative to the IBM auto-indexed addressing,
> because it also allows the elimination of the extra instructions in many kinds of loops, including the
> most frequent ones, though not in all loops, is the 3-component addressing mode (base, index & shift) introduced
> by VAX and also adopted by Intel 80386. However, this addressing mode seems more difficult to implement,
> because even many modern processors do not manage to perform it at maximum speed in all cases.
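A sketch of why [base + index*scale] helps, using a hypothetical `add_arrays` function. All three arrays are addressed with the same index `i`, so each iteration needs only one index increment. On x86-64 each access can be encoded with a scaled index, for example `mov eax, [rsi+rcx*4]`. With auto-increment addressing alone, the equivalent loop would instead advance three separate pointers. The exact instruction selection is up to the compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* c[i] = a[i] + b[i] (hypothetical helper). Every access is
   base + i * sizeof(int32_t), so a scaled-index addressing mode covers
   all three, and the loop's only address bookkeeping is incrementing i. */
void add_arrays(int32_t *c, const int32_t *a, const int32_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}
```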

Define "maximum speed".
Low latency is indeed difficult, but Intel and AMD have no trouble achieving maximal throughput in all cases, and for loops throughput is typically the only thing that matters.
IBM, on the other hand, had trouble achieving maximal throughput with the "with update" addressing modes, at least in the case of integer loads on many POWER cores.
Maybe it does not apply to POWER8 and/or POWER9; I didn't analyze them that deeply.
It's all about contention on the GP register file update bus.
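A back-of-the-envelope model of that contention, under the simplifying assumption that integer loads have $W$ GPR write ports per cycle to themselves. A plain load writes one register, the loaded value. A load with update, such as POWER's `lwzu`, writes two: the loaded value and the updated base register. Hence:

```latex
\text{plain loads per cycle} \le W,
\qquad
\text{loads with update per cycle} \le \frac{W}{2}
```

For example, with $W = 2$ the model allows two plain loads per cycle but only one load with update. Here $W$ is a modeling parameter, not a figure for any specific POWER core.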

> When neither IBM auto-indexed modes nor VAX 3-component addressing are available, there are
> many kinds of loops which cannot be coded with a minimal number of instructions because address
> computation instructions must be added besides the data handling instructions.
> The RISC-V fans argue that the extra instructions do not matter, because a fast implementation will fuse
> the address computation instructions with the data handling instructions, achieving the same throughput.

RISC-V has a far more basic problem than that: the absence of a [reg+reg] addressing mode.
In my previous comment I was not defending RISC-V, just saying that aarch64 went overboard.
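For a concrete case of the missing [reg+reg] mode, consider an indexed table lookup (the function name `lookup` is hypothetical). On AArch64 the access can be a single `ldr w0, [x0, x1, lsl #2]`. Base RV64I has only [reg+imm12] addressing, so a compiler typically has to emit something like `slli a1, a1, 2`, `add a0, a0, a1`, `lw a0, 0(a0)`. The assembly quoted here is illustrative, not captured compiler output.

```c
#include <stddef.h>
#include <stdint.h>

/* Indexed load: address = table + idx * 4 (hypothetical helper).
   AArch64: one load with a shifted register index.
   Base RV64I: shift, add, then a load with zero offset. */
int32_t lookup(const int32_t *table, size_t idx)
{
    return table[idx];
}
```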

> I do not agree, because I believe that it is stupid to code the address computation with an extra instruction
> word, when the same thing can be encoded with a couple of bits in an addressing mode field and the instruction
> decoder is also certainly simpler than the one that must fuse those instruction pairs.
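To give a sense of what fusing those instruction pairs asks of a decoder, here is a toy model in C. It uses a simplified, hypothetical instruction representation (`struct insn`, `can_fuse_indexed_load`) rather than real RISC-V encodings. It checks the commonly discussed pattern `add rd, rs1, rs2` followed by `ld rd, 0(rd)`. The pair can be treated as one indexed load only if three conditions hold. The load's base must be the add's destination. The offset must be zero. The load must overwrite that same register, so the intermediate sum is never architecturally visible.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified decoded-instruction record (not real encodings). */
enum opcode { OP_ADD, OP_LD, OP_OTHER };

struct insn {
    enum opcode op;
    uint8_t rd;      /* destination register */
    uint8_t rs1;     /* first source (base register for loads) */
    uint8_t rs2;     /* second source (unused for loads) */
    int32_t imm;     /* immediate offset (loads only) */
};

/* True if "add rd, rs1, rs2 ; ld rd, 0(rd)" can be fused into a
   single indexed load of address rs1 + rs2 into rd. */
bool can_fuse_indexed_load(struct insn first, struct insn second)
{
    return first.op == OP_ADD &&
           second.op == OP_LD &&
           first.rd != 0 &&               /* x0 is hardwired to zero in RISC-V */
           second.rs1 == first.rd &&      /* load uses the sum as its base */
           second.imm == 0 &&             /* no extra displacement */
           second.rd == first.rd;         /* sum is overwritten by the load */
}
```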
