By: wumpus (lost.delete@this.in.a.cave), November 6, 2019 5:19 pm
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on November 5, 2019 11:39 am wrote:
> anon.1 (abc.delete@this.def.com) on November 5, 2019 6:17 am wrote:
> >
> > Somewhat orthogonal: I don't understand the opposition to
> > 2 reg source loads though. It still satisfies 2R1W.
>
> I think people get worried about the whole "loads and stores would be different".
>
> So then you'd have the base/index addressing only for loads, and stores would
> have base-only, because it needs the other read port for store data.
>
> Of course, stores don't need the 1W, so if you're willing to bend the rules a bit that's not a problem:
> you accept that you will always split stores into "generate address" and "generate data" ops.
>
> But if your primary design goal is a minimally simple pipeline, that doesn't work.
>
> Personally, I think that "not willing to ever bend rules" shows you don't care about
> reality. And again, personally, I think that "hey, loads and stores are very different,
> so maybe it's perfectly fine if addressing is very different too".
>
> But if you're the kind of person who finds traditional RISC to be "fundamentally beautiful", you're
> not the kind of person who accepts either of those choices. One is bending the rules, and the other
> is not symmetrical and thus unacceptable. It's like having specialized registers, for chrissake! Where
> do you end up? It's shades of x86 with one register dedicated for shift counts? Quelle horreur!
>
> End result: base-index addressing is evil and against everything RISC-V stood for.
>
> Linus
This argument hardly convinces me that purity was more important than a "minimally simple pipeline". The most important thing for RISC-V to have is actual implementations. Whether academic or deeply embedded (like the Western Digital CPU), what they need is both real silicon and actual software written for the architecture. Once they have that, RISC-V will have some chance to live up to the hype. With a more complex (and potentially higher performance) pipeline it might have more potential, but without existing silicon and software it would be little more than MMIX.
To be honest, your posts here have hammered into me just how all-important the infrastructure is. If the infrastructure exists, you can justify all the fused instructions if necessary. If it doesn't, you might as well cough up for an ARM device/license.
Doug S (foo.delete@this.bar.bar) on November 6, 2019 11:47 am wrote:
> anon.1 (abc.delete@this.def.com) on November 6, 2019 11:19 am wrote:
> > The whole point of RISC was to make
> > decode simple. Now they want to add complexity in decode because, well, the ISA is oversimplified.
>
>
> The RISC concept was created 40 years ago. Things have changed, designers have a transistor
> budget orders of magnitude larger today so what was appropriate for a decoder in 1980
> shouldn't be a limitation on what is appropriate for a decoder in 2020.
According to John Mashey's (in hindsight) RISC definition, addressing modes were all-important. Granted, less complex addressing modes were "more RISCy" than base-index addressing, but as long as an instruction didn't require more than one memory access (ignoring MMU issues) you were still RISC. https://www.yarchive.net/comp/risc_definition.html
Decode is hardly an issue, although you really don't want more than two possible instruction sizes (and one should be a simple multiple of the other if you have to do that).
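As a sketch of what keeping that trivial looks like, here's the low-bit length trick, roughly what RISC-V's C extension does (the parcel values below are made up, and the longer reserved encodings are ignored):

def instruction_length(first_parcel: int) -> int:
    """Length in bytes from the first 16-bit parcel alone: low two bits
    of 0b11 mean a 32-bit instruction, anything else means 16-bit."""
    return 4 if (first_parcel & 0b11) == 0b11 else 2

def split_fetch_group(parcels):
    """Yield (byte offset, byte length) for each instruction in a fetch
    group of 16-bit parcels; only the low bits of each starting parcel
    are examined, never a full decode."""
    i = 0
    while i < len(parcels):
        length = instruction_length(parcels[i])
        yield i * 2, length
        i += length // 2

# Toy parcel values: 16-bit op, 32-bit op (low bits 0b11), 16-bit op.
group = [0x0001, 0x0003, 0x0000, 0x4002]
print(list(split_fetch_group(group)))  # [(0, 2), (2, 4), (6, 2)]

The point is that the size determination is a couple of gates per parcel, so multiple sizes don't hurt as long as there are only two of them and one is a multiple of the other.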
Personally, I'd go for full Huffman coding (Huffman coding the opcode and each operand) in L2 and decode each instruction on the way into L1. Without the "wait for a jump" issue that x86 has, you could start decoding at the start of the cache line (possibly with some sort of interleaving for multiple parsers) and decode into a 64-bit (or so) instruction (this might delay any jump that isn't at the start of a cache line, but I'm sure compiler writers can align long jumps/calls with cache lines). You might need some sort of "pre-decode bits" as seen on x86 (generated on a load from L3/DRAM), but that just allows parallel decoders to do their thing. For a high-performance architecture I doubt I'd even start close to MIPS (and similar), but a compressed instruction stream and a dense instruction set (at least in L2 and above) would be a good fit for many of RISC-V's uses.
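To make the Huffman part concrete, a minimal Python sketch of coding the opcode field follows; the opcode names and frequencies are invented for illustration, not taken from any real trace, and a real scheme would code the operand fields the same way:

import heapq
from collections import Counter

def build_huffman_codes(symbols):
    """Build a prefix-free code from symbol frequencies: the more frequent
    a symbol, the shorter its bit string. Returns {symbol: bitstring}."""
    freq = Counter(symbols)
    # Heap entries: (frequency, tie-break id, {symbol: code-so-far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        # Prefix one subtree's codes with 0 and the other's with 1.
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]

# Made-up opcode mix, skewed the way real traces skew toward loads,
# ALU ops and branches; the frequencies are purely illustrative.
trace = (["load"] * 30 + ["add"] * 20 + ["branch"] * 20 +
         ["store"] * 15 + ["mul"] * 5 + ["csr"] * 2)
codes = build_huffman_codes(trace)
for op, bits in sorted(codes.items(), key=lambda kv: len(kv[1])):
    print(f"{op:>6} -> {bits}")
# The frequent opcodes end up with 2-bit codes and the rare ones with 4:
# that density is the win you'd want in L2, before expanding back to a
# fixed-width format on the way into L1.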