By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), April 5, 2017 4:51 pm
Room: Moderated Discussions
RichardC (tich.delete@this.pobox.com) on April 5, 2017 1:23 pm wrote:
>
> OK. But do you take my point that all early 1980s RISC architects were facing the exact
> same problems:
I really think you are mis-stating the whole thing exactly by trying to lump those designers together as similar. They weren't. They weren't even looking for the same things.
The ARM people very much came from a 6502 background, and from a memory access background. That background already did that single-cycle pipelining that you talk up so much. It had absolutely nothing to do with "RISC" in any shape or form.
You can call 6502 "RISC" all you want, but that's just complete BS. It's rewriting history, and it's trying to make the "RISC" moniker cover something it wasn't.
The 6502 was unusual in doing a memory access every cycle. And people appreciated it, particularly compared to the Z80 (which was the most obvious direct competitor). The 6502 was cheap, simple and very limited compared to the Z80, but performed fairly well exactly because the 6502 had that limited pipelining and got a memory access every cycle.
ARM literally started as a 32-bit 6502 replacement, with the simplest possible instruction set decoding not so much out of any RISC "science", but simply because when you have a small handful of people who have never designed a CPU before, you go for simple.
> a) they wanted 1 instruction-per-cycle throughput, and that required enough pipelining
> that branches became expensive
You're really ignoring where ARM came from. They already had that "one instruction per cycle". Really. That was their existing starting point.
(Ok, so reality is actually slightly different: I think 6502 was more like "one memory access per cycle", rather than "one instruction per cycle", so two-byte instructions took two cycles, and then if an instruction made a memory access that took another cycle. A very few instructions were slower than that. And that's actually pretty much exactly what ARM was then designed to do, and the LDM/STM comes from the fact that it is the obvious way to get more data accesses per instruction fetch).
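To make that rule of thumb concrete, here is a rough sketch of the "one memory access per cycle" model for the 6502. The function name and the table are made up for illustration. The two-cycle floor is an addition to the rule as stated above, since even one-byte instructions like NOP take two cycles. The handful of slower instructions (read-modify-write and so on) are deliberately not modeled.

```python
# Rough model only: one cycle per instruction byte fetched, plus one cycle
# per data access, with a floor of two cycles. The name approx_6502_cycles
# is hypothetical, and the floor is an assumption added on top of the rule.

def approx_6502_cycles(instr_bytes, data_accesses):
    """Estimate cycles as bus accesses: instruction bytes + data accesses."""
    return max(2, instr_bytes + data_accesses)

# (instruction bytes, data accesses) for a few common cases
EXAMPLES = {
    "NOP":      (1, 0),  # one byte, hits the two-cycle floor
    "LDA #imm": (2, 0),  # two-byte instruction, no data access
    "LDA zp":   (2, 1),  # two bytes plus one zero-page read
    "LDA abs":  (3, 1),  # three bytes plus one read
    "STA abs":  (3, 1),  # three bytes plus one write
}

if __name__ == "__main__":
    for name, (nbytes, accesses) in EXAMPLES.items():
        print(f"{name:9s} ~{approx_6502_cycles(nbytes, accesses)} cycles")
```

For these cases the estimate matches the documented cycle counts (2, 2, 3, 4, 4). It undercounts the few slower instructions mentioned above.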
Your "pipelining and single cycle instructions" argument makes sense when you look at the people who came from the m68k background (or maybe from Z80 and a CP/M background). Obviously that m68k was where Sun was coming from.
But it doesn't make sense for the ARM people. They came from a different background. They already had that "memory access every cycle" background, and they found things like the 286 or m68k to be lacking exactly because those chips did not.
> b) they wanted a massive amount of I-fetch bandwidth, so they had to take drastic steps
> (large register file, register windows or LDM/STM or an I-cache) to minimize the
> impact of data accesses on instruction-fetch.
Again, I think you lump things together because you want to make an argument, but it doesn't make sense.
ARM's LDM/STM is generally different from the other RISCs (that generally had "load pair" instructions). In many ways it's closer to the CISC instructions. The m68k "movem" instruction in particular. The ARM instructions make sense from that "we can do one memory access per cycle, how can we use that best".
The ARM memcpy() routines tended to all be about using those ldm/stm sequences, iirc. Yes, it was about balancing instruction fetch vs data fetch, but it's actually a good example of it not being about "single cycle" instructions. Like the 6502, it really was about the CPU basically being designed around a memory access per cycle.
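As a back-of-the-envelope illustration of that instruction fetch vs data fetch balance, here is an idealized count of bus accesses for a word copy. It assumes a single shared bus with one access per cycle and no cache, and it ignores loop overhead and real ARM cycle timings, so it is a sketch of the argument and not a timing model. The function names are made up for illustration.

```python
# Idealized sketch: count bus accesses for copying `words` 32-bit words with
# load/store instructions that each move `regs_per_instr` registers.
# regs_per_instr=1 stands in for plain single-word load/store,
# larger values stand in for LDM/STM. Loop overhead is ignored.

def bus_accesses(words, regs_per_instr):
    """Return (instruction_fetches, data_accesses) for the copy."""
    groups = -(-words // regs_per_instr)  # ceiling division
    instruction_fetches = 2 * groups      # one load + one store per group
    data_accesses = 2 * words             # each word is read once, written once
    return instruction_fetches, data_accesses

def data_fraction(words, regs_per_instr):
    """Fraction of all bus accesses that actually move data."""
    fetches, data = bus_accesses(words, regs_per_instr)
    return data / (fetches + data)

if __name__ == "__main__":
    for regs in (1, 2, 4, 8):
        fetches, data = bus_accesses(64, regs)
        print(f"{regs} regs/instr: {fetches} fetches, {data} data accesses, "
              f"{data_fraction(64, regs):.0%} of bus cycles move data")
```

Under these assumptions a single-word load/store copy spends half of its bus accesses on instruction fetch, while 8-register transfers push the data share to 128 out of 144 accesses.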
And no, I was never directly an ARM user. But I was doing 6502, and like pretty much all other 6502 people I had the same disdain for the stupid slow "lots of cycles" approach that Z80 had. A 6502 at 1MHz could often perform as well as a Z80 clocked twice as fast, despite the fact that the Z80 had all those fancy 16-bit things.
And I did want an Archimedes. I went the m68k route instead for random reasons.
Linus