By: RichardC (tich.delete@this.pobox.com), April 4, 2017 11:58 am
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on April 4, 2017 11:10 am wrote:
> I would dispute that ARM was designed to RISC goals - more to "simplicity" than anything else.
>
> ARM ended up being RISC, but not so much because of any religion (or even much science).
> Simply due to the designers being constrained, rather than aiming for reducing anything.
>
> For example, you wouldn't describe the 6502 much as "RISC", would you? It was simple, yes.
> It was fairly straightforward, yes. But it was more of a design constraint than a science.
Hell yes! And so would the ARM designers, who were expert 6502 assembler programmers and huge fans of the 6502. Wilson and Furber had read the Berkeley RISC and MIPS-I stuff,
and knew they *could* design a 32bit cpu with reasonable effort, and since they had deep
experience with 6502-based hardware and software they were also heavily influenced by
the 6502.
The RISC-like features of the 6502: few instruction formats, simple decode and control
logic, and instructions that execute in as few cycles as possible.
> The main ARM design seems to have been based around RAM access rather than
> "RISC philosophy", and the simplicity came from design constraints.
I think this whole idea that there was a "RISC philosophy" or a "RISC religion" in the
early 1980s is utterly bogus. There were many teams playing around with new ISAs because
a) process technology had got good enough to fit a small cpu (but not a big one) on a
chip
b) Mead/Conway's "VLSI Design" provided a roadmap for doing that kind of design with a
manageably small amount of EDA software (instead of drawing transistors by hand on mylar
sheets ...).
c) Retargetable optimizing compilers gave you a chance of getting a decent amount of
software running reasonably well on a brand-new ISA.
Those various teams all shared several of the same constraints as ARM-1 - the cpu had to fit on a single chip which could yield well on a foundry process; it had to go *much*
faster than a VAX or 8086/80186 to be worthwhile; and the whole hardware/software effort had to be manageable with a small number of people working for up to about 18 months.
Not surprisingly, given similar constraints they arrived at similar solutions - a 32bit
ALU, a reasonably big file of 32bit general-purpose registers, fixed-size 32bit
instructions, few instruction formats, simple instruction decode, 3-address register-register instructions.
That wasn't "religion" or "philosophy" - it was what was going to work within the
technological constraints of the time. Where the different efforts had different skills
and goals, they diverged to some extent - MIPS arguably had the best software/compiler
expertise, and they focused on simplifying the hardware and moving complexity into
software; the ARM team wanted to evolve from their 6502-based hardware and 6502-assembler software ecosystem, so they emphasized low interrupt latency and a human-friendly instruction set (no delayed branches ...). Berkeley RISC was about running single-threaded
C benchmarks fast, so they put in register-window hardware to speed up procedure calls.
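To make the register-window idea concrete, here is a toy Python sketch (made-up window sizes and register names, not real SPARC/Berkeley RISC semantics, and window overflow spill/fill to memory is omitted): a call just slides the window pointer, so the caller's "out" registers become the callee's "in" registers with no loads or stores for argument passing or callee-saves.

```python
NWINDOWS, NLOC, NOVL = 4, 8, 8  # hypothetical: 4 windows, 8 locals, 8 overlap regs

class WindowedRegs:
    """Register-window sketch: architectural regs are in0-7 / loc0-7 / out0-7;
    a call slides the window so the caller's outN aliases the callee's inN."""
    def __init__(self):
        # each window contributes NLOC+NOVL new physical regs;
        # the "in" registers physically overlap the previous window's "out"s
        self.phys = [0] * (NWINDOWS * (NLOC + NOVL))
        self.cwp = 0  # current window pointer

    def _map(self, name):
        kind, n = name[:-1], int(name[-1])
        base = self.cwp * (NLOC + NOVL)
        if kind == "in":
            off = base + n                  # overlaps caller's outs
        elif kind == "loc":
            off = base + NOVL + n           # private to this window
        else:  # "out"
            off = base + NOVL + NLOC + n    # becomes callee's ins
        return off % len(self.phys)

    def read(self, name):
        return self.phys[self._map(name)]

    def write(self, name, val):
        self.phys[self._map(name)] = val

    def call(self):
        self.cwp = (self.cwp + 1) % NWINDOWS  # no memory traffic

    def ret(self):
        self.cwp = (self.cwp - 1) % NWINDOWS

rf = WindowedRegs()
rf.write("out3", 42)           # caller passes an argument in out3
rf.call()
print(rf.read("in3"))          # callee sees the same value as in3
rf.write("loc0", 7)            # callee scratch, in its own window
rf.ret()
print(rf.read("out3"))         # caller's registers are untouched
```

The point of the overlap is that a C-heavy benchmark full of small procedure calls pays zero memory traffic for arguments and saved registers until the windows wrap.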
Everyone knew what they were doing. Everyone had reasons for what they were doing. And
everyone was gathering as much actual data as they could from execution traces and
simulations to guide their choices. And it worked. Really well.
I think everyone needs to understand this point about the history of technology: people
in every era are damn smart, and the best of them are making the best decisions possible within the constraints of the information and technology available to them at that time.
The common core of RISC wasn't adopted because it was a religion; it was widely adopted
because it was close to optimal for the implementation technology of the early 1980s.
In the same era, a lot of effort was going into implementation of various CISC'y
ISAs. And they sucked relative to the RISC'ier designs: you ended up with a big pile of
complex slow control logic, narrow ALUs and datapaths and memory buses, and instructions
which needed many cycles to complete. Blech.
> And the whole "keep it simple" has always been a successful design strategy. Over time, it always expands,
> of course, so it's generally mainly successful as a starting point. The barnacles come later.
There isn't much "keep it simple" about PentiumPro. But it was fantastic nevertheless.
The point about RISC was that "Keep It Simple" was absolutely the right thing to do if
it allowed you to squeeze onto a single chip and have fast wide on-chip buses between your
3-ported register file and your ALUs, instead of having to split across multiple chips
(either by bitslicing or functional decomposition) and multiplexing across narrow, slow
off-chip communication paths.
> Arguably Power comes more from a RISC thinking, ie the IBM 801 and real design and
> "look at instruction traces and what actually matters". Together with actually having
> the know-how to do a complex chip, and consciously aiming for something simpler.
I don't know why you imagine that the Berkeley RISC and Stanford MIPS and Acorn ARM teams
*didn't* do that kind of quantitative analysis. They did. And Hennessy & Patterson wrote
*the* frickin' book on it, "Computer Architecture: A Quantitative Approach", so ...
> But the RISC "religion" came later. Sparc and particularly MIPS both made actual (bad) design choices
> in the name of "we can make simpler crap". Some of it was really horrid crap, like the already mentioned
> lack of interlocking on MIPS and the branch delay slots. Huge huge architectural mistakes.
>
> There was no "science" behind those. It was just pure bad taste.
*Everyone* makes mistakes. The fact that SPARC and MIPS are still going into various
products right now, 30 years after their initial release, should suggest to a rational
person that they weren't *terrible* mistakes (compared to say, iAPX-432, or i860).
And I disagree with you about branch delay slots. If you're looking to fit a whole
CPU into 30K transistors or so, and pipeline stalls on every branch would cripple your
performance, then you have to do *something*. And delayed branch is an easy thing to do,
and not all that harmful, since it turns out the optimizer can very often put something
useful in the delay slot.
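To illustrate why the slot is "not all that harmful", here is a toy Python interpreter for a made-up two-instruction mini-ISA (not real MIPS encodings) with delayed-branch semantics: the instruction textually after the branch always executes before the branch takes effect, so the compiler can hoist a loop-body instruction into the slot and waste nothing.

```python
def run(program, max_steps=100):
    """Toy interpreter with MIPS-style delayed branches: the instruction
    right after a branch (the delay slot) executes before the branch
    redirects the pc. Hypothetical mini-ISA, not real MIPS."""
    regs = {"r1": 3, "r2": 0}   # r1: loop counter, r2: work done
    pc, pending = 0, None       # pending: branch target waiting on the slot
    for _ in range(max_steps):
        if pc >= len(program):
            break
        op, *args = program[pc]
        if op == "addi":                    # rd = rs + imm
            rd, rs, imm = args
            regs[rd] = regs[rs] + imm
        elif op == "bnez":                  # branch to target if reg != 0
            rs, target = args
            if regs[rs] != 0:
                pending = target            # defer: slot instruction runs first
                pc += 1
                continue
        if pending is not None:             # slot has drained; now redirect
            pc, pending = pending, None
        else:
            pc += 1
    return regs

# A 3-instruction loop where "the compiler" filled the delay slot
# with real body work instead of a nop:
program = [
    ("addi", "r1", "r1", -1),   # 0: decrement counter
    ("bnez", "r1", 0),          # 1: loop while r1 != 0 ...
    ("addi", "r2", "r2", 1),    # 2: ... delay slot: body work, runs anyway
]
print(run(program))
```

With r1 starting at 3, the body work in the slot executes once per iteration (including the fall-through after the final, not-taken branch), so the loop finishes with r2 == 3 and no cycle spent on a nop.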
Once you have a much bigger transistor budget, you put in branch predictors and the problem
mostly goes away. But waiting 10 years for a bigger transistor budget is not an option.