By: Ricardo B (ricardo.b.delete@this.xxxxx.xx), November 25, 2014 4:34 pm
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on November 24, 2014 6:17 pm wrote:
> Mark Roulo (nothanks.delete@this.xxx.com) on November 24, 2014 4:17 pm wrote:
> >
> > John Mashey post that is relevant:
>
> Yes, John's analysis comes up every time this is discussed, but that analysis
> is still very much in the area of predecode and pipelining the end result.
>
> Which may not be really all that relevant, once you get past the next complexity hump. Why would you ever pipeline
> the complex instructions with complicated addressing modes? That would indeed be painful and horrid.
>
> But once you go beyond a traditional pipelined architecture with a fairly tightly coupled front-end feeding
> a pipeline that looks like individual instructions, and instead go to a uop-based OoO approach that makes
> the front end be much less tightly coupled from the back end, things don't look nearly as bad.
>
> Just walk through John's example of a "ADDL @(R1)+,@(R1)+,@(R2)+".
>
> That kind of instruction looks pretty simple to do if you just do it one
> instruction at a time, completely mindlessly. Like VAX used to do it.
>
> And then John makes the argument that it's painful like hell to do in a pipelined environment, because all those
> stages can fault etc etc. And that was the next stage from that "do one instruction at a time" mentality.
>
> But instead of thinking of it like a pipelined execution engine (and how exceptions have to undo all
> the updates etc), go one technological update further, and think of the registers as renamed, each load and
> address update (and the final store) as independent uops, and an execution engine with queues of hundreds
> of ops in flight. And suddenly it doesn't sound so fundamentally horrid any more - any OoO engine already
> has to have all the rollback capability, and having multiple uops doesn't really change any of that.
>
> Yes, the "decode one instruction to seven uops" (two loads, one store, three register
> updates, one add, or whatever) sounds bad, but that's still a fairly mindless
> expansion. That doesn't sound "complex", it just sounds like work, no?
>
> Would it be worth it to resurrect the VAX instruction set? Obviously not. But I think the argument
> in this thread was that once you get over a certain stage in technology, the "that's practically unimplementable"
> part goes away. The VAX instruction set isn't really amenable to straightforward instruction pipelining,
> no. But that doesn't mean that it isn't amenable to more modern techniques..
The problem is that you've skipped a few evolutionary steps there.
Sure, it may be feasible with an OoO engine like the one found in a modern x86.
But it took a couple of decades to get there.
First, note that breaking complex instructions down into uOPs was not invented with the PentiumPro.
Micro-coding was a standard practice among CISC CPU designers, because it was the only way to manage the complexity.
So it's safe to assume John Mashey was quite familiar with the concept.
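
To spell out what that means for anyone who hasn't seen it: a microcoded machine steps through a micro-routine for each macro-instruction, strictly one at a time. Here's a toy sketch in Python (everything here is invented for illustration, it is not real VAX microcode, and @(Rn)+ is simplified to a plain post-increment load so the step count matches Linus's example):

regs = {"R1": 0, "R2": 8}
mem  = {0: 3, 4: 4, 8: 0}   # two longword sources, one destination slot
tmp  = {}                   # microcode scratch, invisible to software

# Micro-routine for ADDL @(R1)+,@(R1)+,@(R2)+: one step at a time,
# each committing before the next starts.
def step_load(dst, ptr):
    tmp[dst] = mem[regs[ptr]]   # a page fault would be raised here,
    regs[ptr] += 4              # before the post-increment commits

def step_add():
    tmp["t2"] = tmp["t0"] + tmp["t1"]

def step_store(ptr):
    mem[regs[ptr]] = tmp["t2"]
    regs[ptr] += 4

for step in (lambda: step_load("t0", "R1"),
             lambda: step_load("t1", "R1"),
             step_add,
             lambda: step_store("R2")):
    step()                      # strictly sequential, no overlap with anything else

print(mem[8])                   # -> 7

Slow, but the machine always knows exactly how far an instruction got, which is why this was the only way to keep the complexity manageable.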
IMHO, the problem he was trying to address wasn't "can we make a pipelined VAX?", but "can we make a (pipelined) VAX CPU that customers will want?"
And "now" was a time when in-order CPUs were the state of the art.
"A CPU which customers want" is one that doesn't impose horrible performance penalties on instructions that decode to more than one uOP, and doesn't require customers to recompile or rewrite their software to perform well.
In x86 land, the 486 and Pentium heavily favoured the use of simple RISC-like instructions.
Even with OoO, from the PentiumPro to Banias, we lived with severe restrictions on the use of less simple instructions.
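
And for contrast, a minimal sketch of what Linus's "mindless expansion" looks like once you do have cracking and renaming (the uop encoding and version names like R1v0 are invented for illustration): every destination gets a fresh name, so the only ordering left is real dataflow, and rolling back on a fault is the same discard-everything-younger mechanism the OoO engine already needs for branch mispredicts.

# ADDL @(R1)+,@(R1)+,@(R2)+ cracked into Linus's uop count:
# two loads, one store, three register updates, one add.
ADDL_UOPS = [
    ("load",  "t0",   ("R1v0",)),       # t0 = mem[R1]
    ("addi",  "R1v1", ("R1v0", 4)),     # first R1 bump (renamed)
    ("load",  "t1",   ("R1v1",)),       # t1 = mem[R1 + 4]
    ("addi",  "R1v2", ("R1v1", 4)),     # second R1 bump
    ("add",   "t2",   ("t0", "t1")),    # the actual ADDL
    ("store", None,   ("R2v0", "t2")),  # mem[R2] = t2
    ("addi",  "R2v1", ("R2v0", 4)),     # R2 bump
]

def ready(completed):
    # uops whose inputs all exist; a scheduler may pick them in any order
    produced = {"R1v0", "R2v0"} | {ADDL_UOPS[i][1] for i in completed}
    return [i for i, (_, _, srcs) in enumerate(ADDL_UOPS)
            if i not in completed
            and all(s in produced for s in srcs if isinstance(s, str))]

print(ready(set()))    # [0, 1, 6]: first load, first R1 bump and R2 bump all issue at once
print(ready({0, 1}))   # [2, 3, 6]: the second load is now unblocked

It took the industry the better part of two decades to get from the first sketch to the second, which is exactly the point.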