By: none (none.delete@this.none.com), February 4, 2013 3:03 am
Room: Moderated Discussions
Jouni Osmala (josmala.delete@this.cc.hut.fi) on February 4, 2013 1:07 am wrote:
> > Patrick's point, which I agree with, is that the x86 penalty really depends a lot on
> > context. I think that for a scalar core, it's probably more than 15%. But for something
> > like the P3, it's a lot less. That's doubly true once you start talking about caches
> > in the range of 1MB/core. At that point, the x86 penalty is really quite small.
> >
> > And it's a fair point, but one I didn't want to dive into
> > because of the inherent complexity. But it's true,
> > the x86 overhead depends on the performance of the core; the higher the performance, the lower the overhead.
>
> Unfortunately, we no longer have Alphas around to show that in practice.
> But the key issue is this: once we reach a high enough performance level, we start wondering
> what we could do to improve it even further. The branch misprediction penalty is one of the key things limiting
> that, and x86 increases the pipeline length. Then there is the effect of increased register pressure on how many
> memory subsystem operations you need at a given performance level. Of course, Intel has spent hardware there to fix
> the performance problems, but that hardware consumes power, and a RISCier design could spend those buffer
> entries on operations inherent to the problem instead of filling them with superfluous memory operations.
> Then there is the penalty of storing far wider micro-ops in the OoO structures compared to RISCier designs.
> And renaming of condition codes still requires hardware that needs to be active.
> The problem isn't how much die area they spend anymore; it's where the extra die area adds
> latency or consumes power, and that's where the x86 penalty really comes in these days.
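
To put a very rough number on the pipeline-length point (all figures here are illustrative assumptions, not measurements): effective CPI is roughly base CPI plus mispredict rate times flush penalty. At one mispredict per 100 instructions, every extra pipeline stage ahead of branch resolution costs about 0.01 CPI, so four extra front-end stages add roughly 4 * 0.01 = 0.04 CPI. That is a few percent on a core running near 1.0 CPI, and proportionally more on a wide core that otherwise runs near 0.3-0.4 CPI.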
And there is also the validation nightmare that this increased complexity entails. As an example, microcode was mentioned in another thread as something quite complex.
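
Going back to the register pressure point, here is a hedged sketch in C of where the architectural register count shows up (a hypothetical fragment; the exact spill set is up to the compiler):

/* Sketch: more values live across the loop body than 32-bit x86 has
 * GPRs (8, one of them often tied up as the stack/frame pointer).
 * The compiler has to spill some accumulators to the stack, so each
 * iteration carries loads and stores that exist only to juggle
 * registers. Illustrative only, not actual compiler output. */
long mix(const long *a, int n)
{
    long s0 = 0, s1 = 1, s2 = 2, s3 = 3;
    long s4 = 4, s5 = 5, s6 = 6, s7 = 7;
    long s8 = 8, s9 = 9;
    for (int i = 0; i < n; i++) {
        long v = a[i];
        s0 += v;  s1 ^= v;  s2 += v << 1;  s3 ^= v >> 1;
        s4 += v;  s5 ^= v;  s6 += v << 2;  s7 ^= v >> 2;
        s8 += v;  s9 ^= v;
    }
    return s0 + s1 + s2 + s3 + s4 + s5 + s6 + s7 + s8 + s9;
}

With 32 architectural registers everything above stays in registers and those spill loads/stores disappear; x86-64's 16 GPRs help, but the memory subsystem still has to chew through whatever spill traffic the compiler does emit.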