By: Gabriele Svelto (gabriele.svelto.delete@this.gmail.com), August 17, 2016 3:52 am
Room: Moderated Discussions
Etienne (etienne_lorrain.delete@this.yahoo.fr) on August 17, 2016 1:27 am wrote:
> Still, if hardware can know branch taken/not taken a bit earlier
> it should improve code speed execution, isn't it?
Yes, but it already does thanks to out-of-order execution (OOOE). Because of it, branch conditions are evaluated early on nearly all architectures. Computing them explicitly in advance only helps on in-order machines, or on ones with a reorder buffer so small that it can't look ahead more than a handful of instructions (e.g. early PowerPC processors).
> Having flags updated by nearly every assembly instruction like ia32/amd64 forces
> the compiler to do the test as late as possible, near the conditional jump.
Yes, but that only constrains program order. On an x86 OOOE core the condition codes are renamed too, so the compare can execute as soon as its inputs are ready; in practice evaluation can happen earlier anyway.
> I do not know very well POWER, but on PPC I did not see GCC using efficiently the multiple flag
> sets, maybe there are better compiler now - so the test can be moved as early as possible.
It never did IIRC and I don't think it does now. I remember that XLC did use multiple CR fields when it was beneficial, but I'm not sure they bother anymore: recent POWER processors have deep pipelines, huge reorder buffers and huge branch prediction tables, so I doubt it would make a measurable difference.
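
To make the "test as early as possible" idea concrete, here is a rough C sketch. The function and variable names are made up for illustration, and nothing guarantees a given compiler will schedule it this way. Two independent conditions are computed before some unrelated work and only branched on afterwards. On PowerPC a compiler could in principle keep each result in its own CR field; on x86 it would more likely redo the compare next to the jump or materialize the result in a register, and an OOOE core would resolve it early either way.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical example: the two conditions are evaluated up front,
 * then unrelated work runs, then the branches consume the results.
 * Whether the compares actually stay early (or land in separate
 * PowerPC CR fields) is entirely up to the compiler. */
int classify(int a, int b, const int *data, size_t n)
{
    bool a_neg = a < 0;     /* condition 1, known before the loop */
    bool b_big = b > 100;   /* condition 2, independent of condition 1 */

    long sum = 0;
    for (size_t i = 0; i < n; i++)  /* work that does not affect either condition */
        sum += data[i];

    if (a_neg)
        return -1;
    if (b_big)
        return 1;
    return sum > 0 ? 2 : 0;
}
```

If you want to see what a particular compiler does with it, compare the output of `-O2 -S` for a PowerPC target and an x86 one and look at where the compares end up relative to the branches.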