By: EduardoS (no.delete@this.spam.com), July 6, 2015 7:56 pm
Room: Moderated Discussions
Maynard Handley (name99.delete@this.name99.org) on July 6, 2015 3:34 pm wrote:
> This claim is frequently made, and I must admit I don't understand why.
> Even in the simplest sort of scalar CPU, let's assume a branch misprediction cost of 20 cycles,
> free correctly predicted branches, and a single cycle select. Then you only have to mispredict
> >5% of the time for the sel to be a win. In superscalar CPUs the numbers get worse, especially
> since most of the time now the sel is likely going to be as "free" as a branch is.
> The sel is going to have constant small background cost in terms of tying up one more
> register for each sel sitting in the ROB, but that seems unlikely to be a big cost.
The biggest cost of the sel instruction is not the single cycle of latency but the 3-way dependency it introduces: the result depends on the condition and on both source operands. Depending on the dependency chain, even a mis-predicted branch may end up faster than the sel instruction, while a correctly predicted branch will be faster, and maybe a lot faster.
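
To make the 3-way dependency concrete, here is a minimal sketch in C (the names are mine, and whether the ternary is lowered to a conditional move such as x86's cmov depends on the compiler; treat it as an illustration, not an example from the post):

#include <stdio.h>

/* Branchless: a compiler may lower the ternary to a conditional
   move. Each iteration's x then waits on the compare AND on both
   candidate values, so the loop-carried chain is two operations
   deep (a or b, then the select). */
unsigned chain_select(unsigned x, int n) {
    for (int i = 0; i < n; i++) {
        unsigned a = x + 1;
        unsigned b = x >> 1;
        x = (x & 1) ? a : b;
    }
    return x;
}

/* Branchy: x alternates odd/even here, so the branch is trivially
   predictable. The compare resolves off the critical path and x
   depends only on the taken side, a one-deep chain. */
unsigned chain_branch(unsigned x, int n) {
    for (int i = 0; i < n; i++) {
        if (x & 1)
            x = x + 1;
        else
            x = x >> 1;
    }
    return x;
}

int main(void) {
    printf("%u %u\n", chain_select(1, 100), chain_branch(1, 100));
    return 0;
}

In the select version the loop-carried dependency is at least two operations per iteration regardless of how predictable the condition is; in the branch version a correct prediction cuts that to one, which is where the "a lot faster" case comes from.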