By: Patrick Chase (patrickjchase.delete@this.gmai.com), July 6, 2015 10:30 pm
Room: Moderated Discussions
Maynard Handley (name99.delete@this.name99.org) on July 6, 2015 3:34 pm wrote:
> This claim is frequently made, and I must admit I don't understand why.
> Even in the simplest sort of scalar CPU, let's assume a branch misprediction cost of 20
> cycles, free correctly predicted branches, and a single cycle select. Then you only have
> to mispredict 5% of the time for the sel to be a win.
There are a couple of glaring problems with your analysis:
1. You also need to include the cost to generate the predicated operand in your analysis. The whole point of branching is that you only execute the instructions on one side of the branch (assuming accurate prediction), whereas with predication you always pay for both. The minimum common case adds an extra instruction of work, and that increases the threshold mispredict rate to 10%. It's more often a matter of a few additional instructions.
2. Selects are seldom simple single-op instructions. Something on the order of 3 ops is more common.
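To put numbers on the two points above, here is a rough sketch of the break-even arithmetic. It assumes the same toy scalar model as the quoted post (20-cycle mispredict penalty, correctly predicted branches are free) plus one cycle per extra instruction, which is my simplification. The function and parameter names are made up for illustration.

```python
def break_even_mispredict_rate(predication_cycles, mispredict_penalty=20):
    """Mispredict rate above which predication beats branching.

    Toy model (assumption): the branch version costs
    rate * mispredict_penalty cycles on average, while the predicated
    version always costs predication_cycles extra cycles.
    Setting the two equal gives rate = predication_cycles / penalty.
    """
    if mispredict_penalty <= 0:
        raise ValueError("mispredict_penalty must be positive")
    return predication_cycles / mispredict_penalty


# Single-cycle select, nothing else: the quoted 5% figure.
select_only = break_even_mispredict_rate(1)

# Select plus one extra instruction for the other operand: 10%.
select_plus_one = break_even_mispredict_rate(2)

# Hypothetical 3-op select plus one extra instruction for the operand.
three_op_select_plus_one = break_even_mispredict_rate(3 + 1)
```

Under this model every additional cycle of always-executed work moves the threshold up by 5 percentage points, so a 3-op select with one extra operand instruction would need roughly a 20% mispredict rate to pay off.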
> In superscalar CPUs the numbers get
> worse, especially since most of the time now the sel is likely going to be as "free" as a
> branch is.
No, it is not.