> Paul A. Clayton (paaronclayton.delete@this.gmail.com) on July 7, 2015 7:43 am wrote:
> > To "predicate" a store, one selects the address (either the ordinary address or a
> > safe but unused address) and performs the store. A store is more expensive than just
> > an ALU operation, but it could still be cheaper than a branch misprediction.
>
> An instruction such as isel can be easily reproduced using a sequence of other instructions (see
> Hacker's Delight for a zillion short sequences suitable for the task) but that's not the point.
> It just costs more than using an isel (both in terms of execution time and resources spent in
> implementing it) and the same applies to making a store output to a dummy address. You need to
> setup the appropriate address (unless your processor already has one), load it to a dedicated
> register or synthesize it, etc... In the end it's more work and more cycles/power spent than just
> jumping over a store that effectively causes it to be dismissed if the jump is taken.
Yes, using a select instruction to redirect a store to a dummy address is more work than jumping over the store, but if the processor does not support dynamic hammock branch predication (or predicated stores), then doing this extra work may still be less expensive than the alternative (frequent branch mispredictions, each flushing many instructions). When a mispredicted branch can easily discard tens of operations, sometimes even expensive alternatives make sense.
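To make the trade-off concrete, here is a minimal C sketch of the two approaches. The function names and the local `sink` variable are made up for illustration, and nothing obliges a compiler to emit a select (isel/cmov) for the ternary; it may well turn it back into a branch, so check the generated code and measure.

```c
#include <stdint.h>

/* Branchy version: the store is skipped by jumping over it.
 * Cheap when the branch predicts well, costly when it does not. */
void store_if_branch(int32_t *dst, int32_t value, int cond)
{
    if (cond)
        *dst = value;
}

/* Address-select version: the store always executes; the condition
 * only chooses where it lands. `sink` is a hypothetical dummy
 * location (on the stack here, so it is private to the caller's
 * thread and very likely in cache). */
void store_if_select(int32_t *dst, int32_t value, int cond)
{
    int32_t sink;                          /* safe but unused address */
    int32_t *target = cond ? dst : &sink;  /* candidate for isel/cmov */
    *target = value;                       /* unconditional store */
}
```

The select version always pays for the address select and the store, which is the extra work being objected to; the branchy version pays nothing extra when predicted correctly and a full misprediction penalty otherwise.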
As Michael S commented earlier in this thread:
Sometimes, when direction of branch is really unpredictable and pipeline is really long and data dependencies are either non-problematic or "true" and inevitable and the processor is wide and it (processor) does not have anything better to do these people are not morons at all even when each side of branch is close to 10 instructions. There are no hard rules except "Don't trust your intuition. Measure!".
The question of what interface to provide to the compiler/programmer is not simple either. The RISC-V developers specifically chose not to include conditional move/select (page 17, The RISC-V Instruction Set Manual, Volume I: User-Level ISA, Version 2.0; PDF):
We considered but did not include conditional moves or predicated instructions, which can effectively replace unpredictable short forward branches. Conditional moves are the simpler of the two, but are difficult to use with conditional code that might cause exceptions (memory accesses and floating-point operations). Predication adds additional flag state to a system, additional instructions to set and clear flags, and additional encoding overhead on every instruction. Both conditional move and predicated instructions add complexity to out-of-order microarchitectures, adding an implicit third source operand due to the need to copy the original value of the destination architectural register into the renamed destination physical register if the predicate is false. Also, static compile-time decisions to use predication instead of branches can result in lower performance on inputs not included in the compiler training set, especially given that unpredictable branches are rare, and becoming rarer as branch prediction techniques improve.
We note that various microarchitectural techniques exist to dynamically convert unpredictable short forward branches into internally predicated code to avoid the cost of flushing pipelines on a branch mispredict [7, 11, 10] and have been implemented in commercial processors [20]. The simplest techniques just reduce the penalty of recovering from a mispredicted short forward branch by only flushing instructions in the branch shadow instead of the entire fetch pipeline, or by fetching instructions from both sides using wide instruction fetch or idle instruction fetch slots. More complex techniques for out-of-order cores add internal predicates on instructions in the branch shadow, with the internal predicate value written by the branch instruction, allowing the branch and following instructions to be executed speculatively and out-of-order with respect to other code [20].
I am inclined to disagree with that decision, but I also know that I am inordinately fond of software communicating information to hardware.
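For what it is worth, the exception problem mentioned in the quoted rationale (conditional moves being difficult to use around memory accesses) can sometimes be worked around with the same address-selection trick as for stores. A rough C sketch, with made-up function names, and again with no guarantee that a compiler emits a select rather than a branch:

```c
#include <stddef.h>
#include <stdint.h>

/* Branchy version: the load happens only when p is non-null. */
int32_t load_or_default_branch(const int32_t *p, int32_t fallback)
{
    if (p != NULL)
        return *p;
    return fallback;
}

/* A naive if-conversion would load *p unconditionally and then
 * select between the loaded value and fallback; that load faults
 * when p is NULL. Selecting the address first avoids the fault:
 * the load is unconditional but always targets valid memory. */
int32_t load_or_default_select(const int32_t *p, int32_t fallback)
{
    const int32_t *src = (p != NULL) ? p : &fallback;  /* select address */
    return *src;                                       /* safe load */
}
```

This only covers the case where a safe substitute address is available; it does nothing for floating-point exceptions or for loads that may fault for reasons other than the tested condition.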