By: Travis (travis.downs.delete@this.gmail.com), August 9, 2016 9:44 am
Room: Moderated Discussions
It is pretty clear that high-performance implementations of variable-length instruction encodings (x86 being the poster child) have settled on a branch direction predictor plus a branch target predictor to allow instruction fetch to (speculatively) follow branches and jumps. Here I'm mostly interested in the second half of that pair: the target predictor (hereafter "BTB").
The BTB is interesting because, as I understand it, it is used not only for conditional branches but also for all unconditional jumps and calls, since those too change the flow of instruction fetch [1]. On a variable-length architecture, where instruction boundaries are not known until the bytes have been at least partly decoded, the BTB is pretty much required to avoid i-fetch bubbles when jumps are encountered.
A question that was raised, however, is about fixed-length archs: can these architectures avoid spending BTB entries on fixed-target branches by decoding such jumps early and redirecting fetch? That is, skip the BTB for branches whose targets are encoded in the instruction itself, leaving the BTB resources for branches whose targets may actually vary (e.g., indirect jumps). Do any of the common fixed-length archs actually do this?
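To make the question concrete, here is a rough toy model of the two options. Everything in it is made up for illustration: the 4-byte instruction size, the `(kind, operand)` instruction tuples, and the names `TinyBTB` and `next_fetch` do not correspond to any real core. It also ignores timing entirely; in a real pipeline the early-decode redirect would still cost however many cycles separate fetch from the point where the target can be extracted.

```python
from collections import OrderedDict

INSN_BYTES = 4  # assumed fixed instruction length


class TinyBTB:
    """Toy fully associative BTB with LRU replacement (hypothetical)."""

    def __init__(self, entries):
        self.entries = entries
        self.table = OrderedDict()  # branch pc -> last seen target

    def lookup(self, pc):
        target = self.table.get(pc)
        if target is not None:
            self.table.move_to_end(pc)  # mark as most recently used
        return target

    def update(self, pc, target):
        self.table[pc] = target
        self.table.move_to_end(pc)
        if len(self.table) > self.entries:
            self.table.popitem(last=False)  # evict least recently used


def next_fetch(pc, insn, btb, early_decode):
    """Return (predicted next fetch pc, whether the BTB was consulted).

    insn is a made-up (kind, operand) tuple where kind is "direct",
    "indirect" or "alu", and operand is a pc-relative byte offset for
    direct jumps.
    """
    kind, operand = insn
    if kind == "direct" and early_decode:
        # Target comes straight from the instruction bits, no BTB entry.
        return pc + operand, False
    if kind in ("direct", "indirect"):
        hit = btb.lookup(pc)
        # On a BTB miss, fetch just falls through to the next instruction.
        return (hit if hit is not None else pc + INSN_BYTES), True
    return pc + INSN_BYTES, False
```

With `early_decode=True` only indirect jumps ever touch the BTB, so its capacity is left for the targets that can actually change; with `early_decode=False` a direct jump mispredicts its target until `update` has recorded it.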
[1] In some cases I suppose unconditional jumps may disappear when a post-decode instruction cache is used that simply stores the trace with unconditional jumps elided. I'm pretty sure Netburst does this, but I don't remember whether the modern Intel uop cache does.