By: Dan Fay (danielDOTfay.email@example.com), November 12, 2014 12:38 pm
Room: Moderated Discussions
> But this just raises the question, why did they do the trace cache at all? I think a single-wide x86
> decoder on a high(er) latency path must have looked mighty attractive to take such a big risk.
From my admittedly rusty understanding of trace caches, they provided the following benefits on the NetBurst architecture (a toy sketch after this list illustrates the idea):
1. Took x86 decode out of the critical path. This reduced branch mispredict penalties and allowed for a slower, simpler, potentially lower-power x86 decoder while still maintaining sufficient fetch bandwidth.
2. By storing the decoded instructions as traces, they could essentially store/cache branch predictions. Doing so allowed for powerful-but-slow branch predictors to be taken out of the critical execution path.
3. Got some reuse out of x86 instruction decoding. Decode once, then (hopefully) execute the decoded instructions multiple times.
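To make those three points concrete, here is a toy software sketch of what a single trace-cache line conceptually holds: already-decoded uops plus the predicted next fetch address, so a hit bypasses the x86 decoder and supplies a cached branch prediction at the same time. All names and sizes here (TraceLine, tc_lookup, TC_SETS, etc.) are invented for illustration and are not NetBurst's actual structures.

/*
 * Toy trace-cache sketch. Assumed/illustrative names only; not Intel's design.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TC_SETS       256   /* toy capacity, nothing like the real 12K-uop trace cache */
#define UOPS_PER_LINE   6   /* a trace segment holds a handful of decoded uops */

typedef struct {
    int      valid;
    uint64_t start_pc;                /* fetch address that begins this trace           */
    uint32_t uops[UOPS_PER_LINE];     /* already-decoded uops: no x86 decode on a hit   */
    int      uop_count;
    uint64_t next_pc;                 /* embedded (cached) prediction of the next fetch */
} TraceLine;

static TraceLine trace_cache[TC_SETS];

/* Direct-mapped lookup: a hit means the x86 decoder is bypassed entirely. */
static TraceLine *tc_lookup(uint64_t pc)
{
    TraceLine *line = &trace_cache[(pc >> 2) % TC_SETS];
    return (line->valid && line->start_pc == pc) ? line : NULL;
}

/* Miss path: the slow, narrow x86 decoder runs once, and its output plus the
 * predicted next fetch address are recorded as a new trace for later reuse. */
static void tc_fill(uint64_t pc, const uint32_t *uops, int n, uint64_t predicted_next)
{
    TraceLine *line = &trace_cache[(pc >> 2) % TC_SETS];
    line->valid     = 1;
    line->start_pc  = pc;
    line->uop_count = n;
    memcpy(line->uops, uops, (size_t)n * sizeof(uint32_t));
    line->next_pc   = predicted_next;
}

int main(void)
{
    uint32_t decoded[] = {0x11, 0x22, 0x33};   /* pretend decoder output */
    uint64_t pc = 0x401000;

    if (!tc_lookup(pc)) {
        printf("miss at %#llx: decode once, build trace\n", (unsigned long long)pc);
        tc_fill(pc, decoded, 3, 0x401010);     /* cache uops + prediction */
    }

    TraceLine *hit = tc_lookup(pc);
    if (hit)
        printf("hit at %#llx: %d uops, next fetch %#llx (decoder idle)\n",
               (unsigned long long)hit->start_pc, hit->uop_count,
               (unsigned long long)hit->next_pc);
    return 0;
}

On a hit the front end never touches the decoder and already knows where to fetch next (points 1 and 2); on a miss the decoder runs once and the result is reused on later passes through the same code (point 3).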
IIRC, the old trace cache academic papers pushed trace caches mainly as an optimization to provide high instruction fetch bandwidth and to cache branch predictions ahead of time. Storing decoded uops was really an Intel design decision that was mostly orthogonal to the trace cache concept.