By: Patrick Chase (patrickjchase.delete@this.gmail.com), February 2, 2013 9:11 am
Room: Moderated Discussions
none (none.delete@this.none.com) on February 2, 2013 5:33 am wrote:
> Patrick Chase (patrickjchase.delete@this.gmail.com) on February 1, 2013 10:11 pm wrote:
> [...]
> > [*] Yes, I do realize that the A9 is OoO. Its capabilities in that regard are so limited that one wonders
> > why they bothered, though. Embedded workloads are typically
> > recompiled and optimized for each product and often
> > use explicit prefetch to "expose" cache misses, and both of those tend to reduce the advantage of OoO.
>
> Perhaps they bothered because even limited OoO brought ~20% performance over A8 for a very low cost?
> Also A9 is used in environments where recompilation doesn't happen in case you didn't notice ;)
I agree that A9 is 20% faster than A8, but there are other factors at play besides just OoO. A9's design has been optimized quite a bit more than A8's, and it has a somewhat different memory subsystem (though both memory subsystems have serious limitations). The fair comparison would be to ask what would have happened if they'd made similar improvements to the A8.
The A9's ROB is 24 entries, which means that its speculation/reorder window is *very* limited. The ROB_size/issue_rate ratio (basically a metric of how many clocks of instructions the ROB can hold at full issue rate) is only 12, which isn't even enough to "cover" an L1 miss that hits L2.
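To make the ROB_size/issue_rate arithmetic concrete, here is a minimal back-of-envelope sketch. It assumes a deliberately simple model: the missing load sits at the head of the ROB and every later instruction is independent and issues at the full rate, so the core stalls once the ROB fills. The function names are hypothetical, the 2-wide issue rate is simply what 24/12 implies, and the miss latencies are made-up inputs, not measured A9 numbers.

```python
# Back-of-envelope model of how long an out-of-order core can keep issuing
# past a cache miss before its reorder buffer (ROB) fills up.
# Assumption: the missing load sits at the ROB head and every later
# instruction is independent and issues at the full issue rate.


def reorder_window_cycles(rob_entries: int, issue_width: int) -> float:
    """ROB_size / issue_rate: clocks of full-rate issue the ROB can hold."""
    return rob_entries / issue_width


def stall_cycles(rob_entries: int, issue_width: int, miss_latency: int) -> float:
    """Cycles the core sits idle with a full ROB, waiting for the miss to return."""
    return max(0.0, miss_latency - reorder_window_cycles(rob_entries, issue_width))


if __name__ == "__main__":
    # Cortex-A9 figures from the post: 24-entry ROB, ratio of 12 => 2-wide issue.
    print(reorder_window_cycles(24, 2))  # 12.0
    # Hypothetical miss latencies in cycles, NOT measured A9 numbers.
    for latency in (8, 12, 20, 30):
        print(latency, stall_cycles(24, 2, latency))
```

Under this model any miss latency above 12 cycles leaves the core stalled for the difference, which is the sense in which a 24-entry ROB can't "cover" an L2 hit. Real cores also run out of other resources (load queue, rename registers) and rarely sustain full issue rate, so this is an optimistic bound.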