By: none (none.delete@this.none.com), February 2, 2013 10:43 am
Room: Moderated Discussions
Patrick Chase (patrickjchase.delete@this.gmail.com) on February 2, 2013 10:11 am wrote:
> none (none.delete@this.none.com) on February 2, 2013 5:33 am wrote:
> > Patrick Chase (patrickjchase.delete@this.gmail.com) on February 1, 2013 10:11 pm wrote:
> > [...]
> > > [*] Yes, I do realize that the A9 is OoO. Its capabilities in that regard are so limited that one wonders
> > > why they bothered, though. Embedded workloads are typically recompiled and optimized for each product and
> > > often use explicit prefetch to "expose" cache misses, and both of those tend to reduce the advantage of OoO.
> >
> > Perhaps they bothered because even limited OoO brought ~20% performance over A8 for a very low cost?
> > Also A9 is used in environments where recompilation doesn't happen in case you didn't notice ;)
>
> I agree that A9 is 20% faster than A8, but there are other factors at play besides just OoO.
> A9's design has been optimized quite a bit more than A8's, and it has a somewhat different memory
> subsystem (though both memory subsystems have serious limitations). The fair comparison would
> be to ask what would have happened if they'd made similar improvements to the A8.
>
> The A9's ROB is 24 entries, which means that its speculation/reorder window is *very* limited. The
> ROB_size/issue_rate ratio (basically a metric of how many clocks of instructions the ROB can hold
> at full issue rate) is only 12, which isn't even enough to "cover" an L1 miss that hits L2.
That 24-entry figure is wrong, and in any case the A9 has no ROB of the kind found in other OoO CPUs :)
What I think matters is that the A9 was well balanced on both the core side and the data side. And the mistake of a non-pipelined FPU was not repeated...
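For anyone who wants to redo the arithmetic behind the quoted ROB_size/issue_rate metric, here is a rough sketch. The function names are made up for illustration, the issue rate of 2 is only what the quoted 24/12 implies, and the miss latency is a parameter you would have to supply yourself. It reproduces the quoted calculation and says nothing about what the A9's real window is.

```python
def window_cycles(entries: int, issue_rate: int) -> float:
    """Clocks of instructions a reorder window holds at full issue rate."""
    if issue_rate <= 0:
        raise ValueError("issue_rate must be positive")
    return entries / issue_rate


def covers_miss(entries: int, issue_rate: int, miss_latency_cycles: int) -> bool:
    """True if the window can keep issuing for the whole miss latency."""
    return window_cycles(entries, issue_rate) >= miss_latency_cycles


# Figures taken from the quoted post (disputed above): 24 entries, and an
# issue rate of 2 inferred from the quoted ratio of 12.
quoted_ratio = window_cycles(24, 2)
print(quoted_ratio)
```

Plug in a different entry count or latency and the conclusion changes accordingly, which is why the actual size of the window matters to the argument.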