By: none (none.delete@this.none.com), February 4, 2013 4:12 pm
Room: Moderated Discussions
Patrick Chase (patrickjchase.delete@this.gmail.com) on February 2, 2013 2:01 pm wrote:
> Patrick Chase (patrickjchase.delete@this.gmail.com) on February 2, 2013 1:13 pm wrote:
> > none (none.delete@this.none.com) on February 2, 2013 10:43 am wrote:
> > > That 24-entry figure is wrong and anyway A9 has no ROB as found in other OoO CPUs :)
> >
> > ARM describes the A9 as doing "out of order issue". The TRM further describes it as using register
> > renaming to resolve WAW/WAR hazards without stalling, which implies a Tomasulo machine or similar.
> > The ARM ISA requires precise exceptions. I've developed OS code including fault handlers and
> > context-switching for A9, so I *know* it implements precise exceptions.
> >
> > I'm not aware of a means of implementing out-of-order issue with non-stalling resolution of WAR/WAW
> > and precise exceptions without some structure equivalent to an ROB in function if not in name.
> > I'm aware of the ARM slideset that says that A9 does OoO "without a power-hungry ROB" but I suspect
> > that's misworded and simply means that they used a physical register file to avoid Tomasulo's
> > reservation stations and common results bus (just like Sandy/Ivy Bridge and many other recent
> > OoO microarchitectures). The ROB itself isn't particularly power-hungry.
> >
> > Can you explain how A9 achieves out-of-order issue with renaming
> > and precise exceptions without an ROB or equivalent?
>
> Sorry to follow-up my own post again, but...
>
> I did some digging and found multiple sources that refer to a 24-entry "data-less
> ROB" in A9. Unfortunately ARM has pulled down the original documents (particularly
> the devcon 2007 A9 architecture slides) that those sources refer to.
The 24-entry figure comes from people believing that 56 physical registers for 32 architectural ones (56 - 32 = 24) means 24 entries :-)
> That tends to reinforce what I hypothesized above: It uses a PRF instead of reservation stations and
> a common results bus, so the ROB only needs to track instruction order and state (speculative or not,
> written back to PRF or not) as opposed to instruction results. Hence "data-less", just like Sandy Bridge,
> Bobcat, and a whole lot of other modern OoO microarchitectures :-). It's still an ROB, though.
A very very simple one then, which is what I meant ;)
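To make the "data-less" idea concrete, here is a minimal Python sketch of a PRF-based rename scheme where the ROB tracks only order and state, never result values. It is a toy model under my own assumptions and not a description of the actual A9 pipeline: the names (DatalessRob, RobEntry, rename, writeback, retire, flush) and the tiny register counts are all made up for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class RobEntry:
    arch_reg: int       # architectural destination register
    new_phys: int       # physical register allocated at rename
    old_phys: int       # previous mapping: freed at retire, restored on flush
    done: bool = False  # has the result been written back to the PRF?


class DatalessRob:
    """Toy model: results live in the PRF, the ROB holds bookkeeping only."""

    def __init__(self, num_arch=4, num_phys=8):
        self.prf = [0] * num_phys                     # physical register file
        self.rat = list(range(num_arch))              # arch -> phys mapping
        self.free = deque(range(num_arch, num_phys))  # spare physical registers
        self.rob = deque()                            # program-order entries

    def rename(self, arch_reg):
        """Allocate a new physical register; return None if we must stall."""
        if not self.free:
            return None
        new_phys = self.free.popleft()
        entry = RobEntry(arch_reg, new_phys, self.rat[arch_reg])
        self.rat[arch_reg] = new_phys
        self.rob.append(entry)
        return entry

    def writeback(self, entry, value):
        """Results may arrive out of order; they go straight to the PRF."""
        self.prf[entry.new_phys] = value
        entry.done = True

    def retire(self):
        """Retire completed entries in program order; return how many retired."""
        retired = 0
        while self.rob and self.rob[0].done:
            entry = self.rob.popleft()
            self.free.append(entry.old_phys)  # old mapping is now dead
            retired += 1
        return retired

    def flush(self):
        """Precise exception: undo all in-flight renames, youngest first."""
        while self.rob:
            entry = self.rob.pop()
            self.rat[entry.arch_reg] = entry.old_phys
            self.free.appendleft(entry.new_phys)

    def read(self, arch_reg):
        return self.prf[self.rat[arch_reg]]
```

Note that in this toy model the number of in-flight renames is capped by the spare physical registers (num_phys - num_arch), which is the kind of arithmetic behind the 24 figure, but nothing here forces the ROB itself to have that many entries.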