By: none (none.delete@this.none.com), February 4, 2013 4:58 pm
Room: Moderated Discussions
Patrick Chase (patrickjchase.delete@this.gmail.com) on February 4, 2013 4:43 pm wrote:
> none (none.delete@this.none.com) on February 4, 2013 4:12 pm wrote:
> > Patrick Chase (patrickjchase.delete@this.gmail.com) on February 2, 2013 2:01 pm wrote:
> > > Patrick Chase (patrickjchase.delete@this.gmail.com) on February 2, 2013 1:13 pm wrote:
> > > > none (none.delete@this.none.com) on February 2, 2013 10:43 am wrote:
> > > > > That 24 entries figure is wrong and anyway A9 has no ROB as found in other OoO CPU :)
> > > >
> > > > ARM describes the A9 as doing "out of order issue". The TRM further describes it as using register
> > > > renaming to resolve WAW/WAR hazards without stalling, which implies a Tomasulo machine or similar.
> > > > The ARM ISA requires precise exceptions. I've developed OS code including fault handlers and
> > > > context-switching for A9, so I *know* it implements precise exceptions.
> > > >
> > > > I'm not aware of a means of implementing out-of-order issue with non-stalling resolution of WAR/WAW
> > > > and precise exceptions without some structure equivalent to an ROB in function if not in name.
> > > > I'm aware of the ARM slideset that says that A9 does OoO "without a power-hungry ROB" but I suspect
> > > > that's misworded and simply means that they used a physical register file to avoid Tomasulo's
> > > > reservation stations and common results bus (just like Sandy/Ivy Bridge and many other recent
> > > > OoO microarchitectures). The ROB itself isn't particularly power-hungry.
> > > >
> > > > Can you explain how A9 achieves out-of-order issue with renaming
> > > > and precise exceptions without an ROB or equivalent?
> > >
> > > Sorry to follow-up my own post again, but...
> > >
> > > I did some digging and found multiple sources that refer to a 24-entry "data-less
> > > ROB" in A9. Unfortunately ARM has pulled down the original documents (particularly
> > > the devcon 2007 A9 architecture slides) that those sources refer to.
> >
> > The 24-entry figure comes from people believing that 56 physical
> > registers for 32 architectural ones means 24 entries :-)
> >
> > > That tends to reinforce what I hypothesized above: It uses a PRF instead of reservation stations and
> > > a common results bus, so the ROB only needs to track instruction order and state (speculative or not,
> > > written back to PRF or not) as opposed to instruction results. Hence "data-less", just like Sandy Bridge,
> > > Bobcat, and a whole lot of other modern OoO microarchitectures :-). It's still an ROB, though.
> >
> > A very very simple one then, which is what I meant ;)
>
> So then why did you say "the A9 has no ROB"?
Because it's very simple, and to me a ROB is something more complex. I probably exaggerated slightly.
> There have been many microarchitectures with this scheme (PRF + dataless ROB), going back to the MIPS R10000
> and Alpha 21264, and probably before that. The convention in the literature has always been to refer to those
> structures as "reorder buffers", and perhaps clarify that they are dataless. See for example figure 5 here:
>
> http://www.ecs.umass.edu/ece/koren/ece568/papers/Pentium4.pdf
>
> I suspect that the reason for keeping the "ROB" terminology is because reorder buffers were devised
> to implement precise exceptions in the basic Tomasulo architecture. Whether they also store results
> or not is secondary to that basic function. See for example the seminal paper on the topic:
>
> http://dl.acm.org/citation.cfm?id=327125, also available for download
> here: http://lmi17.cnam.fr/~anceau/Documents/smith.pdf
>
> While ARM may have claimed that the A9 was "ROB-less" to make it look new and revolutionary, they
> merely made themselves look silly. To my knowledge (which is quite fallible - I'd love to be corrected
> on this one) you can't efficiently do precise exceptions in an OoO machine [*] without an ordered
> list of pending instructions and their status, and that is by definition a reorder buffer.
>
> So, now that we've established conclusively that A9 has an ROB, how many entries do you think it has?
>
> [*] I can conceive of some very inefficient mechanisms, but those basically
> come down to reconstructing the equivalent of an ROB after the fact...
Try to look for patents that were issued to ARM between 2007 and 2010 that talk about renaming. I can't give more information and can't even link patents :)
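
For anyone following the thread, here is a toy sketch of the generic "PRF + dataless ROB" scheme being argued about. It is not a model of the A9: the class and field names (Core, RobEntry, rat, free) and the rob_size parameter are made up for illustration, and only the 32/56 register counts come from the posts above. Each ROB entry holds bookkeeping (which mapping it replaced, whether it is done or faulted) and never a result value. It also shows why 56 - 32 = 24 bounds the number of in-flight register writers without fixing the number of ROB entries.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

NUM_ARCH = 32  # architectural registers (figure quoted in the thread)
NUM_PHYS = 56  # physical registers (figure quoted in the thread)


@dataclass
class RobEntry:
    """Dataless entry: order and status only, never a result value."""
    arch_dst: Optional[int]  # None for stores/branches (no register written)
    new_phys: Optional[int]  # physical register allocated at rename
    old_phys: Optional[int]  # previous mapping, freed at retire
    done: bool = False
    faulted: bool = False


class Core:
    def __init__(self, rob_size: int):
        self.rob_size = rob_size  # hypothetical; unrelated to the real A9
        self.rob = deque()        # program order, oldest at the left
        self.rat = list(range(NUM_ARCH))              # arch -> phys map
        self.free = deque(range(NUM_ARCH, NUM_PHYS))  # 24 spare registers

    def dispatch(self, arch_dst: Optional[int] = None) -> Optional[RobEntry]:
        """Rename and allocate an entry; None means the front end stalls."""
        if len(self.rob) >= self.rob_size:
            return None
        new = old = None
        if arch_dst is not None:
            if not self.free:
                return None
            new = self.free.popleft()
            old = self.rat[arch_dst]
            self.rat[arch_dst] = new
        entry = RobEntry(arch_dst, new, old)
        self.rob.append(entry)
        return entry

    def retire(self) -> int:
        """Retire finished instructions in order; flush on a fault at the head."""
        retired = 0
        while self.rob and self.rob[0].done:
            head = self.rob[0]
            if head.faulted:
                self.flush()  # precise state: nothing younger is visible
                break
            self.rob.popleft()
            if head.old_phys is not None:
                self.free.append(head.old_phys)
            retired += 1
        return retired

    def flush(self) -> None:
        """Undo renames youngest-first so the map matches retired state."""
        while self.rob:
            entry = self.rob.pop()
            if entry.arch_dst is not None:
                self.rat[entry.arch_dst] = entry.old_phys
                self.free.appendleft(entry.new_phys)
```

With rob_size set to 40, say, the 25th register-writing instruction stalls on the empty free list while a store can still be dispatched, which is the sense in which the register count alone does not give the entry count.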