By: someone (someone.delete@this.somewhere.com), July 10, 2015 5:43 am
Room: Moderated Discussions
Maynard Handley (name99.delete@this.name99.org) on July 8, 2015 8:23 pm wrote:
> Sylvain Collange (sylvain.collange.delete.delete@this.this.gmail.com) on July 8, 2015 10:32 am wrote:
> > Maynard Handley (name99.delete@this.name99.org) on July 8, 2015 9:46 am wrote:
> > > BTW, seeing Andre Seznec's name there, does any commercial
> > > processor implement a PPM or TAGE-like predictor yet?
> >
> > I am not aware of any official statement about a commercial TAGE implementation.
> >
> > But comparing Haswell's performance counters with the output of a TAGE simulator, we observe
> > comparable branch misprediction rates on average. (http://hal.inria.fr/hal-01100647/)
> >
>
> Inspired by this paper, I looked at the Geekbench3 Lua single core results (which
> I assume are basically an interpreter, and thus as good a proxy as one can hope
> for in figuring out this stuff). The results are very interesting.
> A8 gets 1787/1.4 =1270 (score/frequency)
> Sandy Bridge gets 4269/3.4=1255
> Haswell gets 4325/3.3=1310
> Nehalem gets 2284/3.2=713
> (64-bit for everything except the Nehalem result where I could not find a 64-bit Windows result.
> For some strange reason there are also no Broadwell 64-bit results yet; out of interest the
> 32-bit result is 2693/2.3=1171, which perhaps we can take as indicating a 10% penalty for
> 32-bit mode, giving us some feel for what a 64-bit Nehalem result might be.)
>
> It's merely a hint, not a proof, but it suggests that the intuition is correct (that is, it gives the
> expected big jump in performance for an interpreter from Nehalem to Sandy Bridge). It also suggests
> that whatever Apple is using for their branch predictor it's pretty impressive. Perhaps not at the
> TAGE-ITTAGE level of Haswell (especially when we consider that they can make up for a few more branch
> mispredictions when they can run wider) but a very credible effort, at around the quality of the Sandy
> Bridge predictors; and presumably headed for something TAGE-like in the near future.
>
> This also suggests (pace Linus' comment that for many purposes the only SPEC result worth paying
> attention to is the gcc score) that Lua maybe should play that same role for Geekbench?
Keep in mind that the x86 processors you are comparing to A8 in clock-normalized performance
are running at roughly 2.3x to 2.4x its frequency (3.2 to 3.4 GHz versus 1.4 GHz). Memory latency
costs more cycles at a higher clock, so dividing score by frequency significantly inflates the
apparent architectural performance of A8 relative to the x86 processors listed.
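
To make the normalization concrete, here is a rough Python sketch that recomputes score/frequency from the figures quoted above and shows each chip's clock relative to the A8. The helper names (score_per_ghz, clock_ratio) are made up for illustration, and the scores and clocks are simply the ones in the quoted post, not independently checked.

```python
# Geekbench 3 Lua single-core scores and clocks (GHz), copied from the quoted post.
results = {
    "A8": (1787, 1.4),
    "Sandy Bridge": (4269, 3.4),
    "Haswell": (4325, 3.3),
    "Nehalem": (2284, 3.2),
    "Broadwell (32-bit)": (2693, 2.3),
}


def score_per_ghz(score, ghz):
    """Clock-normalized score: benchmark score divided by frequency in GHz."""
    return score / ghz


def clock_ratio(ghz, baseline_ghz):
    """How many times faster this chip is clocked than the baseline chip."""
    return ghz / baseline_ghz


a8_ghz = results["A8"][1]
for name, (score, ghz) in results.items():
    print(f"{name:20s} {score_per_ghz(score, ghz):7.1f} per GHz, "
          f"{clock_ratio(ghz, a8_ghz):.2f}x the A8 clock")
```

The second column is the point: a per-GHz figure taken at 3.3 GHz and one taken at 1.4 GHz are not measuring the same thing, so similar numbers in the first column do not by themselves imply similar cores.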