By: David Kanter (dkanter.delete@this.realworldtech.com), February 5, 2013 7:36 pm
Room: Moderated Discussions
anon (anon.delete@this.anon.com) on February 5, 2013 12:38 am wrote:
> Per Hesselgren (grabb1948.delete@this.passagen.se) on February 5, 2013 12:13 am wrote:
> > Paul A. Clayton (paaronclayton.delete@this.gmail.com) on February 2, 2013 11:10 am wrote:
> > > Patrick Chase (patrickjchase.delete@this.gmail.com) on February 1, 2013 10:11 pm wrote:
> > > > David suggested posting this to the forum. I think he has a few remarks of his own to add on this topic...
> > > >
> > > > I think that the statement that x86 takes 5-15% more area than RISC is a bit simplistic,
> > > > because the penalty is highly variable depending on what performance level you're
> > > > targeting and what sort of microarchitecture you have to use to get there.
> > >
> > > x86 also has a steeper learning curve as one needs to learn the tricks to handle various odds
> > > and ends. Intel and AMD already have institutional knowledge about implementation (including
> > > validation tools), but a third party is less likely to find implementing a variant or an original
> > > design worthwhile (even if Intel provided the appropriate licensing). It has also been argued
> > > that a "necessity is the mother of invention" factor drove x86 implementers to innovate.
> > >
> > > A clean RISC like Alpha (or--from what I have read--AArch64) would be much more friendly to fast bring-up
> > > of a decent microarchitecture. (Classic ARM seems to be somewhere in the middle--not as complex as x86 but
> > > not as simple as Alpha--, but even with Thumb2+classic ARM it might be closer to Alpha than to x86.)
> > >
> > > [snip]
> > > > My own take is that for ARM-based microservers to survive they need to stay down in the "many weak cores"
> > > > regime and focus on massively parallel workloads that can tolerate the latency penalty. If they try to
> > > > move up into higher performance brackets then they'll be playing directly into Intel's hand.
> > >
> > > I agree that trying to compete with Intel x86 at the high-performance end will be excessively difficult,
> > > but I think the ARM brigade may have a flexibility advantage. Even though Intel has been demonstrating
> > > some willingness to try new things and to develop multiple concurrent microarchitectures, Intel seems
> > > to be too conservative to try radical designs. It is not clear that ARM will take advantage of its
> > > greater tolerance of diversity (while learning to provide a coherent interface to software) to
> > > introduce some weird and wonderful architectural features. ARM has been very quiet about transactional
> > > memory and multithreading; features along the lines of Intel's TSX and MIPS' MT-ASE could be
> > > significant in the server market.
> > >
> > > Even if ARM does not innovate much architecturally, I think the implementers may feel
> > > much more free to try different accelerators and microarchitectural tweaks. With
> > > an Architecture license, non-ARM implementers could even add new instructions.
> >
> > One of the few ARM server benchmarks:
> > http://armservers.com/2012/06/18/apache-benchmarks-for-calxedas-5-watt-web-server/#more-206
> > Why just Apache?
>
> It's a well constructed workload to make the ARM server look good.
>
> The ARM server is running at 100% CPU, which is very desirable from an efficiency point of view.
>
> Then, the Xeon saturates the link at 15% CPU. This constrains its performance, but also prevents it from going
> into sleep mode. Short sleep intervals between latency-critical events are a nasty workload for big cores.
>
> Operating systems are also traditionally not very good at managing for power efficiency. Ideally,
> 3 cores would be shut off and 1 core would run at ~60% utilization, staying in shallower sleep
> states. What I am *guessing* is happening is that the load is being spread over all 4 cores.
>
> Oh, and by the looks of it they're just taking the TDP values for the Intel setup and
> claiming that is the power draw. They're also using 4x the memory in the Intel system
> and measuring that, but not including amortized components like storage and the PSU.
>
> Seems like a pretty bad test overall. I wouldn't be surprised if the ARM part had a little
> edge in power efficiency under a range of workloads (not just the load points that make
> it look good). However, if you properly and fairly designed a solution for a given level
> of capacity, I would be shocked if the difference at the wall were even 2x.
>
IIRC, that testing was with 1GbE. Intel redid the testing with a 10GbE card, and it changed the situation substantially:
www.theregister.co.uk/2012/08/13/xeon_vs_calxeda_arm_apache_bench/
I don't have the new numbers though.
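For a sense of why the link was the binding constraint, here is a quick back-of-envelope sketch. The ~20 KB response size is an assumption picked for illustration, not a figure from either test:

```python
# Rough link-saturation estimate for the Apache test (illustrative only;
# the response size is assumed, not taken from the benchmark writeups).
LINK_GBPS = 1.0             # the original Calxeda comparison used 1GbE
RESPONSE_BYTES = 20 * 1024  # assumed HTTP response size, headers included

link_bytes_per_sec = LINK_GBPS * 1e9 / 8
max_requests_per_sec = link_bytes_per_sec / RESPONSE_BYTES
# Under these assumptions the wire tops out around six thousand requests/s,
# regardless of how much CPU headroom the Xeon has left at 15% utilization.
print(f"{max_requests_per_sec:.0f} req/s saturates a {LINK_GBPS:g}GbE link")
```

A 10GbE card raises that ceiling tenfold, which is consistent with the retest changing the picture.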
David