By: anon (anon.delete@this.anon.com), February 5, 2013 12:38 am
Room: Moderated Discussions
Per Hesselgren (grabb1948.delete@this.passagen.se) on February 5, 2013 12:13 am wrote:
> Paul A. Clayton (paaronclayton.delete@this.gmail.com) on February 2, 2013 11:10 am wrote:
> > Patrick Chase (patrickjchase.delete@this.gmail.com) on February 1, 2013 10:11 pm wrote:
> > > David suggested posting this to the forum. I think he has a few remarks of his own to add on this topic...
> > >
> > > I think that the statement that x86 takes 5-15% more area than RISC is a bit simplistic,
> > > because the penalty is highly variable depending on what performance level you're
> > > targeting and what sort of microarchitecture you have to use to get there.
> >
> > x86 also has a steeper learning curve as one needs to learn the tricks to handle various odds
> > and ends. Intel and AMD already have institutional knowledge about implementation (including
> > validation tools), but a third party is less likely to find implementing a variant or an original
> > design worthwhile (even if Intel provided the appropriate licensing). It has also been argued
> > that a "necessity is the mother of invention" factor drove x86 implementers to innovate.
> >
> > A clean RISC like Alpha (or--from what I have read--AArch64) would be much more friendly to fast bring-up
> > of a decent microarchitecture. (Classic ARM seems to be somewhere in the middle--not as complex as x86 but
> > not as simple as Alpha--, but even with Thumb2+classic ARM it might be closer to Alpha than to x86.)
> >
> > [snip]
> > > My own take is that for ARM-based microservers to survive they need to stay down in the "many weak cores"
> > > regime and focus on massively parallel workloads that can tolerate the latency penalty. If they try to
> > > move up into higher performance brackets then they'll be playing directly into Intel's hand.
> >
> > I agree that trying to compete with Intel x86 at the high performance end will be excessively difficult,
> > but I think the ARM brigade may have a flexibility advantage.
> > Even though Intel has been demonstrating some
> > willingness to try new things and develop concurrent multiple
> > microarchitectures, Intel seems to be too conservative
> > to try radical designs. It is not clear that ARM will take advantage of its greater tolerance of diversity
> > (while learning to provide a coherent interface to software)
> > to introduce some weird and wonderful architectural
> > features. ARM has been very quiet about transactional memory and multithreading; features along the lines
> > of Intel's TSX and MIPS' MT-ASE could be significant in the server market.
> >
> > Even if ARM does not innovate much architecturally, I think the implementers may feel
> > much more free to try different accelerators and microarchitectural tweaks. With
> > an Architecture license, non-ARM implementers could even add new instructions.
>
> One of the few ARM server benchmarks
> http://armservers.com/2012/06/18/apache-benchmarks-for-calxedas-5-watt-web-server/#more-206
> Why just Apache?
It's a well-constructed workload for making the ARM server look good.
The ARM server is running at 100% CPU, which is very desirable from an efficiency point of view.
The Xeon, on the other hand, saturates the network link at 15% CPU. This caps its performance, but it also prevents it from going into sleep mode. Short sleep intervals between latency-critical events make for a nasty workload on big cores.
Operating systems have also traditionally not been very good at helping power efficiency. Ideally, 3 cores would be shut off, and 1 core would run at ~60% CPU and stay in shallower sleep states. What I am *guessing* is happening is that the load is being spread over all 4 cores.
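The consolidation arithmetic above (4 cores at 15% each is roughly one core's worth of work at 60%) can be sketched as follows. This is a minimal illustration that assumes the work is perfectly divisible across cores and ignores the extra queueing latency of packing everything onto fewer cores; `consolidate` is a hypothetical helper, not anything from the benchmark.

```python
from typing import Tuple


def consolidate(per_core_pct: int, cores: int, ceiling_pct: int = 100) -> Tuple[int, float]:
    """Hypothetical helper: fewest cores that could carry the same aggregate load.

    Assumes load is evenly spread (per_core_pct on each of `cores` cores) and can be
    repacked perfectly, with no core pushed above ceiling_pct.
    Returns (cores_needed, resulting_per_core_pct).
    """
    total_pct = per_core_pct * cores                  # aggregate load, in % of one core
    needed = max(1, -(-total_pct // ceiling_pct))     # ceiling division on integers
    return needed, total_pct / needed                 # the other cores could be parked


# 4 cores at 15% each repack onto a single core running at 60%,
# leaving 3 cores free to sit in deep sleep.
print(consolidate(15, 4))
```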
Oh, and by the looks of it they're just taking the TDP values for the Intel setup and claiming that is the power draw. They're also putting 4x the memory in the Intel system and counting its power, but not including amortized factors like storage and PSU.
Seems like a pretty bad test overall. I wouldn't be surprised if the ARM had a small edge in power efficiency across a range of workloads (not just the load points that make it look good). However, if you properly and fairly designed a solution for a given level of capacity, I would be shocked if the difference at the wall was even 2x.
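A fairer comparison, along the lines argued above, would use measured power rather than TDP and fold in each node's share of common infrastructure and PSU losses before dividing throughput by power at the wall. The sketch below is one hypothetical way to structure that; the class, its field names, and the simple "divide shared power evenly across nodes" amortization are all illustrative assumptions, not the method used in the linked benchmark.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SystemConfig:
    """Hypothetical description of one system under test (all fields are assumptions)."""
    name: str
    throughput_rps: float               # measured requests/second at the chosen load point
    component_watts: Dict[str, float]   # measured DC power per component, NOT TDP
    shared_watts: float                 # storage, fans, etc. shared by several nodes
    nodes_sharing: int                  # number of nodes that share those components
    psu_efficiency: float               # fraction of wall power reaching components (0-1)

    def wall_watts(self) -> float:
        # Node's own components plus an even share of the shared infrastructure,
        # then scaled up to account for power-supply conversion losses.
        dc = sum(self.component_watts.values()) + self.shared_watts / self.nodes_sharing
        return dc / self.psu_efficiency

    def requests_per_joule(self) -> float:
        # (requests/s) / (J/s) = requests per joule, measured at the wall.
        return self.throughput_rps / self.wall_watts()
```

Comparing the two platforms would then mean building one such config per system from measurements taken at the same delivered capacity, so that neither side benefits from a load point that happens to suit it.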