By: anon (anon.delete@this.anon.com), May 17, 2013 11:03 pm
Room: Moderated Discussions
David Kanter (dkanter.delete@this.realworldtech.com) on May 17, 2013 9:29 pm wrote:
> Wilco (Wilco.Dijkstra.delete@this.ntlworld.com) on May 17, 2013 12:22 pm wrote:
> > David Kanter (dkanter.delete@this.realworldtech.com) on May 17, 2013 8:00 am wrote:
> > > Wilco (Wilco.Dijkstra.delete@this.ntlworld.com) on May 15, 2013 5:37 pm wrote:
> > > > Ashraf Eassa (aeassa.delete@this.gmail.com) on May 15, 2013 11:59 am wrote:
> > > > > Hi everybody,
> > > > >
> > > > > I've been lurking for years, but the time has come when I would really love to pick the brains of
> > > > > the experts we have here. From my understanding, Atom is a much narrower design than Krait, Cortex
> > > > > A15 and others, and yet, in many benchmarks the older Saltwell core holds its own against even Krait
> > > > > in both FPU/INT, and against A15 in Linux integer benchmarks (but it gets decimated in FPU).
> > > >
> > > > Which Linux benchmarks do you mean? This does not show a single benchmark where dual Atom can keep up
> > > > with dual A15. Even Tegra 3 wins 9 out of 11 benchmarks despite its slow single-channel memory system.
> > >
> > > You realize there are an awful lot more tests out there that Phoronix doesn't run, right?
> > >
> > > Also, why do we even care about Linux? We care about Android, which is rather distinct from Linux.
> >
> > Android is just a layer on top of Linux. So yes, we do care about Linux performance using GCC on mainstream
> > code. How many tricks ICC uses to get great SPEC results is irrelevant in the Linux/Android world.
>
> Android applications don't really use GCC as I understand it. They are using Dalvik.
What are the JIT, the runtime and GUI libraries, the kernel, the GPU drivers, and the native-code applications (i.e., the performance-critical parts) that they are running compiled with? GCC, I would guess. Perhaps LLVM for some parts.
> So you're comparing two entirely different software stacks and trying to draw
> conclusions.
No, the tests linked do not use Android. It is close to apples-to-apples on the software side.
Of course you're going to be using Android rather than Ubuntu on an A15 smartphone. However, the range of tests run there was quite reasonable IMO, including decent branchy integer benchmarks. As for the compile benchmark, I don't know whether Phoronix uses a cross compiler to build for the same target on every machine, so that one might not be valid.
> Moreover, my understanding is that Moorestown actually matches
> the A9 and A15 on quite a few benchmarks (based on discussions with Anand).
Can you link to a few where it beats A15?
>
> > > > > > So, my question is, how do I think about "Silvermont" competitive position against a fairly
> > > > > beefy modern ARM design such as the Cortex A15? From a high level perspective, it looks
> > > > > like on a per-clock basis it should be no contest - A15 is wider and more aggressive.
> > > > > But Intel is claiming that Silvermont is as fast as A15 on a per-clock basis.
> > > >
> > > > "Intel is claiming" - there is your hint... When Atom originally was announced, it was supposed
> > > > to be 5-6 times faster than ARM cores. However when Atom was finally available in phones, it
> > > > actually lagged in performance. This is where Atom is today. Is that competitive?
> > >
> > > When did Intel claim that Atom would be 5-6x faster than ARM cores? And which
> > > ARM cores? I'd like to see some proof, because that just sounds crazy.
> >
> > Here is what Intel claimed at the time. Yep crazy stuff indeed. They compared the fastest Atom against
> > a low frequency ARM11, despite much faster versions being available (IIRC 750MHz), as well as 600/800MHz
> > Cortex-A8. Remember the "only x86 gives the full web browsing experience" slogans?
>
> It was a stupid comparison, you're right. But I wouldn't say it was incorrect.
Come on! It is the most weasel-worded slide you could possibly create while keeping some very slim chance of saying "well, it's *technically* correct".
They clearly presented it as a performance advantage over the competition. There is zero chance that Intel did not know about, or could not have procured, a contemporary ARM core that played in the high-end market Atom was supposedly aiming for.
It was dishonest and wrong.
> I'm quite
> sure that Silverthorne beats the OMAP2 silly. Of course, the OMAP2 has been irrelevant
> for ages. Designing a car faster than a horse-drawn buggy isn't exactly impressive.
>
> And it was a very specific comparison, which I suspect is true. Intel's marketing is
> aggressive, but if you question them carefully, you'll ALWAYS get the fine print. Of
> course, most journalists are not technically savvy enough to do so. And in this case,
> the fine print said: "We are winning a comparison which is a pointless comparison".
>
> > > > > A couple of questions then:
> > > > >
> > > > > 1. How can a narrower design pull this off?
> > > >
> > > > It doesn't. Not without trickery anyway - like comparing a highly clocked CPU against a low
> > > > clocked one,
> > >
> > > That's not trickery, that's life. Intel has better process technology and is able to
> > > hit higher clock speeds.
> >
> > It's trickery when you use a slow CPU on purpose when much faster CPUs are available.
>
> Intel's comparisons vs. A15 were normalized to power and/or specific devices. That's
> a very fair equalization point. Far more sensible than equal frequency.
Which comparison was this? Based on shipping devices?
>
> > And in terms of frequency ARM has caught up dramatically in recent years. I expect ARM
> > to pull ahead in frequency with Tegra 4i, 20nm A15's and the first 64-bit ARMs.
>
> That's not remotely true. The marketing department for Samsung and Nvidia has caught up. Most of these
> devices are rated for frequencies that cannot be sustained for more than a few seconds at best, and
> still result in ridiculous power consumption. The frequency for Atom is also suspect, but frankly Intel
> lies less about frequency than other companies and has vastly better DVFS implementations.
Well, what matters is performance. So when we see Anand's numbers where A15 is beaten by Moorestown in many benchmarks, that will be most interesting. If power and uncore are somewhat equalized in those tests, then I think it can be said that A15 has poor perf/watt.
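To make the normalization point concrete, here is a minimal sketch of how a clock-normalized ranking and a power-normalized ranking can disagree. Everything in it is an assumption for illustration: `chip_a`, `chip_b`, the `Result` class and all the figures are made-up placeholders, not measurements of Atom, A15 or any other real part.

```python
# Hypothetical illustration only: the scores, clocks and power figures below
# are made-up placeholders, not measurements of any real chip.
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    score: float      # benchmark score, higher is better
    clock_ghz: float  # clock actually sustained during the run
    power_w: float    # average power drawn during the run

    def per_clock(self) -> float:
        """Clock-normalized performance (score per GHz)."""
        return self.score / self.clock_ghz

    def per_watt(self) -> float:
        """Power-normalized performance (score per watt)."""
        return self.score / self.power_w


chip_a = Result("chip_a", score=1000.0, clock_ghz=2.0, power_w=1.0)
chip_b = Result("chip_b", score=1200.0, clock_ghz=1.6, power_w=2.0)

for r in (chip_a, chip_b):
    print(f"{r.name}: {r.per_clock():.0f} per GHz, {r.per_watt():.0f} per W")
```

With these placeholder figures chip_b looks better per clock while chip_a looks better per watt, which is why it matters which normalization a vendor slide or a review is actually using.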
>
> > > Moreover, there are many A15 implementations that are incredibly
> > > power hungry. This shouldn't surprise anyone, since the A15 started out as a server
> > > core...but then something happened and ARM tried to shove it into mobiles.
> >
> > I call BS on that - ARM has never said that A15 is a server-only
> > CPU, it has always been designed for mobile/tablets
> > but with added server extensions. Here is Anand's first article on A15, even the title is clear.
>
> I have been told by several people that A15 was internally started aimed at servers and
> later repositioned. That being said, I've been told that isn't true by other people.
>
> > > Clock-normalized comparisons are useful as thinking points, but you really need to consider physical design
> > > and process technology. Power and frequency are intrinsically tied to physical design and process, as is
> > > area. Certainly there are architectural techniques that can have a big impact (I think the A7 omitting
> > > a branch predictor is particularly brilliant in that regard), but process has a bigger influence.
> >
> > If process was the only thing that mattered then how could Calxeda server nodes
> > possibly beat Atom on both performance and power using an old 40nm process?
>
> It's not the only thing that matters, but process is generally a bigger deal than microarchitecture. More
> to the point, clock-normalized comparisons are stupid. Power normalized actually makes sense. And when
> you compare power, process REALLY matters. Especially for something like 22nm FinFETs vs. 28nm bulk.
>
> > > > comparing an unreleased CPU against a much older CPU, using different compiler
> > > > versions or optimizing for specific benchmarks (SunSpider). It's called "benchmarketing"...
> > >
> > > > About the only area where Silvermont appears to have an
> > > > advantage over A15 is a lower L2 latency. Everything
> > > > else is like you said, smaller buffers, narrower, simpler
> > > > and less aggressive. Given the memory system advantage
> > > > I'd expect it to beat A9 by a good margin (although A9R4 might well be competitive). However based on what
> > > > we know you'd have to be extremely optimistic to believe it can get even close to A15 performance.
> > >
> > > I claim BS already. If A15 is so good, why do partial register
> > > stalls cause a massive drop in performance for
> > > Neon? Oh right, maybe it's because someone made a stupid architectural decision they fixed in the A57.
> >
> > Do you have any evidence for that? Partial register stalls are rare
> > on ARM, I don't believe they happen in common cases, unlike x86.
>
> I had a long conversation with a friend (employed at a key ARM partner/customer) on this topic. When
> they turned on Neon, the performance dropped significantly, which they traced back to partial register
> accesses and the aliased register files. It's partially a result of how the NEON pipeline integrates
> with the CPU and writes results back to registers and when they merge partial register writes.
Perhaps they were using an old compiler or not scheduling instructions correctly? In floating point, from benchmarks I've seen, A15 is much better than Atom.
>
> Bottom line: ARM has made plenty of stupid design choices.
> Their designs are hardly perfect and they are learning.
>
> David
>