By: Exophase (exophase.delete@this.gmail.com), May 17, 2013 10:33 pm
Room: Moderated Discussions
David Kanter (dkanter.delete@this.realworldtech.com) on May 17, 2013 9:38 pm wrote:
> Intel's comparisons were peak to peak and then also normalized to 1.5W total power for the CPU blocks.
I meant properly documented comparisons from neutral third parties, not Intel's usual marketing graphs where they don't even say what they're comparing against.
>
> > Second, do you have any support for this claim that A15
> > was designed for servers and repurposed for mobiles?
>
> It's something I've heard from half a dozen people. Although I've also heard contrary evidence (recently).
The whole idea of ARM making a server core at all makes little sense to me. How are they supposed to make that work, by charging hundreds of times higher license fees than for their next most expensive cores? Other vendors can pair their own custom cores with custom hardware and support, which justifies the many thousands of dollars they need to charge per system to survive in such a low-volume market. ARM can't do that with core IP alone.
Plus A57 + A53 look like an evolution of the same design. Maybe if they at least had some other core that fit well for higher-end phones and tablets (i.e., stronger than A9), but they don't... so you're saying they sacrificed that segment to chase server wins.
> It's quite possible that the design point was tablets+servers rather than phones.
You see tablets and servers as being the common group?
If there was anything past tablets I'd lean towards notebooks (like the Chromebook we got), maybe STBs, OUYA-like boxes, TV sticks, all-in-ones, etc. Still, consumer devices. What about A15 makes you think server?
> How tightly intermingled are your 64b and 128b accesses? It's also possible the problem is with writes
> to one of the 4 data elements within a neon register. I didn't exactly get code samples here.
>
The main advantage of utilizing the aliasing is for narrowing data sizes into both halves of a quad register, then working on that register as a whole. I have plenty of code that does this and I haven't noticed any obvious performance problems. I haven't profiled it closely, but I'd have noticed if it were anything less than much faster than the C code it replaced.
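For illustration, the narrowing pattern described above might look something like this in NEON assembly (a hypothetical sketch with arbitrary register choices, not taken from the actual code):

```
@ Sketch: narrow eight 32-bit lanes (from q8 and q9) down to eight
@ 16-bit lanes packed into a single quad register, exploiting the
@ fact that d0 and d1 alias the low and high halves of q0.
vmovn.i32  d0, q8        @ low half of q0  <- narrowed lanes of q8
vmovn.i32  d1, q9        @ high half of q0 <- narrowed lanes of q9
vadd.i16   q0, q0, q1    @ then operate on q0 as a full 8x16-bit vector
```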
Writing to single elements in a NEON register is not something I really do, because A8 (and presumably A9) has performance issues with that (as far as I remember, more so with 8- and 16-bit elements than 32-bit), and it's usually pretty easy to avoid. When transferring from scalar registers I usually do either a dual-register 64-bit write or a broadcast.
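The two scalar-transfer patterns mentioned could be sketched as (again, register choices are arbitrary):

```
@ Dual-register 64-bit write: two ARM core registers fill a whole
@ d register in one instruction, avoiding single-element inserts.
vmov     d0, r0, r1
@ Broadcast: replicate one 32-bit scalar across all four lanes of q0.
vdup.32  q0, r2
```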
Yes, I am. My point is that ARM's microarchitectures are not perfect, something Wilco doesn't seem to comprehend.
> It's quite damning that almost every single ARM customer has chosen to avoid the A15 for phones.
>
I agree Wilco gives ARM too much credit, but two wrongs don't make a right :P
A15 has, so far, only been out in two SoCs. Both are from Samsung, which has historically not made its SoCs available to many third-party phone OEMs. Of the two, one hasn't made it into phones and the other has. It's way too early to claim that A15 is a bust for phones based on this. It'd be like claiming Cortex-A9 was doomed never to make it into phones because, for its first several months, Tegra 2 was the only A9 SoC showing up in devices, and it appeared in tablets (and a netbook) for quite a while before it showed up in phones.
I don't think it'll be a smash hit or anything, and I do see some argument that it's not a stellar choice, but it's not like the Exynos 5 S4s are a disaster by any means. And I expect we'll see 20nm A15s before we see 20nm A57s.
> We already know Intel is quite capable of producing rotten CPU designs (see Atom), but oddly
> enough those still seem to have pretty good performance relative to something like the A9.
I also think ye olde Bonnell/Saltwell gets too much flak, although it did have a few bizarre problems. The multithreaded performance in particular is pretty nice, all things considered. I can't imagine Intel really had no valid reasoning behind the more fundamental compromises. Bobcat seems to have gotten more praise, but IMO, all things considered, Atom was more on the mark.