By: anon (anon.delete@this.anon.com), July 1, 2013 12:32 am
Room: Moderated Discussions
EduardoS (no.delete@this.spam.com) on June 30, 2013 1:26 pm wrote:
> anon (anon.delete@this.anon.com) on June 30, 2013 10:41 am wrote:
> > This is your only justification for your assertion that multiple producers in some
> > highly competitive markets are spending effort on useless product changes?
>
> If you ignore half of my post...
None of your post provided any other real evidence or logic.
>
> > How easy do you think it is to double floating point performance?
>
> Depends, what's the starting point? Doubling SIMD width or pipelining the FPU is pretty
> easy; if your FPU is small compared to the rest of the core, it is also cheap.
Intel doubled SIMD width in SandyBridge and had to redesign the pipeline to be PRF-based.
So you think moving to a PRF was a pretty easy change?
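The arithmetic behind "doubling SIMD width doubles floating point performance" is simple, which is exactly why it hides the cost. A minimal sketch, assuming a core with one vector add pipe and one vector multiply pipe (no FMA), each retiring one full-width operation per cycle, which matches Nehalem (128-bit SSE) and Sandy Bridge (256-bit AVX). The function name is hypothetical and only illustrates peak throughput, not achieved performance.

```c
#include <assert.h>

/* Peak floating-point operations per cycle for one core, assuming each
   FP pipe retires one full-width vector operation per cycle (no FMA). */
static unsigned peak_flops_per_cycle(unsigned vector_bits, unsigned element_bits,
                                     unsigned fp_pipes)
{
    assert(element_bits > 0 && vector_bits % element_bits == 0);
    unsigned lanes = vector_bits / element_bits; /* elements per vector */
    return lanes * fp_pipes;                     /* one op per lane per pipe */
}
```

With those assumptions, `peak_flops_per_cycle(128, 32, 2)` gives 8 single-precision FLOP/cycle for SSE and `peak_flops_per_cycle(256, 32, 2)` gives 16 for AVX. The formula says nothing about the wider registers, bypass network and load/store paths needed to feed those lanes, which is where the PRF redesign comes in.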
>
> > You can't on one
> > hand say "that costs a lot" to an alternative that does not fit your narrative, and
> > on the other hand say "doubling floating point performance is easy and cheap".
>
> I didn't say that, anyway.
Words to that effect. I was paraphrasing.
>
> > If it was so easy and cheap, and so important for useless benchmarks, then why
> > wouldn't Intel have added huge floating point performance to Silvermont? Answer that.
>
> Ask Intel. Silvermont is a rather small core; an FPU that is small on Haswell may be quite big on Silvermont.
No, I'm asking you.
>
> > The cost depends on how much you put on, and how exactly you measure it.
>
> Since a small L4 is useless even for synthetic benchmarks,
> I think everybody understood you were suggesting a big L4.
No, you can have a memory latency benchmark which fits in L4. Or a particular workload which just manages to fit. On a smartphone, you probably don't have a lot of apps where the working set is hundreds of MB.
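For reference, a memory latency benchmark is typically a pointer chase: each load's address comes from the previous load, so the latency cannot be overlapped. A minimal sketch, assuming POSIX `clock_gettime`; the function names are hypothetical and not taken from any particular benchmark suite. The chain's footprint is `n * sizeof(size_t)` bytes, so choosing `n` so that footprint fits in a given cache level makes the benchmark report that level's latency, which is how such a test can be sized to land in an L4.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Build one random cycle over n slots (Sattolo's algorithm), so following
   next[i] visits every slot exactly once before returning to the start.
   Any j < i keeps the single-cycle property; rand() only limits how random
   the order is when n exceeds RAND_MAX. */
static size_t *build_chain(size_t n, unsigned seed)
{
    assert(n >= 2);
    size_t *next = malloc(n * sizeof *next);
    if (!next)
        return NULL;
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i; /* j in [0, i-1]: Sattolo, not Fisher-Yates */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
    return next;
}

/* Follow the chain for `steps` dependent loads; returns mean ns per load. */
static double chase_ns_per_load(const size_t *next, size_t steps, size_t *sink)
{
    struct timespec t0, t1;
    size_t p = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t s = 0; s < steps; s++)
        p = next[p]; /* the next address depends on this load */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    *sink = p; /* keep the result live so the loop is not optimized away */
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}
```

Sweeping `n` from a few KB up to hundreds of MB and plotting ns/load shows a step at each level of the hierarchy; no measured numbers are claimed here.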
>
> > Implementing floating point performance that you find in A15, for example, is not "free or very cheap".
>
> Well, do you have a die shot?
No, why should I? Do you have any die shots for your assertion that it is free or very cheap?
>
> > No, it doesn't. A GPU can do more flops/watt than a CPU, and more flops/area. Just put a
> > little A7 core in one corner to run the OS, and dedicate the rest to a GPGPU array.
>
> And then your core will cost a lot more than a simple A7 and still have the same
> horrible integer performance, which is exactly what Linus was arguing against.
But you said that designers and consumers prefer to pay for things which are not relevant to their workloads. If high FLOPS is one of those things, moving to a GPU-like core would be a cheap way to win useless benchmarks.