By: Patrick Chase (patrickjchase.delete@this.gmail.com), July 2, 2013 7:34 am
Room: Moderated Discussions
anon (anon.delete@this.anon.com) on June 30, 2013 2:53 am wrote:
> EduardoS (no.delete@this.spam.com) on June 29, 2013 9:31 pm wrote:
> > Ask chip designers; looking from a distance, designing an FPU that doesn't suck
> > very badly doesn't look as hard as designing a scheduler that performs exceptionally well.
>
> Your logic simply doesn't hold up. If they just wanted to get flops, the CPU would look like a GPU.
You're confusing "FP performance" with "latency machine vs throughput machine".
A CPU like Silvermont is mostly a latency machine: it's designed to minimize wall-clock runtime for 1 or 2 threads. A GPU is a throughput machine: it's designed to maximize overall throughput for a very large number of near-identical threads/work-items, with no regard to wall-clock runtime for individual work-items. Because a GPU is a throughput machine, it can dispense with latency-minimizing structures like large caches and fancy OoO schedulers, and that leads to a very high FLOPs/area ratio.
The fact that GPUs happen to be FP-oriented therefore doesn't mean that they're automatically the best way to "get FLOPs". The workload also has to map well onto the throughput-machine idiom. Many don't, and that's why CPUs with blazing-fast FP continue to be designed and sold.
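As a loose illustration of the latency/throughput split (this sketch is mine, not from the post): two workloads with the same FLOP count can have very different machine affinities. A loop-carried dependency chain is latency-bound, so extra parallel lanes buy nothing; an elementwise map is independent per element and fans out cleanly across thousands of GPU work-items.

```python
def serial_chain(x0, n):
    # Each iteration depends on the previous result: a latency-bound
    # dependency chain. A throughput machine gains nothing here; runtime
    # is roughly n * (FP op latency) no matter how many lanes exist.
    x = x0
    for _ in range(n):
        x = x * 0.5 + 1.0
    return x

def elementwise(xs):
    # Every element is independent: a throughput-friendly workload that
    # maps cleanly onto a very large number of near-identical work-items.
    return [x * 0.5 + 1.0 for x in xs]
```

A latency machine with a fast FPU wins on `serial_chain`; a GPU only pulls ahead when the work looks like `elementwise` at large scale.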
-- Patrick