By: someone (someone.delete@this.somewhere.com), July 25, 2012 9:58 am
Room: Moderated Discussions
David Kanter (dkanter.delete@this.realworldtech.com) on July 25, 2012 1:37 am wrote:
> New computational efficiency data shows GPUs with a clear edge over CPUs, but
> the gap is narrowing as CPUs adopt wide vectors (e.g. AVX). Surprisingly, a
> throughput CPU is the most energy efficient processor, offering hope for future
> architectures. Our data also shows some advantages of AMD's Bulldozer, and the
> overhead associated with highly scalable server CPUs.
>
> Comments and feedback welcome!
>
> David
Calling FLOPS/W and FLOPS/mm2 "efficiency" is highly misleading because neither
metric has any concept of effective FLOPS while doing something useful. The FP
functional units of a general purpose MPU are a tiny fraction of device area and
power budget. Why? Everything else there SUPPORTS feeding those units over a huge
spectrum of usage in terms of data access and control complexity, without demanding
unreasonable programming effort and methods. GPUs have less silicon overhead per FP
unit because they are far less generally useful. For HPC algorithms with complex data
access and control paths, GPUs are hugely inefficient and can only approach a tiny
fraction of their theoretical peak FLOPS. Why hasn't anyone published a SPECfp
score running entirely on a GPU yet? :-D
In a modern process I could tile a 200 mm2 die with nothing but FMACs plus clock
and power distribution and blow away everything on your graph, but it would not be
capable of anything useful. But hey, what "efficiency", woot!
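To put the peak-versus-effective point in concrete terms, here is a minimal sketch. The function name and every number in it are made up purely for illustration (they are not measurements of any real chip); it only shows how a ranking by peak FLOPS/W can flip once you account for the fraction of peak a workload actually sustains.

```python
def effective_gflops_per_watt(peak_gflops, utilization, watts):
    """Sustained GFLOPS per watt: peak rate scaled by the fraction actually achieved."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    if watts <= 0:
        raise ValueError("watts must be positive")
    return peak_gflops * utilization / watts


# Hypothetical devices, invented for this example only.
# "wide": lots of FP units, little support logic, poor utilization on irregular code.
# "general": fewer FP units, more support logic, better utilization on the same code.
wide_peak = effective_gflops_per_watt(1000.0, 1.0, 200.0)
general_peak = effective_gflops_per_watt(200.0, 1.0, 100.0)

wide_effective = effective_gflops_per_watt(1000.0, 0.05, 200.0)
general_effective = effective_gflops_per_watt(200.0, 0.5, 100.0)

print("peak GFLOPS/W:     ", wide_peak, "vs", general_peak)
print("effective GFLOPS/W:", wide_effective, "vs", general_effective)
```

With these assumed utilizations the "wide" device wins on paper and loses on the workload. The real argument is about what utilization is achievable on a given code, which a peak-based graph does not capture.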



