By: Patrick Chase (patrickjchase.delete@this.gmail.com), July 2, 2013 4:43 pm
Room: Moderated Discussions
anon (anon.delete@this.anon.com) on July 2, 2013 4:12 pm wrote:
> Patrick Chase (patrickjchase.delete@this.gmail.com) on July 2, 2013 10:03 am wrote:
> > anon (anon.delete@this.anon.com) on July 2, 2013 7:13 am wrote in reference to GPUs:
> > > Not due to this high level stuff, because the hardware itself is more efficient.
> >
> > No, it is not. It's simply optimized to do different things.
>
> I'm talking about raw ability to do floating point operations.
Ad absurdum indeed :-)
> > A CPU is much
> > more efficient than a GPU on many "irregular" and/or iterative workloads.
> >
> > > In terms of hardware, I don't know exactly.
> >
> > Wow, that doesn't seem to prevent you from having strong opinions on the topic.
>
> The numbers I use are data from the Green500 list. I have the opinion that GPUs and
> vector oriented architectures are more efficient than short-SIMD GP CPUs, for this
> workload.
Green500 uses Linpack. That's about as GPU-friendly as it gets. It's arguably not even a representative workload for supercomputing.
> > You and Etienne are both off base, but Etienne is at least on the right path. If you're
> > actually interested in learning then take a look through this presentation:
> >
> > http://s08.idav.ucdavis.edu/fatahalian-gpu-architecture.pdf
> >
> > It's fairly dated (i.e. the numbers are hilariously outdated in
> > some cases) but the concepts are presented correctly.
>
> This says nothing about whether GPU design will be more efficient than CPU design.
It actually says quite a lot about that, for anybody with a basic understanding of microarchitecture.
What it says is that GPUs are optimized for and extremely efficient at tasks with a very large number of independent, mostly-similar (low code divergence) work items. Linpack is one such workload, which is why the GPUs do so well on Green500. GPUs are typically more efficient than CPUs at such workloads, all else being equal (though the difference isn't as big as many people think - if you see somebody claim a "30-to-1 speedup!" from GPU then that often, but not always, means that their reference CPU implementation sucked).
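To make "low code divergence" a bit more concrete, here is a deliberately simplified Python model. It is an assumption for illustration, not how any particular GPU actually schedules work: a group of SIMD lanes executes every distinct branch path once, serially, with lanes not on that path masked off. The function name `simd_efficiency` and the cycle counts are hypothetical.

```python
def simd_efficiency(lane_paths, path_cycles):
    """Fraction of lane-cycles doing useful work in one SIMD group.

    lane_paths: one branch-path label per lane.
    path_cycles: cycles needed to execute each path label.
    Simplified model: the group runs each distinct path once, serially,
    with lanes not on that path masked off (idle).
    """
    group_cycles = sum(path_cycles[p] for p in set(lane_paths))
    useful_lane_cycles = sum(path_cycles[p] for p in lane_paths)
    return useful_lane_cycles / (len(lane_paths) * group_cycles)


if __name__ == "__main__":
    cycles = {"then": 10, "else": 10}
    # All 8 lanes take the same branch: no lanes are idle.
    print(simd_efficiency(["then"] * 8, cycles))             # 1.0
    # A single lane diverges: both paths run, so half the lane-cycles are wasted.
    print(simd_efficiency(["then"] * 7 + ["else"], cycles))  # 0.5
```

Under this model even one divergent lane in a group halves throughput when the two paths cost the same, which is why workloads like Linpack, where every work item runs essentially the same code, suit wide SIMD hardware so well.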
The closest analogy I can think of is the difference between a sports car and a Greyhound bus. The sports car will get one or two people from point A to point B in the smallest amount of time possible. The bus moves a whole lot of people from point A to point B in a much larger amount of time, and also uses more fuel per trip. The catch is that the bus is fast enough and fuel-efficient enough that it takes less time and fuel per person moved than does the sports car.
So you tell me: Which is "faster" or "more efficient" at moving people? That's obviously an unanswerable question unless we know how many people we want to move and whether we care about how long it takes for the very first ones to arrive.
In case it isn't obvious, passengers are independent work items in the analogy above. The sports car is a CPU and the bus is a GPU. Linpack (the Green500 workload) basically corresponds to moving a very large number of people a short distance, and with no constraints on when the first people arrive (i.e. it's OK if everybody gets there at the same time). Unsurprisingly, the bus wins in that specific case.
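A toy calculation along the lines of the analogy, sketched in Python. All vehicle numbers are made up for illustration (the `CAR` and `BUS` parameters are hypothetical), and each trip simply moves one load of passengers, ignoring return legs. It shows the car always delivering its first passengers sooner, while the bus wins on total time and on fuel per person once there are enough passengers to fill it.

```python
import math

# Hypothetical parameters chosen only to illustrate the analogy; not measured data.
CAR = {"seats": 2, "trip_hours": 1.0, "fuel_per_trip": 5.0}
BUS = {"seats": 50, "trip_hours": 1.5, "fuel_per_trip": 20.0}


def cost(vehicle, people):
    """Return (total_hours, total_fuel, first_arrival_hours) for one vehicle
    moving `people` passengers in back-to-back trips (return legs ignored)."""
    trips = math.ceil(people / vehicle["seats"])
    total_hours = trips * vehicle["trip_hours"]
    total_fuel = trips * vehicle["fuel_per_trip"]
    # The first passengers arrive at the end of the first trip.
    return total_hours, total_fuel, vehicle["trip_hours"]


if __name__ == "__main__":
    for people in (2, 100):
        for name, vehicle in (("car", CAR), ("bus", BUS)):
            hours, fuel, first = cost(vehicle, people)
            print(f"{people:>3} people by {name}: first arrive after {first} h, "
                  f"all done after {hours} h, {fuel / people:.2f} fuel/person")
```

With 2 passengers the car finishes in 1.0 h on 2.50 fuel/person versus the bus's 1.5 h and 10.00 fuel/person; with 100 passengers the bus finishes in 3.0 h on 0.40 fuel/person versus the car's 50.0 h and 2.50 fuel/person. Which one is "more efficient" depends entirely on the workload size and on whether first-arrival latency matters.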