By: anon (anon.delete@this.anon.com), July 2, 2013 6:13 am
Room: Moderated Discussions
Etienne (etienne_lorrain.delete@this.yahoo.fr) on July 2, 2013 4:36 am wrote:
> > > > Your logic simply doesn't hold up. If they just wanted to get flops, the CPU would look like a GPU.
> > >
> > > > That would cost a lot as well.
> > >
> >
> > No it doesn't. GPU can do more flops/watt than a CPU, and more flops/area. Just put a
> > little A7 core in one corner to run the OS, and dedicate the rest to a GPGPU array.
>
> Isn't the GPGPU a lot quicker mainly because it does not have to do what the CPU does, i.e. manage virtual memory
> and memory protection for every byte (all the TLB work and delays), manage cache lines shared between CPUs
> (copying written cache lines to other caches), manage security by erasing pages newly allocated to processes,
> manage all the crappy hardware around (active waits because some versions of that chip do not allow two consecutive
> writes within N microseconds...), manage different versions of libraries (pages loaded on demand, position-independent
> code, dynamic linking of files which can be in 10 different places in the filesystem)?
No, it's not due to this high-level stuff; the hardware itself is more efficient.
As for why the hardware is more efficient, I don't know exactly. Surely less sophisticated virtual memory and coherency semantics help. But Xeon Phi and BG/Q, which are CPU-style designs, do quite well on the Green500 list. And while the NVIDIA K20 is better, it is built on 28nm vs 45nm for BlueGene, for example, so part of its advantage comes from the process node.
> The GPGPU will not help there, and will constantly wait for the "little A7 core" to finish the stuff.
>