Article: Parallelism at HotPar 2010
By: Richard Cownie (tich.delete@this.pobox.com), August 4, 2010 10:08 am
Room: Moderated Discussions
Ants Aasma (ants.aasma@eesti.ee) on 8/4/10 wrote:
---------------------------
>The GPU is mostly optimized for multithreading. Most of the
>memory is in the humongous register set(s). The 5870 has 5 MiB
>of registers. Each core can fetch 12 64-byte operands per
>clock (16-way SIMD) and there are 20 cores. So the bandwidth
>from the register set is about 13 TB/s; factor in the writes
>and you should get something on the order of 18 TB/s. If the
>workload can be partitioned into the register set, then there
>is a huge bandwidth advantage.
Interesting. Are the operands really 64 *bytes*, not bits?
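
A quick back-of-envelope check of the quoted read figure, sketched in Python. The 850 MHz engine clock is an assumption (it is the stock HD 5870 clock, but it is not stated in the post), and so is reading a 64-byte operand as 16 SIMD lanes of equal width. The constant and function names are made up for illustration.

```python
# Figures taken from the quoted post.
OPERANDS_PER_CLOCK = 12   # operand fetches per core per clock
BYTES_PER_OPERAND = 64    # operand width in bytes, as quoted
CORES = 20                # SIMD cores on the chip
SIMD_LANES = 16           # "16-way SIMD"

# Assumption: stock HD 5870 engine clock, not given in the post.
CLOCK_HZ = 850e6


def register_read_bandwidth_tb_s() -> float:
    """Aggregate register-read bandwidth in TB/s (1 TB = 1e12 bytes)."""
    bytes_per_clock = OPERANDS_PER_CLOCK * BYTES_PER_OPERAND * CORES
    return bytes_per_clock * CLOCK_HZ / 1e12


def bytes_per_lane() -> int:
    """Width of one lane's slice if an operand spans all SIMD lanes."""
    return BYTES_PER_OPERAND // SIMD_LANES


if __name__ == "__main__":
    print(f"read bandwidth: {register_read_bandwidth_tb_s():.2f} TB/s")
    print(f"bytes per lane: {bytes_per_lane()}")
```

Under those assumptions the read side comes out near 13 TB/s, in line with the quoted figure, and a 64-byte operand would correspond to one 4-byte (32-bit) value per lane across the 16-way SIMD. The write contribution is left out because the post gives no per-clock write count.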