Article: Parallelism at HotPar 2010
By: Richard Cownie (tich.delete@this.pobox.com), August 3, 2010 10:42 am
Room: Moderated Discussions
Mark Roulo (nothanks@xxx.com) on 8/3/10 wrote:
---------------------------
>~10 GB/sec for a Nehalem core seems suspiciously low to me. Assume a 2.5 GHz clock,
>then the cache can only move 4 bytes per clock in or out of the registers. Really?
>
>-Mark Roulo
Yeah, I'm not sure quite what that number is. Maybe it's
really DRAM bandwidth. Or maybe it's cache bandwidth, but
using some code written in a fairly dumb way.
The number probably ought to be 4x higher? And I saw
another benchmark claiming about 40GB/s, which seems more
plausible. But then 1TB/s / 40GB/s would give you 25x, which
is still a pretty high factor.
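The arithmetic in this exchange can be checked quickly. This is a back-of-envelope sketch. It assumes the 2.5 GHz clock from the quoted post and takes the thread's rough ~1TB/s GPU figure at face value. It also assumes a purely bandwidth-bound kernel, so the bandwidth ratio serves as an upper bound on the GPU's advantage. The helper names are hypothetical.

```python
# Back-of-envelope figures from this thread. All inputs are the rough
# numbers quoted in the discussion, not measurements.
GB = 1e9   # bytes
TB = 1e12  # bytes


def bytes_per_clock(bandwidth_bytes_per_s: float, clock_hz: float) -> float:
    """Bytes moved per core clock cycle at a given sustained bandwidth."""
    return bandwidth_bytes_per_s / clock_hz


def speedup_bound(gpu_bw: float, cpu_bw: float) -> float:
    """GPU/CPU bandwidth ratio: an upper bound on the GPU advantage,
    assuming (hypothetically) a kernel limited purely by bandwidth."""
    return gpu_bw / cpu_bw


clock = 2.5e9  # assumed 2.5 GHz Nehalem clock, as in the quoted post

if __name__ == "__main__":
    print(bytes_per_clock(10 * GB, clock))  # 4.0 bytes/clock: suspiciously low
    print(bytes_per_clock(40 * GB, clock))  # 16.0 bytes/clock: the "4x higher" figure
    print(speedup_bound(1 * TB, 40 * GB))   # 25.0: GPU advantage at 40GB/s
```

Even with the more plausible 40GB/s figure, the ratio lands at 25x, well short of 100x.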
Anyway, I think I'm basically agreeing with you that
current GPUs shouldn't have a 100x advantage over decent
multi-core code running on the best current CPUs. But
that it might have been somewhat plausible for comparing
against older CPUs with their weaker cache hierarchy.