Article: Parallelism at HotPar 2010
By: Mark Roulo (nothanks.delete@this.xxx.com), August 3, 2010 8:19 am
Room: Moderated Discussions
anon (anon@anon.com) on 8/2/10 wrote:
---------------------------
>Also, the comparisons they are making appears to be against their old CPU cluster.
>Seeing as they bought a first GPU cluster in Jan 2008 and do upgrades about once
>per year, the comparison would probably be on early 2007 era CPUs, quite likely dual core Opterons.
>
The experience we have had at my company is that SSSE3 (note the 3rd S ...) is worth about a 2x speedup for our vector workloads. AMD chips do not support SSSE3, so *IF* our experience generalizes, then a CPU baseline that uses SSE but runs on AMD chips can look an extra 2x slower against the GPU.
Of course, back in 2008, they wouldn't have had Fermi GPUs, either, so these might cancel out. Pre-Nehalem Intel CPUs had their own problem, too: they became bandwidth limited very quickly. I'd suggest that GPUs had a larger *relative* advantage over CPUs before Nehalem because:
1) AMD didn't support SSSE3, but did have more reasonable bandwidth (and the B/W scaled as chips were added), but
2) Intel had SSSE3 (at least with the Core family), but had poor bandwidth.
You really want both, and Nehalem finally delivered.
In any event, to evaluate a claim, I kinda need to know:
*) CPU type and count,
*) Whether the code is single threaded or not,
*) Whether the code uses vector instructions (and if so, up to what version).
The comp.arch paper/presentation doesn't provide this, although the MPI reference (page 16, I think) *does* suggest that they use all the cores.
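For the vector-instruction item, here is a rough sketch of how one might check what a Linux box advertises. It assumes an x86 machine with /proc/cpuinfo; the helper names (highest_sse_level, read_cpu_flags) are made up for illustration. It only tells you what the CPU supports, not what a given binary actually uses.

```python
# Sketch: report the newest SSE-family level a Linux x86 machine advertises.
# Assumption: /proc/cpuinfo exists and has a "flags" line (Linux on x86).
# The function names here are hypothetical, chosen for this example.

# /proc/cpuinfo flag name -> conventional name, ordered oldest to newest.
# Linux reports SSE3 under the flag name "pni".
SSE_LEVELS = [
    ("sse", "SSE"),
    ("sse2", "SSE2"),
    ("pni", "SSE3"),
    ("ssse3", "SSSE3"),
    ("sse4_1", "SSE4.1"),
    ("sse4_2", "SSE4.2"),
]


def highest_sse_level(flags_line):
    """Return the newest SSE level named in a whitespace-separated
    flags string, or None if no SSE flag is present."""
    flags = set(flags_line.split())
    best = None
    for flag, name in SSE_LEVELS:
        if flag in flags:
            best = name  # later entries are newer, so keep overwriting
    return best


def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the flags string of the first CPU listed, or "" if the
    file is missing or has no flags line."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return line.split(":", 1)[1]
    except OSError:
        pass
    return ""


if __name__ == "__main__":
    print(highest_sse_level(read_cpu_flags()))
```

On a machine whose flags line stops at pni, this reports SSE3, which is the case where (if our experience generalizes) you would expect to be leaving that 2x on the table.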
-Mark Roulo