Article: Parallelism at HotPar 2010
By: Michael S (already5chosen.delete@this.yahoo.com), August 4, 2010 11:36 pm
Room: Moderated Discussions
Richard Cownie (tich@pobox.com) on 8/4/10 wrote:
---------------------------
>Gabriele Svelto (gabriele.svelto@gmail.com) on 8/4/10 wrote:
>---------------------------
>>That's debatable, Nehalem doesn't seem to offer much improvement in per-core performance
>>over Core 2 (in my experience at least)
>
>I daresay there are particular examples for which that
>is true. But my own experience with a big app is
>completely the opposite: just a couple of weeks ago
>I ran the exact same executable on a Core2 Xeon 2.93GHz
>and a Nehalem Xeon 2.93GHz, and got 1.51x speedup.
>And these are still 45nm parts without the TurboBoost
>trick.
>
Nehalem-based quad-core Xeon/2.93 without TurboBoost? There is no such thing. Assuming you are talking about the X5570, it has a Max Turbo Frequency of 3.333 GHz.
http://ark.intel.com/Product.aspx?id=37111
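Dividing out the clock difference gives a rough lower bound on the per-clock gain. This assumes the part really is an X5570 that held its maximum single-core turbo of 3.333 GHz for the whole run, and that the Core2 part ran at a fixed 2.93 GHz. Both are assumptions, not measurements:

$$\text{clock-for-clock ratio} \approx 1.51 \times \frac{2.93\ \text{GHz}}{3.333\ \text{GHz}} \approx 1.33$$

If turbo stayed below its maximum for part of the run, the true per-clock figure would lie somewhere between 1.33x and 1.51x.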
As for Core2 Xeons, I am not aware of any 2.93 GHz parts. There are 2.83 GHz, 3.0 GHz, 3.16 GHz and 3.20 GHz.
Also, one should pay attention to the bus speed: 1333 MT/s vs 1600 MT/s. For a memory-intensive application like yours, the X5472 (3.0 GHz, 1600 MT/s) could be significantly faster than the X5460 (3.16 GHz, 1333 MT/s).
Another factor is the generation: 65nm (Merom-based) Xeons have only two-thirds of the L2 cache of 45nm (Penryn-based) Xeons. That alone can occasionally make a big impact. In particular, I saw a >20% difference in FPGA place-and-route (p&r).
>That's a pure single-threaded app, so there's no benefit
>from the hyperthreading.
>
>It seems like a really big win. And I get the impression
>that most people see Nehalem that way.
>
>You're welcome to have your opinion, based on your own
>experience. But I don't think it matches what most
>people have measured.
>
My experience is similar to Gabriele's: on average, in single-threaded applications Nehalem is not faster than Penryn.
Nehalem's weakest spot is applications that spend most of their time in integer loops with low L2$ miss rates; in such cases Penryn commonly outperforms Nehalem by 10% or more clock for clock. On scalar FP code with low L2$ miss rates they are about even. On SIMD code Nehalem tends to be slightly faster. Of course, when you depend on off-chip memory latency, Nehalem is significantly faster. I'd imagine that Nehalem's advantage is bigger still when the bottleneck is memory bandwidth, but there seem to be very few single-threaded applications like that, since for a single thread even the 7-10 GB/s available on C2D/C2Q Xeons is plenty.