Article: Parallelism at HotPar 2010
By: Gabriele Svelto (gabriele.svelto.delete@this.gmail.com), August 4, 2010 10:51 pm
Room: Moderated Discussions
Richard Cownie (tich@pobox.com) on 8/4/10 wrote:
---------------------------
>I daresay there are particular examples for which that
>is true. But my own experience with a big app is
>completely the opposite: just a couple of weeks ago
>I ran the exact same executable on a Core2 Xeon 2.93GHz
>and a Nehalem Xeon 2.93GHz, and got 1.51x speedup.
>And these are still 45nm parts without the TurboBoost
>trick.
>
>That's a pure single-threaded app, so there's no benefit
>from the hyperthreading.
>
>It seems like a really big win. And I get the impression
>that most people see Nehalem that way.
>
>You're welcome to have your opinion, based on your own
>experience. But I don't think it matches what most
>people have measured.
No, that speed-up doesn't surprise me; I've seen something similar on applications for which the C2D's FSB was a bottleneck. On top of that, if the older system you used was based on FB-DIMMs, you probably picked the case where Nehalem has the largest memory-subsystem advantage over C2D, in both bandwidth and latency. However, that doesn't change the fact that in single-threaded applications which are not memory-limited, Nehalem seems to offer very little improvement in per-clock performance over Penryn.