Article: Parallelism at HotPar 2010
By: Carlie Coats (coats.delete@this.baronams.com), August 7, 2010 8:41 am
Room: Moderated Discussions
>David Kanter (dkanter@realworldtech.com) on 8/3/10 wrote:
>---------------------------
>> I get the sense that they made a deliberate decision to optimize
>> for throughput at the cost of per-core performance (whereas
>> Intel has not made that choice for mainstream parts).
Gabriele Svelto (gabriele.svelto@gmail.com) on 8/4/10 wrote:
> That's debatable, Nehalem doesn't seem to offer much improvement
> in per-core performance over Core 2 (in my experience at least)
> and where it does it's marginal and partly due to its ability
> to dynamically "out-clock" Core 2 on single/lightly threaded
> workloads. Some of the choices Intel made in Nehalem were clearly
> detrimental for that type of workloads (like split L2 caches)
> but offered significant advantages for throughput.
Richard Cownie (tich@pobox.com) on 8/4/10 wrote:
> I daresay there are particular examples for which that
> is true. But my own experience with a big app is
> completely the opposite: just a couple of weeks ago
> I ran the exact same executable on a Core2 Xeon 2.93GHz
> and a Nehalem Xeon 2.93GHz, and got 1.51x speedup.
> And these are still 45nm parts without the TurboBoost
> trick.
>
> That's a pure single-threaded app, so there's no benefit
> from the hyperthreading.
For multi-threaded apps, we have a benchmark of a North-America-scale
MM5 meteorology model:
           Xeon 5460   Xeon 5570   speedup for 5570
16-core    6621 sec    2727 sec    2.428
32-core    5722 sec    1859 sec    3.078
We're clearly still climbing the scaling curve for this benchmark,
but the results are quite suggestive: the Nehalem is not only
faster, it also scales better.
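
For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the speedup column and the 16-to-32-core scaling from the timings above. The names (times, speedup) are just illustrative, and the "efficiency" figure assumes that ideal scaling would halve the run time when the core count doubles.

```python
# Wall-clock times in seconds, copied from the benchmark table in this post.
times = {
    "Xeon 5460": {16: 6621, 32: 5722},
    "Xeon 5570": {16: 2727, 32: 1859},
}


def speedup(slow_seconds, fast_seconds):
    """Ratio of run times: how many times faster the second run is."""
    return slow_seconds / fast_seconds


# Cross-machine speedup at each core count (the last column of the table).
for cores in (16, 32):
    s = speedup(times["Xeon 5460"][cores], times["Xeon 5570"][cores])
    print(f"{cores}-core: 5570 is {s:.3f}x faster than 5460")

# Scaling within each machine when going from 16 to 32 cores.
# Assumption: perfect scaling would give a 2x gain, so efficiency = gain / 2.
for cpu, t in times.items():
    gain = speedup(t[16], t[32])
    print(f"{cpu}: 16->32 cores gives {gain:.2f}x (efficiency {gain / 2:.0%})")
```

Doubling the core count gains roughly 1.16x on the 5460 versus roughly 1.47x on the 5570, which is what "scales better" refers to here.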
FWIW.