Article: Parallelism at HotPar 2010
By: Gabriele Svelto (gabriele.svelto.delete@this.gmail.com), August 5, 2010 7:41 am
Room: Moderated Discussions
Richard Cownie (tich@pobox.com) on 8/5/10 wrote:
---------------------------
>If you're single-threaded, then you get the benefit of
>the higher clock speed with TurboBoost. So you win
>that way.
Yes, and I pointed this out in a previous post, saying that more often than not Nehalem manages to slightly surpass Penryn because of its higher effective clock frequency.
>It seems to me that Nehalem really covers all the bases
>quite well.
Yes, and I never said otherwise, but a C2D with an IMC would probably have fared better in single-threaded and lightly threaded applications. The whole point was that trade-offs were made in Nehalem to improve throughput, and they came at a slight cost in single-threaded performance. That's why there are several applications where, in spite of much lower memory latency, higher bandwidth and a higher clock, Nehalem matches or even loses to Penryn.
>There are a lot of single-threaded apps which benefit
>from the much lower latency of DRAM accesses in Nehalem
>systems. And didn't they reduce the L2 latency as well ?
Yes, but they also made it significantly smaller and private to each core (which penalizes lightly threaded applications with significant data sharing), and they increased the L1 latency from 3 to 4 cycles.
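To put the extra L1 cycle in perspective, here is a back-of-the-envelope sketch that only converts cycles into nanoseconds. The 3.0 GHz baseline is a hypothetical figure, not a specific Penryn or Nehalem SKU, and the model deliberately ignores everything that hides or adds latency in a real core (out-of-order execution, prefetching, the L3, the IMC), so it illustrates the arithmetic and nothing more.

```python
def load_to_use_ns(cycles: int, clock_ghz: float) -> float:
    """Load-to-use latency in nanoseconds (GHz is cycles per nanosecond)."""
    return cycles / clock_ghz


def break_even_clock_ghz(old_cycles: int, new_cycles: int, old_clock_ghz: float) -> float:
    """Clock at which new_cycles take the same wall time as old_cycles at old_clock_ghz."""
    return old_clock_ghz * new_cycles / old_cycles


# Hypothetical baseline clock, chosen only to keep the numbers round.
BASE_CLOCK_GHZ = 3.0

old_ns = load_to_use_ns(3, BASE_CLOCK_GHZ)             # 3-cycle L1
new_ns = load_to_use_ns(4, BASE_CLOCK_GHZ)             # 4-cycle L1, same clock
needed = break_even_clock_ghz(3, 4, BASE_CLOCK_GHZ)    # clock to hide the extra cycle

print(f"3-cycle L1 at {BASE_CLOCK_GHZ} GHz: {old_ns:.3f} ns")
print(f"4-cycle L1 at {BASE_CLOCK_GHZ} GHz: {new_ns:.3f} ns")
print(f"4-cycle L1 needs {needed:.2f} GHz to match")
```

Under these assumptions the 4-cycle L1 would need a clock about 33% higher (4/3 of the baseline) just to break even on raw L1 hit latency, which is why clock speed alone rarely hides it completely.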
>If you've got a particular example in mind of an app
>that goes better on Core2 than on Nehalem, tell us
>what it is and give us some figures. Otherwise there's
>not much to this nitpicking.
Michael S already pointed to FPGA P&R, but you can pick all sorts of examples from Nehalem reviews; here are just a couple:
- http://www.anandtech.com/show/2658/18 Penryn beating Nehalem on iTunes encoding
- http://www.anandtech.com/show/2658/19 Penryn beating Nehalem on GRID and Crysis
If you dig up the first reviews, where C2D/C2Q were still being tested, you will find more examples of this. Anyway, let's make one point clear: I'm not debating whether Nehalem is a good processor or not. It has *significantly* higher multi-threaded performance than every other processor (except POWER7) and very good single-threaded performance too.
But compromises were made to make this possible, which was the point of my first post. They were perfectly legitimate compromises from a market perspective, considering that Intel wanted to regain and hold the precious market share it had lost in the server space. However, those compromises slightly degraded its clock-for-clock, thread-for-thread performance compared to Penryn, something it usually makes up for with Turbo Boost. And I'm convinced that a hypothetical C2D with an IMC and Turbo Boost capability would have fared much better than Nehalem in most lightly threaded cases.