Article: Parallelism at HotPar 2010
By: Michael S (already5chosen.delete@this.yahoo.com), August 5, 2010 10:02 am
Room: Moderated Discussions
Gabriele Svelto (gabriele.svelto@gmail.com) on 8/5/10 wrote:
---------------------------
>Richard Cownie (tich@pobox.com) on 8/5/10 wrote:
>---------------------------
>>If you're single-threaded, then you get the benefit of
>>the higher clock speed with TurboBoost. So you win
>>that way.
>
>Yes and I pointed this out in a previous post saying that more often than not
>Nehalem manages to slightly surpass Penryn because of its effective higher clock frequency.
>
>>It seems to me that Nehalem really covers all the bases
>>quite well.
>
>Yes and I never said otherwise but a C2D with an IMC would have probably fared
>better in single/lightly threaded applications. The whole point was that trade-offs
>were made in Nehalem to improve throughput which came at a slight cost in single-threaded
>performance. That's why there are several applications where - in spite of much lower
>memory latency, higher bandwidth and higher clock - Nehalem matches or even loses to Penryn.
>
>>There are a lot of single-threaded apps which benefit
>>from the much lower latency of DRAM accesses in Nehalem
>>systems. And didn't they reduce the L2 latency as well ?
>
>Yes but they also made it significantly smaller, exclusive to a core (which penalizes
>lightly threaded applications with significant data sharing) and increased the L1 latency from 3 to 4 cycles.
>
>>If you've got a particular example in mind of an app
>>that goes better on Core2 than on Nehalem, tell us
>>what it is and give us some figures. Otherwise there's
>>not much to this nitpicking.
>
>Michael S already pointed to FPGA P&R but you can pick all sorts of examples
>from Nehalem reviews, here are just a couple:
>
Well, that's not what I was saying.
In FPGA p&r Nehalem is faster than Penryn clock4clock, I'd guess mostly due to Turbo Boost. In my measurements the i7-920 (2.66 GHz) was almost as fast as the E8400 (3.0 GHz Penryn). The point I [obviously unsuccessfully] was trying to make was that the difference between Nehalem and Penryn was significantly smaller than the difference between Penryn and Merom, despite the latter two sharing a nearly identical microarchitecture and being distinguished only by the size of the L2 cache and the speed of the FSB and chipset.
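To put rough numbers on "clock4clock", here is a minimal Python sketch of the arithmetic. It is only an illustration, not a measurement: the relative performance of 1.0 stands in for "almost as fast", the 2.93 GHz figure is my assumption for the i7-920's single-core Turbo Boost clock, and the function name is made up for this example.

```python
def per_clock_ratio(relative_perf, clock_a_ghz, clock_b_ghz):
    """Per-clock performance of chip A relative to chip B.

    relative_perf is perf_A / perf_B on the same workload
    (1.0 means both finish in the same time).
    """
    return relative_perf * clock_b_ghz / clock_a_ghz


# Assumption: "almost as fast" is treated as exactly equal (1.0).
RELATIVE_PERF = 1.0
E8400_GHZ = 3.0

# At the i7-920's nominal 2.66 GHz the per-clock advantage looks large.
nominal = per_clock_ratio(RELATIVE_PERF, 2.66, E8400_GHZ)

# Assumption: a single busy core runs at about 2.93 GHz under Turbo Boost.
turbo = per_clock_ratio(RELATIVE_PERF, 2.93, E8400_GHZ)

print(f"per-clock ratio at nominal clock: {nominal:.3f}")
print(f"per-clock ratio at assumed turbo clock: {turbo:.3f}")
```

Under these assumptions most of the apparent per-clock gain at the nominal frequency disappears once the turbo clock is used as the divisor, which is why I attribute it mostly to Turbo Boost.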