Article: Westmere Arrives
By: ? (0xe2.0x9a.0x9b.delete@this.gmail.com), March 20, 2010 3:18 am
Room: Moderated Discussions
Vincent Diepeveen (diep@xs4all.nl) on 3/17/10 wrote:
---------------------------
>David Kanter (dkanter@realworldtech.com) on 3/17/10 wrote:
>---------------------------
>>I just finished the first of two articles on Westmere, the 32nm, 6-core shrink of
>>Nehalem. This covers the improvements, including new instructions, minor microarchitectural
>>tweaks and some basics on the products that are available:
>>http://www.realworldtech.com/page.cfm?ArticleID=RWT031710140138
>>
>>The second piece will be a review that actually includes performance data. I'm
>>still gathering the data, but this should be a nice short read.
>>
>>David
>>
>
>It is a very fast chip. For Diep if we look at performance numbers from 3.33Ghz clocked 980 part:
>
>http://www.lostcircuits.com/mambo//index.php?option=com_content&task=view&id=79&Itemid=1&limit=1&limitstart=17
A sentence in the Lost Circuits article caught my attention:
"Once again, we ran Gulftown with and without TurboBoost and, compliment to Vincent Diepeveen, that's what scaling should look like."
Read literally, this sentence seems to imply that such perfect scaling is possible for any algorithm: given enough time, any algorithm can be reprogrammed to scale perfectly.
While that is actually true (any algorithm can be reprogrammed to scale perfectly), it does *not* tell us anything about the *speed* of computation.
In fact, it is pretty easy to create a program that will take *any* source code X and make it scale perfectly up to, say, 16 cores. Here is what it would look like:
1. On a 16-core machine: Run X on 1 core. The other 15 cores spin in a busy-wait loop until the 1st core finishes executing X.
2. On an 8-core machine: Run X on 1 core while the other 7 cores spin in a busy-wait loop until the 1st core finishes executing X. Then scrap the results and rerun X on the 1st core, again with the other 7 cores waiting.
3. On a 4-core machine: 4 times do (Run X on 1 core. The other 3 cores spin in a busy-wait loop until the 1st core finishes executing X. Scrap results, unless it is the last run.)
4. On a 2-core machine: ...
5. On a 1-core machine: 16 times do (Run X on the single core. Scrap results, unless it is the last run.)
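The steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not anything taken from Diep or from the Lost Circuits benchmark: the names `MAX_CORES`, `repeats_for` and `run_padded` are made up here, and the busy-waiting of the idle cores is left out because it changes neither the result nor the elapsed time.

```python
MAX_CORES = 16  # the example upper bound used in the list above


def repeats_for(cores: int) -> int:
    """How many times the wrapper runs X back to back on a machine with `cores` cores."""
    if cores < 1 or MAX_CORES % cores != 0:
        raise ValueError("cores must be a positive divisor of MAX_CORES")
    return MAX_CORES // cores


def run_padded(x, cores: int):
    """Run x() on one core, repeats_for(cores) times, and keep only the last result.

    The remaining cores would merely spin until x() returns; that part is
    not modelled here.
    """
    result = None
    for _ in range(repeats_for(cores)):
        result = x()  # results of earlier runs are scrapped
    return result
```

Calling `run_padded(x, 4)` runs `x` four times and returns the last result, which matches case 3 in the list.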
Now, let's look at the scaling of these 5 cases, where T is the time one run of X takes on a single core:
1 core: 16*T seconds
2 cores: 8*T seconds
4 cores: 4*T seconds
8 cores: 2*T seconds
16 cores: T seconds
So, as you can see, perfect scaling, even though the 16-core run still takes T seconds, exactly as long as plain X on one core. And it can even be done in a fully automated way. No problem at all.
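To make the arithmetic concrete, here is a small sketch that recomputes the numbers in the list above. It assumes T = 1 and the same 16-core upper bound; `wall_time` and `speedup` are hypothetical helper names introduced only for this illustration.

```python
def wall_time(cores: int, t: float = 1.0, max_cores: int = 16) -> float:
    """Elapsed time of the padded scheme: X is run (max_cores // cores) times in a row."""
    return (max_cores // cores) * t


def speedup(cores: int) -> float:
    """Speedup of the padded scheme relative to its own 1-core run."""
    return wall_time(1) / wall_time(cores)


if __name__ == "__main__":
    for cores in (1, 2, 4, 8, 16):
        print(f"{cores:2d} cores: {wall_time(cores):4.0f}*T  speedup {speedup(cores):4.0f}x")
```

The speedup column comes out equal to the core count, which is what "perfect scaling" means, while the best absolute time is still T, no better than simply running X once on a single core.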
So, what does the fact that Diep scales perfectly tell us about the actual quality of Diep? Probably nothing!
A brain-dead monkey in a coma, barely able to push a button, would be able to make any algorithm scale perfectly ...