CMP Design space

Article: ISSCC 2007: Intel's Teraflops Design
By: David Kanter (dkanter.delete@this.realworldtech.com), April 9, 2007 10:15 pm
Room: Moderated Discussions
Marcin Niewiadomski (marcin.niewiadomski@gmail.com) on 4/9/07 wrote:
---------------------------
>David Kanter (dkanter@realworldtech.com) on 4/7/07 wrote:
>---------------------------
>>Marcin Niewiadomski (marcin.niewiadomski@gmail.com) on 4/6/07 wrote:
>>---------------------------
>>>David Kanter (dkanter@realworldtech.com) on 4/3/07 wrote:
>>>---------------------------
>>>>I'd interpret it as meaning that elements of this technology will be used in commercial products within 5 years or less.
>>>>
>>>>There's no way Intel is going to produce something based on those processing engines;
>>>>they'd at least use something that has virtual memory, caches, TLBs, coherency,
>>>>etc. So that means you have to rip out most of the tiles right there.
>>>>
>>>>They might recycle parts of the network though, or maybe just the underlying ideas.
>>>>
>>>>DK
>>>
>>>I think that Intel will only reuse the ideas (or prove them wrong). As it was mentioned
>>>earlier, it is not possible to get 16 or more cores with the same tricks that work for 1-8.
>>
>>Of course, the whole point is to see which ideas work out, and which ones don't.
>>
>>>In general, there are two ways - one is to have some centralized system like it
>>>is currently implemented in Cell and modern GPUs (e.g. ATI Xenos, pixel shaders
>>>in ATI R5x0 series, NVIDIA G80). Simply put, there is one "core" responsible for management
>>>and coordination of the other cores.
>>
>>That is one option.
>>
>>>The other way is to have a distributed system, and my guess is that Intel is investigating
>>>that area in Polaris. That's why they have a lot of very simple cores - in my opinion
>>>they are "complex enough" for studies focusing on massive multicore system architecture.
>>
>>There are other options. What you are basically discussing is whether the relationships
>>between the cores are peer-peer or master-slave. You could easily design a processor
>>with a combination - imagine a SoC with 4 master nodes (and each master node regards
>>the other 3 as peers), and then 16 subordinate slave nodes. Each slave is controlled
>>exclusively by a single master node.
>>
>>However, the point of the research was really not the master-slave versus peer-peer
>>dynamic. The point was to explore:
>>
>>1. novel clocking
>>2. mesochronous communication
>>3. leakage power management
>>
>>Intel is able to reduce clock distribution power by using the mesochronous interfaces.
>>Unfortunately, you cannot just replace the VLIW cores with x86 processors and expect everything to work quite as well.
>>
>>It looks to me like any shared structure (L2 caches for example), will have to
>>be in the same clock domain. This means that the mesochronous clocking would have to be modified for shared caching.
>>
>>For instance, on a Woodcrest (dual core, shared LLC), you could probably make all
>>the inter-core interfaces mesochronous, and then split up your clock distribution
>>network. However, you might also need a mesochronous interface on the L2 cache,
>>and you'd have that on a separate clock domain as well. At least, it seems to me
>>like this should be feasible...perhaps someone can verify/refute this idea?
>>
>>David
>
>Thank you for the detailed reply. My knowledge of low-level chip design is really
>elementary, so stuff like clock domains is really something new for me. And I'm
>always very happy to learn something new, even if it means the "it's so obvious that I
>should have thought about it earlier" syndrome.

Well, that's how you learn.

>I cannot say anything reasonable regarding mesochronous clocking of shared structures

You did understand the problem I was discussing though? Polaris consists of 80 tiles which are all independent of each other. Two cores which share a cache are dependent on each other to some extent.

>- I can just recall that some of the Pentium 4's ALUs and the G80's shader units run
>at double the frequency (compared to the rest of the die), but my guess is that the same tricks
>cannot be used for mesochronous clocking.

If you have part of the chip running at N GHz and another part at 2N GHz, it's not hard to keep them synchronous. You just need to make sure data transfers into the slower part of the chip (N GHz) happen only on even clock cycles of the fast part. For example:

Cycle   Fast Region   Slow Region
  0          x             x
  1          x
  2          x             x
  3          x
  4          x             x

etc. etc.

So you can send data between the two regions only on even-numbered clock cycles. That's a pretty easy problem to solve.
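To make the scheme above concrete, here's a minimal sketch (my own illustration, not anything from Intel's design) of the 2:1 synchronous crossing: the fast region runs at twice the slow region's frequency, so every even fast-clock cycle lines up with a slow-clock edge, and those are the only safe transfer points.

```python
# Hypothetical sketch of a 2:1 synchronous clock crossing.
# With aligned edges and a 2:1 frequency ratio, every even
# fast-clock cycle coincides with a slow-clock edge, so the
# slow region can only sample on those cycles.

def transfer_schedule(fast_cycles):
    """Return the fast-clock cycles on which the slow region
    may safely sample data from the fast region."""
    return [c for c in range(fast_cycles) if c % 2 == 0]

# Fast cycles 0..4 -> transfers allowed on cycles 0, 2, 4,
# matching the table above.
print(transfer_schedule(5))  # [0, 2, 4]
```

The point is simply that with an integer frequency ratio and aligned edges, the safe transfer cycles are statically known, so no extra synchronization hardware is needed.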

With mesochronous circuits, you have to worry about running at different phases, which could result in garbage passing between the two clock regions.
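To illustrate why the phase matters (again my own toy model, not how Polaris actually implements its interfaces): in a mesochronous crossing the two regions run at the same frequency but an unknown phase offset, and if the receiver's sampling edge lands inside the window where the data is transitioning, the captured value is indeterminate.

```python
# Hypothetical model of a mesochronous crossing: same frequency,
# unknown phase offset. A sample is only reliable if the receiver's
# clock edge falls outside the data transition window around each
# transmitter edge; otherwise the captured value may be garbage.

def sample_ok(phase_offset, transition_window=0.1, period=1.0):
    """True if a sampling edge at the given phase offset (in units
    of the clock period) lands safely outside the transition window."""
    pos = phase_offset % period
    return pos > transition_window and pos < period - transition_window

print(sample_ok(0.5))   # True: edge lands mid-eye, data is stable
print(sample_ok(0.05))  # False: edge lands in the transition window
```

This is why mesochronous interfaces need phase-compensating circuitry (delay tuning or a small synchronizing FIFO) that a plain synchronous crossing does not.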

DK