CMP Design space

Article: ISSCC 2007: Intel's Teraflops Design
By: David Kanter, April 7, 2007 10:55 pm
Room: Moderated Discussions
Marcin Niewiadomski on 4/6/07 wrote:
>David Kanter on 4/3/07 wrote:
>>I'd interpret it as elements of this technology will be used in commercial products within 5 years or less.
>>There's no way Intel is going to produce something based on those processing engines;
>>they'd at least use something that has virtual memory, caches, TLBs, coherency,
>>etc. So that means you have to rip out most of the tiles right there.
>>They might recycle parts of the network though, or maybe just the underlying ideas.
>I think that Intel will only reuse the ideas (or prove them wrong). As was mentioned
>earlier, it is not possible to get 16 or more cores with the same tricks that work for 1-8.

Of course, the whole point is to see which ideas work out, and which ones don't.

>In general, there are two ways. One is to have a centralized system like the one
>currently implemented in Cell and modern GPUs (e.g. ATI Xenos, pixel shaders
>in ATI R5x0 series, NVIDIA G80): there is simply one "core" responsible for management and coordination of the other cores.

That is one option.

>The other way is to have a distributed system, and my guess is that Intel is investigating
>that area in Polaris. That's why they have a lot of very simple cores; in my opinion
>they are "complex enough" for studies focusing on massive multicore system architecture.

There are other options. What you are basically discussing is whether the relationships between the cores are peer-to-peer or master-slave. You could easily design a processor that combines the two: imagine a SoC with 4 master nodes (each master node regards the other 3 as peers) and 16 subordinate slave nodes, where each slave is controlled exclusively by a single master node.
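To make the hybrid arrangement concrete, here is a minimal Python sketch of the 4-master / 16-slave topology as a data structure. The class and function names are hypothetical, and the round-robin assignment of slaves to masters is an assumption; the only properties taken from the description are that masters treat each other as peers and that every slave has exactly one owning master.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)  # eq=False: nodes compare by identity, avoiding recursive field comparison
class Node:
    name: str
    role: str  # "master" or "slave"
    # Masters only: the other masters, treated as equals (peer-to-peer).
    peers: List["Node"] = field(default_factory=list, repr=False)
    # Masters only: the slaves this master exclusively controls.
    slaves: List["Node"] = field(default_factory=list, repr=False)
    # Slaves only: the single master that owns this slave (master-slave).
    controller: Optional["Node"] = field(default=None, repr=False)


def build_hybrid_soc(num_masters: int = 4, num_slaves: int = 16) -> Tuple[List[Node], List[Node]]:
    """Build a hypothetical hybrid topology: peer masters, each owning some slaves."""
    masters = [Node(f"M{i}", "master") for i in range(num_masters)]
    # Every master regards every other master as a peer.
    for m in masters:
        m.peers = [p for p in masters if p is not m]
    slaves = []
    for j in range(num_slaves):
        # Assumption: slaves are dealt out round-robin, so each master owns an equal share.
        owner = masters[j % num_masters]
        s = Node(f"S{j}", "slave", controller=owner)
        owner.slaves.append(s)
        slaves.append(s)
    return masters, slaves


if __name__ == "__main__":
    masters, slaves = build_hybrid_soc()
    for m in masters:
        print(m.name, "peers:", [p.name for p in m.peers],
              "slaves:", [s.name for s in m.slaves])
```

Pure peer-to-peer and pure master-slave fall out as the two extremes of the same structure: `num_slaves=0` gives a flat peer network, and `num_masters=1` gives a single controller with subordinates.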

However, the point of the research was really not the master-slave versus peer-to-peer dynamic. The point was to explore:

1. novel clocking
2. mesochronous communication
3. leakage power management

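Intel is able to reduce clock distribution power by using mesochronous interfaces: neighboring tiles run at the same frequency but may have an arbitrary, fixed phase offset, so the global clock tree does not need tight skew control across the whole die. Unfortunately, you cannot just replace the VLIW cores with x86 processors and expect everything to work quite as well.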
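Why a mesochronous link is cheap can be shown with a small Python simulation: a shallow FIFO written on the sender's clock edges and read on the receiver's edges. Because both clocks have exactly the same period, the occupancy settles at a constant level for any phase offset. This is a sketch of the general idea, not Intel's circuit; the function name, the 4-entry depth, the 2-cycle receiver start delay, and the picosecond timing are all assumptions chosen for illustration.

```python
from collections import deque
from typing import List, Tuple


def simulate_mesochronous_fifo(period_ps: int, tx_phase_ps: int, rx_phase_ps: int,
                               depth: int = 4, start_delay_cycles: int = 2,
                               cycles: int = 1000) -> Tuple[List[int], int]:
    """Pass one word per cycle from a sender clock to a receiver clock that share
    the same period but have an arbitrary fixed phase offset (mesochronous).

    Returns the words received, in order, and the peak FIFO occupancy.
    """
    if not (0 <= tx_phase_ps < period_ps and 0 <= rx_phase_ps < period_ps):
        raise ValueError("phases must lie within one clock period")
    # Rising edges of both clocks; kind 0 = sender writes, kind 1 = receiver reads.
    # Integer picoseconds avoid floating-point ties; on an exact tie the write sorts first.
    edges = [(k * period_ps + tx_phase_ps, 0, k) for k in range(cycles)]
    edges += [(k * period_ps + rx_phase_ps, 1, k) for k in range(cycles)]
    edges.sort()

    fifo = deque()
    received: List[int] = []
    peak = 0
    next_word = 0
    for t, kind, k in edges:
        if kind == 0:
            if len(fifo) >= depth:
                raise OverflowError(f"FIFO overflow at t={t} ps")
            fifo.append(next_word)
            next_word += 1
            peak = max(peak, len(fifo))
        elif k >= start_delay_cycles:  # receiver idles a few cycles so the FIFO can prefill
            if not fifo:
                raise RuntimeError(f"FIFO underflow at t={t} ps")
            received.append(fifo.popleft())
    return received, peak


if __name__ == "__main__":
    # Sweep the receiver's phase across a whole (assumed) 1 ns period.
    for rx_phase in range(0, 1000, 250):
        words, peak = simulate_mesochronous_fifo(1000, 700, rx_phase)
        print(f"rx phase {rx_phase:4d} ps: {len(words)} words, peak occupancy {peak}")
```

The design point is that the FIFO only has to absorb a phase difference of less than one cycle plus the start delay, so it can stay a few entries deep. If the two periods differed even slightly (a plesiochronous link), the occupancy would drift every cycle and eventually overflow or underflow, which this simple structure cannot handle.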

It looks to me like any shared structure (L2 caches, for example) will have to be in the same clock domain as the cores that share it. This means that the mesochronous clocking scheme would have to be modified to support shared caching.

For instance, on a Woodcrest (dual core, shared LLC), you could probably make all the inter-core interfaces mesochronous and then split up your clock distribution network. However, you might also need a mesochronous interface on the L2 cache, which would sit in its own separate clock domain. At least, it seems to me like this should be feasible... perhaps someone can verify or refute this idea?
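The Woodcrest thought experiment comes down to bookkeeping: assign each block to a clock domain, then check what kind of crossing every link needs. The Python sketch below does that for a hypothetical two-core, shared-L2 floorplan. The block and domain names, the 3000 MHz frequency, and the assumption that all domains are derived from one PLL (so equal frequency implies mesochronous) are illustrative, not details of the actual chip.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]


def classify_crossings(domain_of: Dict[str, str],
                       freq_mhz: Dict[str, int],
                       links: List[Link]) -> Dict[Link, str]:
    """Label each block-to-block link by the kind of clock crossing it needs.

    Assumes every clock domain is derived from one PLL, so two domains at the same
    nominal frequency differ only in phase (mesochronous).
    """
    kinds = {}
    for a, b in links:
        da, db = domain_of[a], domain_of[b]
        if da == db:
            kinds[(a, b)] = "synchronous"     # same clock tree, no synchronizer needed
        elif freq_mhz[da] == freq_mhz[db]:
            kinds[(a, b)] = "mesochronous"    # same frequency, unknown phase: shallow FIFO
        else:
            kinds[(a, b)] = "heterochronous"  # different frequency: needs a more general synchronizer
    return kinds


# Hypothetical Woodcrest-like floorplan: two cores sharing one L2 (names are illustrative).
LINKS = [("core0", "core1"), ("core0", "L2"), ("core1", "L2")]

# Baseline (assumed): a single clock domain for the whole die.
UNIFIED = {"core0": "clk", "core1": "clk", "L2": "clk"}
# Split clock network: each core and the shared L2 get their own domain.
SPLIT = {"core0": "clk_c0", "core1": "clk_c1", "L2": "clk_l2"}
# Assumed frequency: every domain runs at 3000 MHz.
FREQ_MHZ = {"clk": 3000, "clk_c0": 3000, "clk_c1": 3000, "clk_l2": 3000}

if __name__ == "__main__":
    for label, domains in (("unified", UNIFIED), ("split", SPLIT)):
        print(label, classify_crossings(domains, FREQ_MHZ, LINKS))
```

With the split assignment, every link (including both core-to-L2 paths) becomes a mesochronous crossing, which is exactly the extra L2 interface the paragraph anticipates. Giving the L2 domain a different frequency turns its links heterochronous instead.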
