By: Mark Roulo (nothanks.delete@this.xxx.com), June 14, 2022 1:56 pm
Room: Moderated Discussions
Paul A. Clayton (paaronclayton.delete@this.gmail.com) on June 14, 2022 12:27 pm wrote:
> Bill K (bill.delete@this.gmail.com) on June 14, 2022 1:58 am wrote:
> > > Does anyone know whether Intel could have gotten acceptable yields
> > > when they were struggling with their 10nm/Intel 7 process
> > > for any part of a SOC? I'm asking: if they'd had their tile/chiplet tech
> > > done could they have had good yields for certain types of tiles?
> >
> > Yes, if the tiles were small enough. The lower the yields are, the smaller the tiles would need to
> > be.
>
> Redundancy can provide reasonable yields even with somewhat high defect rates. Even for memory arrays, the locality
> of defects/extreme variation can bias whether row/column spares or array spares are more attractive.
>
> A machine learning matrix processor could have significant defect tolerance (the cost of modestly more
> complex routing around defective components is less important for more throughput-oriented designs).
>
> Even a conventional great-big-out-of-order processor could support redundant simple functional units
> (and possibly redundancy in a multiplier array) at some cost in latency from additional routing complexity
> and many parts of the processor are N identical/very similar blocks (though the smallish N for schedulers
> increases the fraction of lost functionality and the overhead for control/routing).
>
> There may also be process variation which makes a working device too slow to be marketable, but this
> boundary is dependent on the product target. (Presumably there are also power/performance/area tradeoffs
> that can be made at design time to improve yield. Choosing a "worse" process with lower defect rates
> may usually be financially sensible rather than sacrificing PPA in design for yield, but much of the
> pricing is based on competition [if the fixed costs are taken as a loss and some of the incremental
> costs are allocated to research-and-development — assuming active production facilitates learning
> — the pricing of a broken process may be low yet the process still be made available; the pricing
> of a donation to a university research project might also be difficult to define].)
>
> (I am not an EE or financial analyst, but these seem reasonable inferences from broad principles.)
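To put a number on Bill K's point that smaller tiles yield better, here is the textbook Poisson yield model as a quick Python sketch (the defect density is a made-up illustrative value, not real process data):

    # Textbook Poisson model: P(die has zero defects) = exp(-area * D0).
    # D0 is a made-up illustrative defect density, not real process data.
    from math import exp

    D0 = 0.2  # assumed defects per square cm

    for area_cm2 in (6.0, 3.0, 1.5, 0.75):
        print(f"{area_cm2:5.2f} cm^2 die -> {exp(-area_cm2 * D0):.1%} defect-free")

    # A 6 cm^2 monolithic die comes out ~30% defect-free; splitting it
    # into four 1.5 cm^2 tiles raises per-tile yield to ~74%.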
In theory one can make a design/implementation that is AMAZINGLY robust against manufacturing defects: triply (or more) redundant everything.
Though this would likely lead to worse performance, since the voting and extra routing add latency.
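As a toy illustration of the triple-redundancy idea (hypothetical units, not any real design), a majority voter masks one defective copy out of three:

    # Hypothetical triple modular redundancy (TMR) sketch: three copies
    # compute the same result and a majority vote masks one bad copy.
    # The voter itself is the extra logic/latency mentioned above.
    def majority(a, b, c):
        # Return the value that at least two of the three inputs agree on.
        return a if a in (b, c) else b

    good = lambda x: x + 1
    bad = lambda x: 0xDEAD   # a copy ruined by a manufacturing defect

    assert majority(good(41), bad(41), good(41)) == 42  # defect masked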
In practice, folks are already providing redundancy where it is practical.
Memory regions are a good example of this (standalone DRAM and FLASH as well as caches).
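As a sketch of how that works for a memory array (made-up numbers; real DRAM/cache repair blows fuses at test time and is more involved), defective rows get remapped to spare rows, so one bad transistor costs a row instead of the whole array:

    # Toy model of row sparing: bad rows found at test time are
    # remapped to spare rows. Numbers are illustrative only.
    def build_remap(num_rows, num_spares, bad_rows):
        if len(bad_rows) > num_spares:
            raise RuntimeError("more bad rows than spares: die is scrap")
        spares = range(num_rows, num_rows + num_spares)
        return dict(zip(bad_rows, spares))

    remap = build_remap(num_rows=1024, num_spares=16, bad_rows=[37, 512])

    def physical_row(logical):
        return remap.get(logical, logical)

    print(physical_row(37))   # 1024 -- remapped to the first spare row
    print(physical_row(38))   # 38   -- untouched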
And NVidia's Ampere A100 GPU provides redundancy at several architectural levels.
I'm assuming that NVidia has built that kind of redundancy into the on-chip memories so that a single bad transistor doesn't void an entire cache...
Beyond that, the chip is manufactured with eight identical GPCs (Graphics Processing Clusters), though the shipping A100 only claims seven enabled (much like Cell, with 8 SPEs on the die but only 7 enabled).
Each GPC contains 2x8 = 16 SMs (Streaming Multiprocessors -- the NVidia equivalent of a core).
If everything were perfect there would be 8x2x8 = 128 SMs per A100.
The shipping parts contain 108 working SMs across 7 GPCs, so NVidia budgets for losing an entire GPC (16 SMs) per die plus 4 more SMs from the remaining seven.
Notice that NVidia is giving up a full 15.6% (20 of 128 SMs) of the manufactured FLOPS just to make the parts sellable.
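Here is that arithmetic, plus a toy binomial yield model showing why the trade is attractive (the per-SM survival probability is my assumption, not NVidia data, and the GPC-level binning structure is ignored):

    # Back-of-the-envelope SM budget plus a toy binomial yield model.
    # p below is an assumed per-SM survival probability, NOT NVidia data.
    from math import comb

    TOTAL = 8 * 2 * 8    # 8 GPCs x (2x8) SMs = 128
    ENABLED = 108
    print(f"disabled: {TOTAL - ENABLED} SMs = {(TOTAL - ENABLED) / TOTAL:.1%}")

    def yield_at_least(n, need, p):
        # P(at least `need` of `n` independent units are defect-free)
        return sum(comb(n, k) * p**k * (1 - p)**(n - k)
                   for k in range(need, n + 1))

    p = 0.98  # assumed
    print(f"all 128 SMs good:  {p**TOTAL:.1%}")                            # ~7.5%
    print(f"at least 108 good: {yield_at_least(TOTAL, ENABLED, p):.1%}")   # ~100%

Under that (made-up) defect rate, insisting on all 128 SMs would scrap over 90% of dies, while accepting any 108 passes essentially everything.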
NVidia gets away with this because the margins on the A100 chips are very high. But this is probably a bad plan for laptop and desktop chips.