By: --- (---.delete@this.redheron.com), July 31, 2021 6:23 pm
Room: Moderated Discussions
David Kanter (dkanter.delete@this.realworldtech.com) on July 30, 2021 1:52 pm wrote:
> Doug S (foo.delete@this.bar.bar) on July 30, 2021 11:01 am wrote:
> > Heikki Kultala (heikki.kult.ala.delete@this.gmail.com) on July 29, 2021 11:18 pm wrote:
> > > Doug S (foo.delete@this.bar.bar) on July 29, 2021 5:44 pm wrote:
> > > > None of it really matters, since the process names have nothing to do with a physical dimension
> > > > anywhere in the design. It is just a placeholder for "2x the transistors in the next generation"
> > > > but we aren't even seeing that lately as TSMC only got 1.8x scaling on N5 and 1.7x on N3
> > > > - but TSMC wasn't calling those 5nm and 3nm, it is mostly outsiders doing so (maybe TSMC
> > > > does as well, but probably only because outsiders referred to them that way)
> > > >
> > > > Who knows what TSMC will call the stuff below N2, will it be N1.4 or P1400 or just
> > > > choose another letter at random, multiply by 10, so X14 then X10 and so on.
> > >
> > > TSMC did not get 1.8x scaling on N5. In reality (by synthesizing any
> > > reasonable piece of logic that does something) it's much worse.
> > >
> > > Or, let's say that TSMC might have gotten 1.8x for a single best-case standard cell
> > > component type for their marketing materials, but TSMC's customers get MUCH LESS
> > > than 1.8x for their real-world designs that actually do something useful.
> >
> >
> > Cache scaling is not as good, so customers like Apple who added a lot of cache when going
> > to N5 get worse scaling. For N5 to N3 TSMC states logic scales at 1.7x, cache scales at 1.2x
> > and I/O scales at 1.1x. I don't recall seeing them report cache scaling for N7 to N5.
> >
> > Anyone know why cache scaling is becoming a problem? Might it have to do with congestion in the metal
> > layers? If they can do something like Intel's PowerVia and have metal sandwiching the logic, the metal
> > routing will become easier - especially for parts that will be stacked, which is more and more common.
>
>
> Good question: There's a bit of an answer here:
>
> https://www.linkedin.com/pulse/cmos-density-scaling-cppmxp-metric-ali-khakifirooz/
>
> SRAM size is largely determined by fin pitch and isolation pitch.
>
> Quoting here:
>
> The choice of fin pitch, however, also determines the isolation pitch. With a typical practice
> of printing a sea of fins at constant pitch and removing unwanted ones, the minimum isolation
> pitch is equal to twice the fin pitch, or roughly 1.5× the minimum metal pitch.
>
> As seen in Figure 3, isolation pitch already lagged behind CPP and MxP in the most recent
> planar technologies. The introduction of FinFET technology simply guarantees that it will
> always lag behind. Since the isolation pitch determines the SRAM cell size, such a practice
> produces SRAM bitcells larger than what is doable with a certain lithography.
>
> A solution is to print fins at different pitch when needed. In fact the smallest SRAM
> cells reported so far all printed the fins at a pitch larger than that used in logic
> and avoided the need for the “remove-every-other-fin” principle. In SADP, one can simply
> print mandrels somewhat wider and spaced further away from those in logic.
>
> Of course, one concern is the non-uniformity of the resulting fin width [7]. An extension of
> this approach to SAQP is a bit more complicated [8], but worth considering, given the fact that
> even “remove-every-other-fin” will require double patterning. As a side note, I should mention
> that the width of the isolation region is also limited by the requirement to place gate contacts
> outside the active region and innovations are needed to reduce this area penalty.
>
>
> So bottom line, FinFETs make dense SRAM a little tricky in addition
> to the need for boosting logic to enable good read/write margins.
>
> David
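(To put rough numbers on Doug's 1.7x / 1.2x / 1.1x figures first: assume some split of die area between logic, SRAM and I/O -- the split below is invented for illustration, not any real chip -- and the chip-level gain comes out well under the headline logic number:

# Back-of-envelope: chip-level density gain from per-block scaling factors.
# The area mix is invented for illustration (not any real die); the scaling
# factors are the N5 -> N3 numbers quoted above (logic 1.7x, SRAM 1.2x, I/O 1.1x).
mix    = {"logic": 0.60, "sram": 0.30, "io": 0.10}   # fraction of old-node die area (assumed)
shrink = {"logic": 1.7,  "sram": 1.2,  "io": 1.1}    # density improvement per block type

new_area = sum(frac / shrink[block] for block, frac in mix.items())
print(f"effective chip-level density gain: {1 / new_area:.2f}x")   # ~1.44x for this mix

With just 30% of the die in SRAM you're already down to about 1.44x overall, which is one way of seeing Heikki's point that real designs land well below the marketing figure.)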
It's also worth pointing out that "cache", at least for Apple, is a *lot* more sophisticated than what's in your textbooks.
(I assume something similar is true for other vendors, though likely lagging somewhat.)
What this means is that "actual" cache at any node will lag "theoretical" cache, and more so every year, as actual cache adds more transistors and more wires. Where TSMC may be completely correct in claiming that it can get 100 units of "textbook" cache in 1mm^2, Apple may get only 80 units of "Apple 2021 version super-low-energy" cache in the same 1mm^2.
What sorts of changes?
- split caches into very small subarrays that can be independently powered
- very careful charging of bitlines (with the voltage changing over time) to minimize energy (a rough energy comparison follows this list)
https://patents.google.com/patent/US10720193B2 and https://patents.google.com/patent/US20190272859A1
- separating the logic voltage from the memory storage so the logic can run at lower voltage
https://patents.google.com/patent/US7355905B2
- which expands to a variety of different things -- vary the memory storage voltage under certain conditions https://patents.google.com/patent/US20100182850A1 or add a third voltage used when writing the memory https://patents.google.com/patent/US8885393B2
- then you start adding redundancy to ensure your cache works in the face of a few manufacturing errors, like https://patents.google.com/patent/US10592367B2 (the basic remapping idea is sketched after this list)
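On the bitline-charging and voltage-splitting bullets, the payoff is easiest to see with a first-order energy estimate. The capacitance and voltage numbers below are placeholders I made up, and the actual circuits in those patents are far subtler than "charge from a lower rail", but the arithmetic shows why it's worth spending extra devices and wires:

# First-order bitline energy: restoring a swing dv on a capacitance c from a
# constant rail at v costs roughly c * v * dv per access. All values below are
# assumed, order-of-magnitude placeholders, not Apple's numbers.
C_BITLINE = 50e-15   # 50 fF of bitline capacitance (assumed)
VDD       = 0.75     # array rail in volts (assumed)

def recharge_energy(c, v_rail, dv):
    # energy drawn from the supply to put charge c*dv back at voltage v_rail
    return c * v_rail * dv

cases = [
    ("full-rail swing (write-like)",    recharge_energy(C_BITLINE, VDD,  VDD)),
    ("swing limited to ~100 mV (read)", recharge_energy(C_BITLINE, VDD,  0.10)),
    ("limited swing from a lower rail", recharge_energy(C_BITLINE, 0.55, 0.10)),
]
for name, joules in cases:
    print(f"{name:34s} ~{joules * 1e15:.1f} fJ per bitline per access")

Multiply the per-bitline figure by the hundreds of bitlines that toggle on every access, at GHz access rates, and the difference is real power.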
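As for redundancy, the basic idea is a small remap table, programmed at manufacturing test (fuses), that steers accesses to known-bad rows onto spare rows. The sketch below is the textbook version, nothing Apple-specific, but every such mechanism costs extra rows, comparators and wiring inside the array:

# Textbook spare-row redundancy: rows found faulty at manufacturing test are
# remapped, via a small fuse-programmed table, onto spare rows. Illustrative only.
class RedundantSRAMBank:
    def __init__(self, rows, spares, bad_rows):
        if len(bad_rows) > spares:
            raise ValueError("more defective rows than spares: bank unusable")
        self.data  = [[0] * 64 for _ in range(rows + spares)]
        # "fuse" table: defective row address -> spare row index (programmed once)
        self.remap = {bad: rows + i for i, bad in enumerate(bad_rows)}

    def _physical_row(self, addr):
        # in hardware this is a handful of comparators in the address path
        return self.remap.get(addr, addr)

    def write(self, addr, word):
        self.data[self._physical_row(addr)] = list(word)

    def read(self, addr):
        return self.data[self._physical_row(addr)]

bank = RedundantSRAMBank(rows=256, spares=4, bad_rows=[17, 200])
bank.write(17, range(64))                    # transparently lands in a spare row
assert bank.read(17) == list(range(64))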
(These are just at the level of a physical SRAM bank. There is, of course, the additional level of more logic added to every cache every year, but I'm ignoring that.)
I've included only a few patents to make the point; there are many more. They start in the early PA Semi days and continue at a steady pace, with a number of innovations as late as 2018.
I'm not in a position to calculate how much overhead all this fanciness adds. My guess is that it has a limited effect on the transistor budget, but a noticeable effect on metal congestion.