manufacturing/packaging of the M1 Ultra

By: ---, April 10, 2022 11:26 am
Room: Moderated Discussions
Packaging is something I don't know much about, but I've started reading some of Apple's patents in this area and put together this summary of one that seems especially interesting and probably covers the manufacturing of the M1 Ultra. I've tried to lay out the logic of how Apple got to this point, and the substantial advantages of the packaging scheme they are using (because it's built on truly mass manufacturing, rather than on the specialized techniques of Intel's EMIB, AMD's interposer, and so on).

I'd appreciate any corrections or clarifications. I think most of this material is very new to most of us, but hopefully we can put together a corpus of valid knowledge.


Fairly early in 2021 I knew that Apple had patents on various types of chip packaging techniques, but, like everyone, I had no idea of their relevance. With the M1 Ultra the (initial) story is a little clearer, so let's examine some of them.

Think about chip packaging. You probably remember 1970s-style packaging, where the silicon chip was embedded in a (much larger) black organic resin package with a few large pins along its sides. Wire bonding was used to attach small wires from metal pads on the chip itself to the large pins on the side of the package.

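This was followed by flip-chip packaging, where the chip is flipped face-down and attached through an array of small solder bumps (C4, "controlled collapse chip connection"), with the package itself often attached to the board by a BGA (ball grid array). These differ in detail, but they're all the same sort of idea. You can see a description of the process here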

The flip chip transition achieves four goals:
- the package becomes smaller (much less space is taken up by resin; the board area of the package is now not much larger than the area of the silicon itself)
- many more pins are available (because the contacts are smaller, and we have the whole bottom face of the package available, not just the sides)
- we waste much less power (because larger contacts have more capacitance, and so need more energy to switch between states)
- we can toggle smaller contacts faster than larger contacts so can support higher frequency communication through the pins.
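To put rough numbers on the first three goals, here is a small Python sketch. It compares how many connections fit along the edges of a square footprint versus on a grid covering the whole bottom face, and how switching energy scales with contact capacitance (the usual 1/2·C·V² per transition). Every dimension and capacitance in it is a hypothetical round number chosen for illustration, not a measurement of any real package.

```python
"""Back-of-envelope arithmetic for the flip-chip goals (hypothetical numbers only)."""


def perimeter_io(side_um: int, pitch_um: int) -> int:
    """Pins in a single row along each of the four edges of a square package."""
    return 4 * (side_um // pitch_um)


def area_array_io(side_um: int, pitch_um: int) -> int:
    """Bumps on a square grid covering the whole bottom face."""
    per_row = side_um // pitch_um
    return per_row * per_row


def switching_energy_pj(capacitance_pf: float, volts: float) -> float:
    """Energy dissipated per transition, 1/2 * C * V^2 (pF * V^2 = pJ)."""
    return 0.5 * capacitance_pf * volts ** 2


if __name__ == "__main__":
    side = 10_000  # hypothetical 10 mm x 10 mm footprint, in micrometers
    print(perimeter_io(side, 1_270))  # edge pins at a hypothetical 1.27 mm pitch -> 28
    print(area_array_io(side, 200))   # area bumps at a hypothetical 0.2 mm pitch -> 2500
    # Hypothetical capacitances: a large package pin vs a small bump, both at 1.0 V.
    print(switching_energy_pj(5.0, 1.0), switching_energy_pj(0.5, 1.0))
```

Integer micrometers are used deliberately, since floor division on floats such as 0.2 mm can come out one short. The point is the scaling: edge pins grow linearly with package size, area bumps grow quadratically, and switching energy grows linearly with contact capacitance.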

For years flip chip was the state of the art, but people continued to want to improve along these four dimensions. The next step was various forms of 2.5D packaging.
Imagine that we
- replace the large solder balls of the flip-chip package with much smaller microbumps
- replace the organic substrate (or PCB) on which the chips are mounted with silicon in which we have embedded an RDL (redistribution layer)
- mount the chips directly on that silicon (called an interposer)
This gives us various improvements: the connections themselves shrink, the RDL can be printed much more finely (thinner lines, smaller contacts) on the silicon interposer, and two or more chips mounted in this way can be placed even closer together.

At this point things branch out into many different technical options, but we'll follow the strand that is most relevant to Apple.
This starts with something called Fan-Out Wafer-Level Packaging (FOWLP). The idea here is that we want to give a small chip a lot of connections to the outside world. So we surround the chip with extra area and build an RDL across it that fans out the very dense set of connections at the base of the chip to a less dense set of connections, which can then be taken to the outside world via flip-chip bumps or something similar.

Now, the way FOWLP is implemented is somewhat remarkable. The original silicon wafer is diced, the chips are tested, and the good chips are precisely positioned on a carrier wafer; a molding compound is then flowed around them to lock them into place, creating a "fake" wafer. This fake wafer can be treated like a real wafer, in that it can be passed through the BEOL (back end of line) stages of a fab, which deposit successive layers of metal on it just as on a normal wafer! This was state of the art as of around 2016 (the A10 shipping with TSMC's InFO, a form of FOWLP, was a big deal).
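The reconstitution step can be sketched as a simple placement routine. This is a toy Python model, not anything from Apple's or TSMC's actual tooling: the `Die` class, the `reconstitute` function and all the sizes and gaps are hypothetical. It keeps only the dies that passed test and lays them out on a grid with a chosen gap, and that gap is the knob that separates fan-out (wide gaps full of molding compound) from tightly packed layouts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Die:
    die_id: str
    passed_test: bool  # result of testing after dicing


def reconstitute(dies, die_w_um, die_h_um, gap_um, columns):
    """Place known-good dies row by row on a carrier.

    Returns {die_id: (x_um, y_um)} giving each die's lower-left corner.
    Failed dies are simply never placed, which is the point of
    building the fake wafer from known-good dies.
    """
    good = [d for d in dies if d.passed_test]
    placements = {}
    for index, die in enumerate(good):
        row, col = divmod(index, columns)
        placements[die.die_id] = (
            col * (die_w_um + gap_um),
            row * (die_h_um + gap_um),
        )
    return placements


if __name__ == "__main__":
    tested = [Die("a", True), Die("b", False), Die("c", True), Die("d", True)]
    # Hypothetical 5 mm dies: a wide gap (fan-out style) vs a narrow one (tightly packed).
    print(reconstitute(tested, 5_000, 5_000, gap_um=4_000, columns=2))
    print(reconstitute(tested, 5_000, 5_000, gap_um=50, columns=2))
```

A real reconstitution flow also has to handle placement accuracy, die shift during molding, alignment marks and so on. None of that is modeled here.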

So for FOWLP the idea is to place each good chip on the carrier wafer fairly widely separated from its neighbors (so there's a lot of molding compound between chips), and then build the RDL on top of the fake wafer. A via layer (like M0: dense vertical connections) sits directly above the chip and connects to M1, M2, etc. These metal routing layers lie mostly over the molding compound and deliver the signals to a much less dense set of connections covering the area of both the chip and the molding compound.
In other words, the essential ideas are to
- make a fake wafer
- with widely separated chips
- use standard BEOL on top of that fake wafer
- with the end result of "fanning out" the density of connections.
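The "fanning out" arithmetic can be made concrete with a short Python sketch. For illustration only, it assumes that both the dense connections on the chip and the coarser outer connections sit on uniform square grids. Real layouts are not that regular, and the pitches and counts below are hypothetical. Given how many connections must reach the outside world at a coarser pitch, it computes how large the fan-out area must be, and hence how much molding compound has to surround the die.

```python
import math


def grid_side_um(n_connections: int, pitch_um: int) -> int:
    """Side of the smallest square grid at pitch_um holding n_connections (n >= 1)."""
    per_row = math.isqrt(n_connections - 1) + 1  # ceil(sqrt(n)) using integers only
    return per_row * pitch_um


def mold_margin_um(die_side_um: int, n_connections: int, outer_pitch_um: int) -> float:
    """Width of molding compound needed on each side of the die (0 if none)."""
    needed = grid_side_um(n_connections, outer_pitch_um)
    return max(0.0, (needed - die_side_um) / 2)


if __name__ == "__main__":
    n = 1_000  # hypothetical number of signals to bring out
    print(grid_side_um(n, 40))            # 1280 um: fits easily under a 5 mm die at a fine pitch
    print(grid_side_um(n, 400))           # 12800 um: needs a 12.8 mm square at a coarse pitch
    print(mold_margin_um(5_000, n, 400))  # 3900.0 um of mold on each side of a 5 mm die
```

This is why FOWLP spaces the dies widely: the coarse outer grid simply needs more area than the die provides.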

At this point we are ready to understand the cleverness of the (2018) Apple patent, "Wafer reconstitution and die-stitching"!

Suppose you have all the elements of this scheme in place, but imagine a new use for it.
We still create a fake wafer by precisely positioning known-good dies. But now we position them extremely close together, since we are no longer interested in fan-out.
We again have a single fake wafer on which we can use all our BEOL technology to create a set of metal routing layers.
But if the chips are very close together, we can now create routing layers that communicate information from one chip to another!
We can now achieve a few different things.
- firstly, to some extent we improve yield. Rather than creating a single large chip, we can tie together a number of known-good chips, and the connection between them is more or less the same quality of wiring (standard TSMC BEOL) as within a single large chip!
Silicon interposers, EMIB, and suchlike are nice, but genuine BEOL wiring is as good as it gets for dense, low-power connections.
- secondly (I think this is true), you don't have to line up each chip in the exact position it occupied on the original wafer; you can stagger them by half a chip vertically and/or horizontally. This means that, even starting with fairly large chips, you can create connections between them to tie together as many as you want. As long as you get the geometry right, and can fit your RDL as a repeating unit within the reticle limit, you can go wild and create Cerebras-level insanity!
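The yield point in the first item can be illustrated with the simple Poisson yield model, Y = exp(-A·D0), where A is die area and D0 is defect density. That model is a textbook approximation swapped in here for illustration, not something from the patent, and the areas and defect density below are hypothetical. The sketch compares how much wafer area is consumed per good product when building one large die versus stitching the same total area together from smaller, pre-tested dies. It ignores any yield loss in reconstitution or in the second BEOL pass.

```python
import math


def poisson_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero killer defects under a Poisson defect model."""
    return math.exp(-area_cm2 * defects_per_cm2)


def silicon_per_good_monolithic(total_area_cm2: float, d0: float) -> float:
    """Wafer area consumed per working large die (any defect scraps the whole die)."""
    return total_area_cm2 / poisson_yield(total_area_cm2, d0)


def silicon_per_good_stitched(die_area_cm2: float, n_dies: int, d0: float) -> float:
    """Wafer area consumed per working product built from n known-good dies.

    Dies are tested before reconstitution, so a defect scraps only one small die.
    Assumes (optimistically) that the stitching step itself never fails.
    """
    return n_dies * die_area_cm2 / poisson_yield(die_area_cm2, d0)


if __name__ == "__main__":
    d0 = 0.1  # hypothetical defects per cm^2
    mono = silicon_per_good_monolithic(8.0, d0)       # one hypothetical 8 cm^2 die
    stitched = silicon_per_good_stitched(4.0, 2, d0)  # two hypothetical 4 cm^2 dies
    print(f"monolithic: {mono:.1f} cm^2, stitched: {stitched:.1f} cm^2 per good product")
```

Under this model the stitched version always uses a factor of exp(-A·D0) as much silicon as the monolithic one, where A is the area of one small die. The gain grows with die size and defect density.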

Note also that nothing in this scheme requires the dies to be identical (as they are in the M1 Ultra). They could, in principle, be different chiplets from different fabs using different processes. You get the benefits of chiplets (e.g. using different processes optimized for different tasks) while paying almost none of the costs, either economic (as with interposer/EMIB packaging) or technical (as with higher-energy, lower-frequency connections between the chiplets)!

TSMC has a number of technologies that, once you understand them, seem somewhat similar to this, with names like InFO-oS and InFO-LSI. There's some coverage of all of these here:
To my eyes the scheme described by Apple looks closest to what's being called InFO-R.
EMIB and the InFO-LSI scheme try to provide a denser network of connections from the base silicon to the RDL. The Apple patent seems uninterested in that, suggesting instead that the required dense network can be formed in the base chips that are joined together in the fake wafer. This difference may be an issue of mix-and-match. If Apple control the entire design, they can create the necessary dense network on their base chiplets, whereas if chiplets from different companies are being used together this is probably less feasible.

An interesting point is that if the RDL is not especially demanding, it can be fabricated in the back end of an older fab, maybe a 65 or 90 nm fab. Even if the RDL is demanding, the most demanding parts can (as I just mentioned) be built in a leading-edge fab as part of the base chips, with the rest of the RDL done later, at the fake-wafer stage, in an older fab.

The patent includes one further technical detail, which is its actual novel content; all of the previous material was background.
The issue is how the space between the separate chips is filled when the fake wafer is created. As I've said, the most basic solution, when fan-out is the goal, is to fill the space with molding compound (i.e. an organic resin). When the chips are to be tightly packed together, the current state of the art (as I understand the patent) appears to be silicon oxide. I assume the oxide is deposited over the reconstituted wafer and into the gaps between the chips (perhaps by CVD from something like silane and oxygen?), where it somewhat bonds them together?

This is considered sub-optimal in various ways, being less robust to subsequent high temperature processing, and having a somewhat different coefficient of thermal expansion compared to bulk silicon.
And so the patent describes ways in which silicon itself can be laid down as the fill (via various schemes such as CVD). This silicon fill is, of course, not going to form a perfect crystalline bond with the chips, so it will present a scattering interface as far as electrical conduction is concerned. That's fine, because the fill's job is mechanical, and it matches bulk silicon in mechanical strength and thermal expansion.
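The thermal-expansion argument can be quantified with the standard mismatch-strain estimate, strain ≈ (α_fill − α_Si)·ΔT. The sketch below uses approximate room-temperature textbook values: about 2.6 ppm/K for silicon and about 0.5 ppm/K for fused-silica-like oxide. These vary with temperature and with how the film is deposited, and treating deposited silicon fill as identical to bulk silicon is an assumption. The temperature swing and span are hypothetical.

```python
# Approximate room-temperature coefficients of thermal expansion, in ppm/K.
# Textbook-level values; real films differ with temperature and deposition method.
CTE_PPM_PER_K = {
    "silicon": 2.6,
    "silicon_oxide": 0.5,  # fused-silica-like oxide
    "silicon_fill": 2.6,   # assumption: deposited Si fill behaves like bulk Si
}


def mismatch_strain(fill: str, delta_t_k: float, base: str = "silicon") -> float:
    """Unconstrained thermal mismatch strain between fill and base: (a_fill - a_base) * dT."""
    return (CTE_PPM_PER_K[fill] - CTE_PPM_PER_K[base]) * 1e-6 * delta_t_k


def free_expansion_difference_um(fill: str, delta_t_k: float, span_mm: float) -> float:
    """How far the two materials would drift apart over span_mm if they were not bonded."""
    return abs(mismatch_strain(fill, delta_t_k)) * span_mm * 1_000


if __name__ == "__main__":
    swing = 200.0  # hypothetical process temperature swing, in kelvin
    span = 20.0    # hypothetical span across a stitched pair of dies, in mm
    for fill in ("silicon_oxide", "silicon_fill"):
        print(fill, mismatch_strain(fill, swing), free_expansion_difference_um(fill, swing, span))
```

In a bonded structure the materials can't actually drift apart, so the oxide's mismatch (a few micrometers over this hypothetical span) turns into stress at the interfaces instead. A silicon fill drives that term to roughly zero.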

I strongly suspect this is what the M1 Ultra is using. People have made all sorts of guesses as to whether it uses a standard interposer (like AMD) or a small silicon bridge interposer (like Intel's EMIB), but the performance of the UltraFusion connection (Apple quotes 2.5 TB/s of bandwidth) seems substantially beyond what either of those has achieved.
If this is what the M1 Ultra is using, it's extremely good news for Apple users! As I said, the technology is remarkably scalable, to basically as large a package as you want, and it's not intrinsically expensive. Of course you pay for each known-good die, and you pay for the reconstituted wafer and the second round of BEOL, but neither is especially expensive (the joy of using technology that has already been proved across billions of cell phone chips, as opposed to the specialist technologies of AMD and Intel...)

If I might venture cautiously into an area I know very little about, I think this patent also clarifies the practical distinction between TSMC's InFO family and the CoWoS family.

To my eye it looks like the salient detail is that InFO begins with the synthetic wafer, on which is constructed a secondary RDL based on BEOL processing. This limits the "added layer" to RDL (metal wires) functionality.
Conversely, CoWoS begins with a wafer that you construct as you like, with capacitors, logic, wiring, whatever. This forms the base layer, on top of which a layer of additional chips is placed. So, in principle, you could form the base layer as a pure RDL metal layer, stick the known-good chips on top, and have something very similar to the outcome I've described above.
On the plus side CoWoS gives you a second layer that doesn't have to be pure RDL, it can also have logic and whatever else you want.
But on the minus side I can see a few issues. One is that the mechanical stiffening provided by the InFO scheme (whether by molding, by oxide, or by silicon fill) is absent, making strength and thermal expansion more of a concern. You could try to add the filling (molding compound, oxide, or silicon), but doing that once the chips are already bonded to the underlying wafer might be problematic? So many of the processing details seem to be determined by which chemicals and temperatures the components can tolerate at any given stage.
Secondly, the scheme is more expensive because you're essentially constructing each package from two wafers and two full passes through a fab, whereas the InFO scheme requires only one wafer, and the second pass uses only the BEOL of a (possibly trailing-edge) fab.