Skylake-SP area breakdown

By: Travis Downs, July 12, 2020 8:02 pm
Room: Moderated Discussions
anon2 on July 12, 2020 7:01 pm wrote:
> David Kanter on July 12, 2020 6:13 pm wrote:
> > I did some analysis a while back that might be useful to share here.
> >
> > 8.0 mm2 SKL core
> > 0.9 mm2 AVX512
> > 2.0 mm2 1MB L2$
> > 2.4 mm2 1.375MB L3$
> > 0.4 mm2 Snoop filter
> > 2.0 mm2 caching and home agent
> > 2.2 mm2 FIVR, PLLs
> > 0.4 mm2 Mystery block
> >
> > 18.0 mm2 Total SKL-SP tile
> >
> > So AVX512 is about 5% of the tile area, and the tiles are 72% of the total area of SKL-SP.
> >
> > If you removed AVX512 you'd save 28mm2 for the whole chip, which would let you add at most 2 tiles.
> >
> > In that vein, it seems like a pretty reasonable trade-off.
> I won't say any processor design choice is not reasonable because we don't know what the constraints are,
> even apparently stupid marketing segregation that is done with the very reasonable goal of increasing profit!
> That said, how do you separate core (presumably including AVX256) from AVX512? Is AVX512 part
> of the core 8mm, or additional? Do you just roughly chop vector units and registers in half?

Most of the vector unit is shared between AVX-512 and AVX2. The main things AVX-512 adds are the extra 512 bits of FMA (hanging off the top of the core complex in the uncore, so very obvious) and a small amount of stuff in the main vector EUs. You can see the latter when you compare the SKL and SKX vector EUs: there is a missing bit in SKL and everything is shifted down, so the top of the FMA doesn't line up with the edge of the core the way it does in the SKX part.

> Putting 512 bit data paths through the L1d to vector units is at least one thing in the core which is not
> a simple bolt-on

Yes, that one is harder to measure. As above, the SKL and SKX core complexes are exactly the same size and have almost exactly the same layout, so if you believe SKL is an efficient design for 256-bit AVX2, then the area cost was not large. (In SKL there is some masked-out area in the load-store region, which is used in SKX, but it is still fairly small.)

Is it plausible that the main AVX-512 data paths (between the reg file and EUs, and between the load-store unit and everything else) are on one or more additional layer(s)? So the cost of SKX over SKL is largely in those additional layers, not in the area of the core complex? The way the SIMD reg files are laid out is somewhat suggestive of that: 256-bit wide, and then the other 256 bits for zmms below.
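
For anyone who wants to redo the arithmetic in the quoted breakdown, here is a rough Python sketch. The per-block areas, the 18.0 mm2 tile total and the 72% figure are copied from the quoted post. The tile count of 28 is my assumption (the largest SKL-SP die), and the names `TILE_AREAS_MM2`, `TILE_COUNT` and `summarize` are made up for illustration. With these inputs the listed blocks add up slightly above the quoted 18.0 mm2 total, and the whole-chip AVX512 area comes out a little under the quoted 28 mm2. Both gaps are presumably rounding in the original estimate.

```python
# Per-tile block areas in mm^2, copied from the quoted breakdown.
TILE_AREAS_MM2 = {
    "SKL core": 8.0,
    "AVX512": 0.9,
    "1MB L2$": 2.0,
    "1.375MB L3$": 2.4,
    "Snoop filter": 0.4,
    "Caching and home agent": 2.0,
    "FIVR, PLLs": 2.2,
    "Mystery block": 0.4,
}

QUOTED_TILE_TOTAL_MM2 = 18.0  # tile total as stated in the quoted post
TILE_SHARE_OF_DIE = 0.72      # "tiles are 72% of the total area", from the post
TILE_COUNT = 28               # assumption: 28-tile SKL-SP die, not stated in the post


def summarize():
    """Recompute the headline numbers from the quoted per-tile estimate."""
    avx512 = TILE_AREAS_MM2["AVX512"]
    avx512_whole_chip = avx512 * TILE_COUNT
    return {
        # Sum of the listed blocks, to compare against the quoted total.
        "component_sum_mm2": sum(TILE_AREAS_MM2.values()),
        # AVX512 as a fraction of one tile, using the quoted total.
        "avx512_share_of_tile": avx512 / QUOTED_TILE_TOTAL_MM2,
        # Area freed across the chip if the AVX512 block were removed.
        "avx512_whole_chip_mm2": avx512_whole_chip,
        # How many extra tiles that freed area would pay for.
        "extra_tiles": avx512_whole_chip / QUOTED_TILE_TOTAL_MM2,
        # Die size implied by the tile count and the 72% figure.
        "implied_die_mm2": TILE_COUNT * QUOTED_TILE_TOTAL_MM2 / TILE_SHARE_OF_DIE,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
```

Changing `TILE_COUNT` or the AVX512 entry shows how sensitive the "how many tiles could you add" conclusion is to the inputs.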