Travis Downs on July 12, 2020 8:02 pm wrote:
> anon2 on July 12, 2020 7:01 pm wrote:
> > David Kanter on July 12, 2020 6:13 pm wrote:
> > > I did some analysis a while back that might be useful to share here.
> > >
> > > 8.0 mm2 SKL core
> > > 0.9 mm2 AVX512
> > > 2.0 mm2 1MB L2$
> > > 2.4 mm2 1.375MB L3$
> > > 0.4 mm2 Snoop filter
> > > 2.0 mm2 caching and home agent
> > > 2.2 mm2 FIVR, PLLs
> > > 0.4 mm2 Mystery block
> > >
> > > 18.0 mm2 Total SKL-SP tile
> > >
> > > So AVX512 is about 5% of the tile area, and the tiles are 72% of the total area of SKL-SP.
> > >
> > > If you removed AVX512 you'd save 28mm2 for the whole chip, which would let you add at most 2 tiles.
> > >
> > > In that vein, it seems like a pretty reasonable trade-off.
> >
> > I won't say any processor design choice is not reasonable because we don't know what the constraints are,
> > even apparently stupid marketing segregation that is done with the very reasonable goal of increasing profit!
> >
> > That said, how do you separate core (presumably including AVX256) from AVX512? Is AVX512 part
> > of the core 8 mm2, or additional? Do you just roughly chop vector units and registers in half?
> Most of the vector unit is shared between AVX-512 and AVX2. The main things AVX-512 adds are the
> extra 512 bits of FMA (hanging off the top of the core complex in the uncore, so very obvious)
> and a small amount of stuff in the main vector EUs. You can see the latter when you compare
> the SKL and SKX vector EUs: there is a missing bit in SKL and everything is shifted down, so
> the top of the FMA doesn't line up with the edge of the core like it does in the SKX part.
> >
> > Putting 512 bit data paths through the L1d to vector units is at least one thing in the core which is not
> > a simple bolt-on
> Yes, that one is harder to measure. As above, the SKL and SKX core complexes are exactly
> the same size and almost exactly the same layout, so if you believe SKL is an efficient
> design for 256-bit AVX2, then the area cost was not large (in SKL there is some masked
> out area in the load-store area, which is used in SKX, but it is still fairly small).
> Is it plausible that the main AVX-512 data paths (between the reg file and EUs, and between the load-store
> unit and everything else) are on one or more additional layer(s)? So the cost of SKX over SKL is largely
> in those additional layers, not in the area of the core complex? The way the SIMD reg files are laid
> out is somewhat suggestive of that: 256-bit wide, and then the other 256 bits for zmms below.

No, I doubt it, in that the "datapath", i.e. the logic gates, isn't in the metal layers. There's a small chance that they placed gates very sparsely in the 256-bit datapaths and then filled in the empty space with gates for the 512-bit datapath, but the connectivity of the lowest/densest metal layers is sufficiently challenged that I doubt that's probable. The first few metal layers are generally reserved for intra-cell connections, and you really don't want those wires traveling very far because of RC delay. So I doubt you'd want to pay the tax on the 256-bit datapath that you'd incur for something like this.
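
As a cross-check on the area figures quoted earlier in the thread, here is a minimal Python sketch that redoes the arithmetic. The block areas, the 18.0 mm2 tile total and the 28 mm2 whole-chip saving are copied from the quoted post; the variable names are my own, and this is only a sanity check on the quoted numbers, not a new measurement.

```python
# Block areas in mm^2, copied from the quoted SKL-SP tile breakdown.
tile_blocks = {
    "SKL core": 8.0,
    "AVX512": 0.9,
    "1MB L2$": 2.0,
    "1.375MB L3$": 2.4,
    "Snoop filter": 0.4,
    "caching and home agent": 2.0,
    "FIVR, PLLs": 2.2,
    "Mystery block": 0.4,
}

QUOTED_TILE_TOTAL = 18.0   # mm^2, total tile area as quoted
QUOTED_CHIP_SAVING = 28.0  # mm^2, whole-chip saving as quoted

# Sum of the individually listed blocks (may differ from the quoted
# total because each entry is rounded).
listed_sum = sum(tile_blocks.values())

# AVX512 as a fraction of the quoted tile total.
avx512_share = tile_blocks["AVX512"] / QUOTED_TILE_TOTAL

# How many whole-tile areas the quoted chip-level saving corresponds to.
extra_tiles = QUOTED_CHIP_SAVING / QUOTED_TILE_TOTAL

print(f"sum of listed blocks: {listed_sum:.1f} mm^2")
print(f"AVX512 share of tile: {avx512_share:.1%}")
print(f"saving in tile areas: {extra_tiles:.2f}")
```

The listed blocks add up to slightly more than the quoted 18.0 mm2 total, presumably from rounding of the individual entries, and the quoted saving works out to between one and two tile areas, which is consistent with "at most 2 tiles".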
