By: Ungo (a.delete@this.b.c.d.e), July 9, 2013 3:24 pm
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on July 6, 2013 10:57 am wrote:
> Patrick Chase (patrickjchase.delete@this.gmail.com) on July 5, 2013 11:37 am wrote:
> > The line sizes for all levels are typically the same. If they weren't then coherency and OS cache
> > management would become more complicated. With that said, a cache that has valid/dirty bits for
> > partial lines is called a sectored cache (a very old technique first used in the 360/85). The
> > only recent microarchticture that I know of that uses a sectored cache is the NVIDIA Fermi GPU
> > family, which appear to use a sectored L2 with 128-byte lines and 32-byte sectors [*].
> >
> > -- Patrick
>
> Crystallwell L4 should do something similar. It's not yet documented in the Intel optimization
> reference manual, but anything else simply does not make a technical sense.
You're referring to the discussion a few weeks back about Intel possibly reusing 2MB of the L3 SRAM for tags, which would force 128-byte lines and a 2-stage lookup? I thought that sounded reasonable, and as a design I still do, but I recently convinced myself that it can't be what's actually going on. Consider die sizes:
(figures from AnandTech's Iris Pro 5200 review)
Haswell GT3e quadcore, 6MB L3: 264mm^2 (Crystalwell)
Haswell GT2 quadcore, up to 8MB L3: 177mm^2
Haswell GT3 dualcore, up to 4MB L3: 181mm^2
The GT3(e) GPU occupies about twice the die area of the GT2 GPU. Operating on the assumption that the GT3e die really has an 8MB L3 array with 2MB repurposed for tags, the only difference between the two quadcore dies would be one extra GT2-sized block of GPU, so you'd guess that the GT3 GPU is 2*(264-177) = 174mm^2, a calculation Anand makes. But he didn't notice that this is absurd: that's just 7mm^2 less than the entire 181mm^2 GT3 dualcore die. So clearly there's something adding substantial die area to Crystalwell other than the GPU.
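To make that sanity check explicit, here is a minimal Python sketch of the naive estimate. The variable and function names are mine, and the only inputs are the three die sizes quoted above; it assumes, as the naive argument does, that GT3 is exactly twice GT2 and that nothing else differs between the two quadcore dies.

```python
# Die areas in mm^2, as quoted above from the AnandTech review.
GT3E_QUAD_DIE = 264
GT2_QUAD_DIE = 177
GT3_DUAL_DIE = 181


def naive_gt3_gpu_area(gt3e_die, gt2_die):
    """Naive estimate: the quadcore dies differ only by one GT2-sized
    GPU block, and the GT3 GPU is exactly twice the GT2 GPU."""
    gt2_gpu = gt3e_die - gt2_die
    return 2 * gt2_gpu


naive_gpu = naive_gt3_gpu_area(GT3E_QUAD_DIE, GT2_QUAD_DIE)
# Area left on the GT3 dualcore die for two cores, L3, and everything else.
leftover_dual = GT3_DUAL_DIE - naive_gpu

print(f"naive GT3 GPU area: {naive_gpu} mm^2")
print(f"left for the rest of the GT3 dualcore die: {leftover_dual} mm^2")
```

Only 7mm^2 left for two cores, L3 and the system agent is obviously not plausible, which is the point.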
In fact, if we do some crude photogrammetry on a die shot of the 181mm^2 GT3 dualcore, about 58% of the die is GPU, or roughly 104mm^2. Adding the extra half of that GPU to the GT2 quadcore die gives 177+(104/2) = 229mm^2, leaving us with about 35mm^2 unaccounted for in Crystalwell. In a die shot of the 177mm^2 GT2 quadcore, the 8MB L3 cache appears to take up about 15 to 16% of the die, so about 28mm^2. What if, rather than reusing 2MB worth of L3 as L4 tags, Intel has actually downsized the L3 to 6MB, or about 21mm^2? That frees another 7mm^2, for a total of ~42mm^2 of area growth that could be used to build dedicated tags for the 128MB L4, possibly with 64-byte lines. (No doubt some of that area is used on L4 I/O and control logic, but it seems like there's more than enough area for fine-grained L4 tags.)
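The same budget, written out as a short Python sketch so the steps can be checked. All names are mine; the inputs are the rounded figures from this post (104mm^2 of GPU on the GT3 dualcore, 28mm^2 for an 8MB L3), which are eyeballed from die shots and should be treated as rough assumptions rather than measurements.

```python
# Inputs in mm^2; the GPU and L3 areas are crude die-shot estimates.
GT3E_QUAD_DIE = 264.0
GT2_QUAD_DIE = 177.0
GT3_GPU_AREA = 104.0      # ~58% of the 181 mm^2 GT3 dualcore die
L3_8MB_AREA = 28.0        # ~15-16% of the 177 mm^2 GT2 quadcore die
L3_FULL_MB = 8
L3_CRYSTALWELL_MB = 6


def crystalwell_area_budget():
    """Estimate how much GT3e die area is not explained by the bigger GPU."""
    extra_gpu = GT3_GPU_AREA / 2                  # GT3 adds one more GT2-sized half
    expected_die = GT2_QUAD_DIE + extra_gpu       # GT2 quadcore grown to GT3
    unaccounted = GT3E_QUAD_DIE - expected_die    # growth not explained by GPU
    l3_6mb = L3_8MB_AREA * L3_CRYSTALWELL_MB / L3_FULL_MB
    freed_by_l3 = L3_8MB_AREA - l3_6mb            # area saved if L3 is really 6MB
    return {
        "expected_die": expected_die,
        "unaccounted": unaccounted,
        "l3_6mb": l3_6mb,
        "available_for_l4": unaccounted + freed_by_l3,
    }


budget = crystalwell_area_budget()
for name, area in budget.items():
    print(f"{name}: {area:.0f} mm^2")
```

Changing the GPU share or the L3 percentage by a point or two moves the result by a few mm^2, but not enough to make the extra area disappear.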
The other thing which suggests Intel isn't building a die that can repurpose part of the L3 array is simply that as of now, they have not announced a single 264mm^2 Haswell quadcore with 8MB L3 and without Crystalwell. That die appears to be for Crystalwell only.
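For reference, the arithmetic behind "2MB of tags would force 128-byte lines" can be sketched like this. The function name is mine, and I'm assuming binary megabytes and that the whole 2MB is available for tag and state bits; how many bits Intel would actually need per line is not something I know.

```python
MIB = 1024 * 1024


def tag_bits_per_line(cache_bytes, line_bytes, tag_store_bytes):
    """Bits of tag/state storage available per cache line."""
    lines = cache_bytes // line_bytes
    return (tag_store_bytes * 8) / lines


L4_SIZE = 128 * MIB    # Crystalwell eDRAM
TAG_STORE = 2 * MIB    # the hypothetical repurposed slice of L3

bits_128 = tag_bits_per_line(L4_SIZE, 128, TAG_STORE)
bits_64 = tag_bits_per_line(L4_SIZE, 64, TAG_STORE)

print(f"128-byte lines: {bits_128:.0f} bits per line")
print(f" 64-byte lines: {bits_64:.0f} bits per line")
```

With 64-byte lines the budget is halved, and whatever is available has to cover the address tag plus valid/dirty state, which is why the repurposed-L3 theory implied 128-byte lines, and why dedicated tag area would be needed for anything finer.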