By: Mark Roulo (nothanks.delete@this.xxx.com), January 30, 2013 3:37 pm
Room: Moderated Discussions
Paul A. Clayton (paaronclayton.delete@this.gmail.com) on January 30, 2013 3:08 pm wrote:
> Mark Roulo (nothanks.delete@this.xxx.com) on January 30, 2013 2:14 pm wrote:
> > Paul A. Clayton (paaronclayton.delete@this.gmail.com) on January 30, 2013 1:42 pm wrote:
> >
> > > (I am curious how they managed to disable portions of the cache. Although some high-end servers
> > > support selective block disabling as a standard feature, I was not aware of any that support
> > > disabling half, a third, or a sixth of the cache. Does Facebook have access to special features
> > > that are typically fused-off? Restricting such to Facebook seems to be a disservice to others--e.g.,
> > > academic researchers--who could benefit from access to such features.)
> >
> > Facebook may have done this test using ARM CPUs.
> >
> > Google for "Lockdown by line" and ARM.
>
> Way and line locking are fairly common features of embedded processors. However, such locking reduces
> associativity. A nice complementary feature is partial allocation of cache memory to a memory address
> range where associativity is not reduced (IIRC, this was provided by some of the Motorola PowerPC74xx
> series. I think those CPUs adjusted the line size [so all cache tags were used], but I do not think
> there is any technical constraint requiring that as part of a CPU design. Those were the days of
> off-chip SRAM caches.). (Such a scratchpad memory would avoid tag overhead--energy and latency--as
> well as having the guaranteed access traits of locked cache lines or ways.)
>
> (Another odd variation in tag and data use would be associating a region of the cache with a
> small subset of memory--such that tags could be substantially smaller. It would also be conceivable
> to have page-granular cache lines for part of a cache to facilitate something close to a virtualizable
> scratchpad memory. But these concepts are probably quite a bit too wacky.)
>
> A six-fold reduction in associativity (with a capacity reduction from 3MiB to 512KiB)
> would seem to be a significant matter even if starting from 16-way set associative. That
> would seem to make the measurement more problematic. (The degrading at 512KiB capacity
> might have been a result of increased conflict misses more than capacity misses.)
Another way of looking at this is that even *WITH* the loss of associativity, they didn't need more than 512KB of cache. Think of the 512KB as an upper limit on what the workload needs. Knowing that you can drop from 3MB to 512KB is quite useful, even if it turns out you can actually go lower still ...
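
To put rough numbers on the way-locking point, here is a minimal sketch of the arithmetic. The geometry is my assumption, not something stated in the thread: a 3MiB cache with 12 ways and 64-byte lines (the function name is made up for illustration). Locking ways leaves the set count alone and shrinks both capacity and associativity by the same factor.

```python
KIB = 1024
MIB = 1024 * KIB

def way_locked_cache(capacity_bytes, ways, line_bytes, usable_ways):
    """Return (sets, usable_capacity_bytes) when only `usable_ways` ways remain unlocked.

    Hypothetical helper: the number of sets is fixed by the full geometry,
    so usable capacity scales linearly with the number of ways left.
    """
    sets = capacity_bytes // (ways * line_bytes)
    usable_capacity = usable_ways * sets * line_bytes
    return sets, usable_capacity

# Assumed geometry: 3MiB, 12-way, 64-byte lines, with 10 of the 12 ways locked.
sets, usable = way_locked_cache(3 * MIB, ways=12, line_bytes=64, usable_ways=2)
print(sets, usable // KIB)
```

With those assumed numbers the 512KiB configuration is only 2-way set associative, which is why conflict misses are a fair worry. It is also why I read 512KB as an upper bound: the same capacity at full associativity should do no worse.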