By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), January 30, 2013 3:08 pm
Room: Moderated Discussions
Mark Roulo (nothanks.delete@this.xxx.com) on January 30, 2013 2:14 pm wrote:
> Paul A. Clayton (paaronclayton.delete@this.gmail.com) on January 30, 2013 1:42 pm wrote:
>
> > (I am curious how they managed to disable portions of the cache. Although some high-end servers
> > support selective block disabling as a standard feature, I was not aware of any that supported
> > disabling half, a third, or a sixth of the cache. Does Facebook have access to special features
> > that are typically fused-off? Restricting such to Facebook seems to be a disservice to others--e.g.,
> > academic researchers--who could benefit from access to such features.)
>
> Facebook may have done this test using ARM CPUs.
>
> Google for "Lockdown by line" and ARM.
Way and line locking are fairly common features of embedded processors. However, such locking reduces associativity. A nice complementary feature is partial allocation of cache memory to a memory address range in a way that does not reduce associativity. (IIRC, this was provided by some of the Motorola PowerPC 74xx series. I think those CPUs adjusted the line size [so all cache tags were used], but I do not think there is any technical constraint requiring that as part of a CPU design. Those were the days of off-chip SRAM caches.) Such a scratchpad memory would avoid tag overhead (energy and latency) as well as having the guaranteed access traits of locked cache lines or ways.
(Another odd variation in tag and data use would be associating a region of the cache with a small subset of memory--such that tags could be substantially smaller. It would also be conceivable to have page-granular cache lines for part of a cache to facilitate something close to a virtualizable scratchpad memory. But these concepts are probably quite a bit too wacky.)
A six-fold reduction in associativity (with a capacity reduction from 3MiB to 512KiB) would seem to be significant even if starting from a 16-way set-associative cache. That would seem to make the measurement more problematic. (The degradation at 512KiB capacity might have been a result of increased conflict misses more than capacity misses.)
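To make the conflict-versus-capacity point concrete, here is a minimal sketch of a toy LRU set-associative cache model. Everything in it is an assumption chosen for illustration, not a description of the hardware Facebook tested: the 64-byte line size, the two geometries (2-way with 4096 sets versus 16-way with 512 sets, both 512KiB), and the synthetic trace of four lines that share a set index.

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes per cache line (assumed for illustration)


def count_misses(trace, num_sets, ways, line_size=LINE_SIZE):
    """Count misses for a set-associative cache with LRU replacement."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        line = addr // line_size
        cache_set = sets[line % num_sets]
        if line in cache_set:
            cache_set.move_to_end(line)  # hit: mark as most recently used
        else:
            misses += 1
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)  # evict least recently used
            cache_set[line] = True
    return misses


def capacity_bytes(num_sets, ways, line_size=LINE_SIZE):
    """Total data capacity of the modeled cache."""
    return num_sets * ways * line_size


# Hypothetical trace: four lines that map to the same set in both
# geometries, touched round-robin 100 times (400 accesses in total).
STRIDE = 4096 * LINE_SIZE
trace = [i * STRIDE for i in range(4)] * 100

# Same 512KiB capacity, different associativity.
way_locked = count_misses(trace, num_sets=4096, ways=2)   # ways disabled
fewer_sets = count_misses(trace, num_sets=512, ways=16)   # sets reduced

print(capacity_bytes(4096, 2), capacity_bytes(512, 16))
print(way_locked, fewer_sets)
```

In this contrived trace the 2-way configuration misses on every access while the 16-way configuration only takes the four cold misses, even though both hold 512KiB. Real workloads would not be this extreme, but it shows why shrinking a cache by disabling ways can mix conflict effects into what is meant to be a capacity measurement.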