By: Patrick Chase (patrickjchase.delete@this.gmail.com), January 30, 2013 7:25 pm
Room: Moderated Discussions
Paul A. Clayton (paaronclayton.delete@this.gmail.com) on January 30, 2013 1:42 pm wrote:
> (I am curious how they managed to disable portions of the cache. Although some high-end servers
> support selective block disabling as a standard feature, I was not aware of any that support
> disabling half, a third, or a sixth of the cache. Does Facebook have access to special features
> that are typically fused-off? Restricting such to Facebook seems to be a disservice to others--e.g.,
> academic researchers--who could benefit from access to such features.)
Most people in academia and industry who do this sort of thing use cycle-accurate simulators. I know for a fact that ARM and Intel both produce such beasts for all of their cores. ARM basically gives theirs away to customers, and as for Intel, I strongly suspect that VTune uses cycle-accurate simulators to "interpolate" between trace captures. Some of ARM's cores have instantiation options for I$ and D$ size and layout, and if I recall correctly the corresponding simulators are similarly configurable. When I was doing CPU selection for SoCs, that's precisely how we did cache sizing...
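To illustrate what "cache sizing with a configurable simulator" means in practice, here is a minimal sketch: a toy trace-driven, set-associative LRU cache model that replays the same address trace at several cache sizes and reports the miss rate. This is not ARM's or Intel's simulator; the function names, the 64-byte line, the 4-way associativity, and the synthetic trace are all hypothetical choices for illustration.

```python
from collections import OrderedDict


def simulate_misses(trace, cache_bytes, line_bytes=64, ways=4):
    """Count misses for a toy set-associative LRU cache over an address trace.

    Hypothetical model: no prefetching, no write policy, just tag lookups.
    """
    num_sets = cache_bytes // (line_bytes * ways)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        line = addr // line_bytes
        s = sets[line % num_sets]
        tag = line // num_sets
        if tag in s:
            s.move_to_end(tag)  # hit: mark as most recently used
        else:
            misses += 1
            if len(s) >= ways:
                s.popitem(last=False)  # evict the least recently used tag
            s[tag] = True
    return misses


def sweep(trace, sizes, **kw):
    """Replay the same trace at each cache size and return miss rates."""
    return {size: simulate_misses(trace, size, **kw) / len(trace) for size in sizes}


if __name__ == "__main__":
    # Hypothetical workload: stream over a 24 KiB working set ten times.
    trace = [i * 64 for i in range(24 * 1024 // 64)] * 10
    for size, rate in sweep(trace, [8 * 1024, 16 * 1024, 32 * 1024, 64 * 1024]).items():
        print(f"{size // 1024:3d} KiB: miss rate {rate:.3f}")
```

With this synthetic trace, the 8 KiB and 16 KiB configurations thrash (every access misses under LRU), while 32 KiB and larger see only the cold misses. That kind of "cliff" is exactly what a size sweep is meant to expose before committing to a cache configuration.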
Simulators are actually preferable, because you can precisely control things like bus/memory latencies and posted transaction counts. That makes them inherently more repeatable (and more representative, if you know what you're doing) than running on, say, an eval board.
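A minimal sketch of why explicit latency parameters give repeatable results, assuming a deliberately simple model: a hypothetical direct-mapped cache where every access costs a fixed hit latency and every miss adds a fixed memory latency. Real cycle-accurate simulators model far more (pipelines, posted writes, bus arbitration); the parameter names and values here are illustrative only.

```python
def run_model(trace, num_lines=256, line_bytes=64, hit_cycles=4, mem_cycles=100):
    """Toy trace-driven timing model: direct-mapped cache, fixed memory latency.

    Every latency is an explicit parameter, so there is no DRAM refresh,
    bus contention, or OS noise: the same inputs always give the same count.
    """
    tags = [None] * num_lines
    cycles = 0
    for addr in trace:
        line = addr // line_bytes
        idx = line % num_lines
        cycles += hit_cycles  # every access pays the hit latency
        if tags[idx] != line:
            tags[idx] = line
            cycles += mem_cycles  # a miss adds the configured memory latency
    return cycles


if __name__ == "__main__":
    trace = [i * 64 for i in range(8)] * 2  # 16 accesses, 8 cold misses
    assert run_model(trace) == run_model(trace)  # identical runs, identical result
    for mem in (50, 100, 200):  # sweep memory latency as a controlled variable
        print(mem, run_model(trace, mem_cycles=mem))
```

The point of the design is that memory latency becomes a knob you turn deliberately, rather than a source of run-to-run variation as it would be on real hardware.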
Best rgds,
Patrick