By: Linus Torvalds (torvalds.delete@this.osdl.org), August 16, 2005 11:48 am
Room: Moderated Discussions
David Kanter (dkanter@realworldtech.com) on 8/16/05 wrote:
>
>It sounds like you really prefer highly associative caches.
I definitely do. With some nice FP loops you may be able
to work around direct-mapped caches, and argue that you
can make the cache sufficiently faster that it's worth
the hoops the simpler/faster hardware makes you jump
through. I suspect that embedded people might have the
same argument.
With general-purpose programming, the pain is just too
big. You get a lot of cache misses due to way contention.
There's tons of data on this. If you want SPEC D$ miss
rates, see for example
http://www.cs.wisc.edu/multifacet/misc/spec2000cache-data/new_tables/specint_64-amean.tab
which says that for D$ there's about 25% more misses
for a direct-mapped cache than for a two-way one in the
L1 cache size range (the difference is even bigger for I$,
but the miss numbers there are smaller, of course).
And that's ignoring the worst case - that's just average.
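To make the way-contention point concrete, here's a toy simulation (my own sketch, with made-up sizes, not any real design): two arrays whose addresses differ by exactly the cache size alias to the same sets, so a direct-mapped cache thrashes on every access while a two-way cache of the same total size takes only the compulsory misses.

```python
def misses(addresses, num_sets, ways, line_size=64):
    """Count misses for an LRU set-associative cache."""
    sets = [[] for _ in range(num_sets)]  # each set holds up to `ways` tags
    miss_count = 0
    for addr in addresses:
        line = addr // line_size
        idx = line % num_sets
        tag = line // num_sets
        s = sets[idx]
        if tag in s:
            s.remove(tag)        # hit: refresh to MRU position
            s.append(tag)
        else:
            miss_count += 1
            if len(s) >= ways:
                s.pop(0)         # evict LRU
            s.append(tag)
    return miss_count

# Alternate between two arrays placed 64 KB apart, so with a 64 KB cache
# every pair of accesses lands in the same set.
trace = []
for i in range(1000):
    trace.append(0x0000 + (i % 16) * 64)
    trace.append(0x10000 + (i % 16) * 64)

# 64 KB direct-mapped (1024 sets x 1 way) vs 64 KB two-way (512 sets x 2 ways)
print(misses(trace, 1024, 1))   # 2000 -- every single access misses
print(misses(trace, 512, 2))    # 32   -- only the compulsory misses
```

Same capacity, same trace; the only difference is one bit of associativity, and it's the difference between 100% misses and essentially none.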
So I personally would want at least 4-way in the L1, and
as much as possible in the L2. And if full associativity
doesn't work out, then some mixing in of other bits to hash
the thing around to avoid common alignment-induced "hot
ways", that sounds like a good idea to me (people seem to
call it "pseudo-associative").
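The index-hashing idea can be sketched like this (my illustration of the general trick, not any particular chip's hash): XOR some of the bits above the index into the index itself, so power-of-two strides that would all pile onto one hot set with a plain modulo index get spread out.

```python
def plain_index(line, num_sets):
    # Conventional indexing: low line bits select the set.
    return line % num_sets

def hashed_index(line, num_sets):
    # Fold the bits above the index into the index before selecting.
    return (line ^ (line // num_sets)) % num_sets

num_sets = 1024
# Cache-size-strided accesses: lines 0, 1024, 2048, ...
lines = [i * num_sets for i in range(16)]

print(len({plain_index(l, num_sets) for l in lines}))   # 1  -- one hot set
print(len({hashed_index(l, num_sets) for l in lines}))  # 16 -- spread out
```

The hash doesn't add ways, it just makes the pathological alignment much harder to hit by accident.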
One of the things I personally like about highly associative
caches is the graceful degradation. I'd much rather
have a system that tends to slow down more gracefully than
fall off a steep cliff ("glass jaw") when something bad
happens. A direct-mapped cache basically is asking for
trouble - it may perform fine "on average", but then it has
nasty situations where it really sucks.
Me, I'll take "consistently good" over "really really good
if all the planets align correctly" any day. When it comes
to caches, that means that I'd much rather take a two-
cycle L1 that is big and has high associativity over a
single-cycle small one. Even if the single-cycle one then
runs like a bat out of hell when things go the right way.
I guess this is all the same argument that makes me prefer
a P-M over a P4. "Plodding and dependable workhorse" is
better than a sprinter that hits a brick wall every once in
a while.
Linus