By: Patrick Chase (patrickjchase.delete@this.gmail.com), July 6, 2013 2:15 pm
Room: Moderated Discussions
Sorry about the repeat reply, but...
Michael S (already5chosen.delete@this.yahoo.com) on July 6, 2013 10:57 am wrote:
> Patrick Chase (patrickjchase.delete@this.gmail.com) on July 5, 2013 11:37 am wrote:
> > This depends on the allocation policies of the L1 and L2 caches. Many modern processors default
> > to "allocate on read miss" (or simply "read-allocate") for either L1 or both, which means that a
> > cache line will only be allocated if a *load* misses the cache. You've specified a store above,
> > so in such a core there would be no changes to the cache contents. The reasoning behind the read-allocate
> > policy is that many workloads involve streaming write-only data (no temporal locality, entire cache
> > line will be over-written). Loading the old version of such data from memory or evicting other data
> > from cache are both counterproductive, so you ideally want it to bypass cache.
> >
>
> Huh?
> Show me not "many", but just one modern general-purpose processor with write-back
> cache that does not write-allocate by default. AFAIK, there are none.
> Streaming stores are another matter.
I said "modern processors", not "modern general-purpose processors". You really should not put words into people's mouths; that's rather poor/rude forum etiquette (and in this case you were technically wrong even with your modification, thanks to the A8).
The reason I chose the more general wording in the first place is that write-back, read-allocate caches are used both in lightweight embedded cores and in DSP-ish cores like TI C6x and ST2xx. The rationale behind that design choice is that a read-allocate, write-back cache will "capture" writes to the stack and similar read/write structures, but without penalizing streaming writes (of image data, for example) with the overhead of line fills.