By: Patrick Chase (patrickjchase.delete@this.gmail.com), July 6, 2013 12:08 pm
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on July 6, 2013 10:57 am wrote:
> Patrick Chase (patrickjchase.delete@this.gmail.com) on July 5, 2013 11:37 am wrote:
> > This depends on the allocation policies of the L1 and L2 caches. Many modern processors default
> > to "allocate on read miss" (or simply "read-allocate") for either L1 or both, which means that a
> > cache line will only be allocated if a *load* misses the cache. You've specified a store above,
> > so in such a core there would be no changes to the cache contents. The reasoning behind the read-allocate
> > policy is that many workloads involve streaming write-only data (no temporal locality, entire cache
> > line will be over-written). Loading the old version of such data from memory or evicting other data
> > from cache are both counterproductive, so you ideally want it to bypass cache.
> >
>
> Huh?
> Show me not "many", but just one modern general-purpose processor with write-back
> cache that does not write-allocate by default. AFAIK, there are none.
> Streaming stores are another matter.
Cortex-A8, along with several older and/or lower-end ARM cores. See section 7.3.3 of the Cortex-A8 TRM (Technical Reference Manual). Cortex-A8 does support write-allocate at L2, but enabling it has a fairly nasty impact on write bandwidth.
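To make the read-allocate vs. write-allocate distinction concrete, here is a rough toy model, not a description of any real core. The 64-byte line size, the direct-mapped layout, and the names ToyCache and stream_stores are all assumptions made up for illustration. It only tracks tags and counters (no data, no dirty bits), which is enough to show what a store miss does under each policy.

```python
LINE_BYTES = 64  # assumed line size, for illustration only


class ToyCache:
    """Toy direct-mapped cache that tracks tags and counters, not data."""

    def __init__(self, num_lines, write_allocate):
        self.num_lines = num_lines
        self.write_allocate = write_allocate
        self.tags = [None] * num_lines
        self.line_fills = 0     # lines fetched from memory into the cache
        self.bypass_writes = 0  # stores that went past the cache to memory

    def _lookup(self, addr):
        line = addr // LINE_BYTES
        # The full line number doubles as the tag in this toy model.
        return line % self.num_lines, line

    def load(self, addr):
        index, tag = self._lookup(addr)
        if self.tags[index] != tag:
            # Load miss: both policies allocate the line.
            self.tags[index] = tag
            self.line_fills += 1

    def store(self, addr):
        index, tag = self._lookup(addr)
        if self.tags[index] == tag:
            return  # store hit: the line is updated in place
        if self.write_allocate:
            # Store miss, write-allocate: fetch the old line, then allocate.
            self.tags[index] = tag
            self.line_fills += 1
        else:
            # Store miss, read-allocate: cache contents stay untouched.
            self.bypass_writes += 1


def stream_stores(cache, num_bytes, stride=4):
    """Issue sequential stores over num_bytes, as a write-only stream would."""
    for addr in range(0, num_bytes, stride):
        cache.store(addr)
    return cache


read_alloc = stream_stores(ToyCache(8, write_allocate=False), 256)
write_alloc = stream_stores(ToyCache(8, write_allocate=True), 256)
print(read_alloc.line_fills, read_alloc.bypass_writes)
print(write_alloc.line_fills, write_alloc.bypass_writes)
```

With read-allocate the stream never touches the cache, so nothing is fetched and nothing is evicted. With write-allocate every new line in the stream costs a fill from memory, even though the old contents are about to be overwritten.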
Modern Intel cores have cache-line-sized write-combining buffers between the core and the L1, so the core doesn't have to read the old version of a line if the entire line is overwritten (for example, in a streaming workload). This avoids the write-bandwidth issue mentioned above.
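Here is a similarly rough sketch of the combining idea, again as a toy model rather than a description of Intel's actual hardware. It assumes a single 64-byte buffer and aligned stores that never cross a line boundary, and ToyCombiningBuffer is a made-up name. The buffer collects stores to one line. When it drains, a fully written line goes out without a read of the old data, while a partially written line needs the old copy to merge with.

```python
LINE_BYTES = 64  # assumed line size, for illustration only


class ToyCombiningBuffer:
    """Toy single-entry combining buffer; counts drains, holds no data."""

    def __init__(self):
        self.line = None            # line number currently being collected
        self.valid = set()          # byte offsets written so far
        self.full_line_writes = 0   # drains that needed no read of old data
        self.merge_reads = 0        # drains that had to read the old line

    def _drain(self):
        if self.line is None:
            return
        if len(self.valid) == LINE_BYTES:
            self.full_line_writes += 1
        else:
            self.merge_reads += 1
        self.line = None
        self.valid = set()

    def store(self, addr, size=4):
        # Assumes the store is aligned and does not cross a line boundary.
        line = addr // LINE_BYTES
        if line != self.line:
            self._drain()
            self.line = line
        for offset in range(size):
            self.valid.add((addr + offset) % LINE_BYTES)

    def flush(self):
        self._drain()


streaming = ToyCombiningBuffer()
for addr in range(0, 2 * LINE_BYTES, 4):
    streaming.store(addr)
streaming.flush()
print(streaming.full_line_writes, streaming.merge_reads)
```

In the streaming case both lines are completely overwritten before they drain, so no old data is ever read. Scattered stores that touch only part of each line would show up in merge_reads instead.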