By: Jörn Engel (joern.delete@this.purestorage.com), October 4, 2021 5:52 am
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on October 3, 2021 11:09 am wrote:
>
> So what you want to do is to zero your memory basically as late as possible, just before it gets used. That
> way the data will be close when it is accessed. Even if it's accessed just for writing the actual new data on
> top - a lot of zeroing is for initialization and security reasons, and to make for consistent behavior - it
> will be at least already dirty and exclusive in your caches, which is exactly what you want for a write.
An important example you missed is hash tables, along with other structures whose access patterns are unpredictable enough that they behave essentially the same way. With a big memset, you essentially pay for one cache miss. After that the prefetcher can trivially detect the sequential pattern and stream in all the cachelines. If you instead access the lines in random order, you can create long chains of dependent cache misses.
Basically, without the memset you would still want to prefetch the entire structure to get good performance. With the memset you either get to clear the memory for free or get the prefetch for free, whichever way you look at it.
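To make that concrete, here is a rough sketch of the kind of table I have in mind. Everything in it (struct table, table_init, table_insert, table_contains, the mix hash) is made up for illustration and not taken from any real codebase. The memset in table_init is the one linear pass over the memory; the inserts and lookups afterwards land on effectively random slots. Whether the lines are still cached by then depends on the table fitting in cache, and an optimizing compiler may fold malloc plus memset into calloc, which can hand back lazily zeroed pages and lose the effect.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical open-addressing hash set; key 0 marks an empty slot. */
struct table {
    uint64_t *slots;
    size_t mask;            /* capacity - 1; capacity must be a power of two */
};

/*
 * Allocate the table and clear it in one sequential pass. The memset
 * walks the lines in order, which is the access pattern a hardware
 * prefetcher handles well.
 */
static int table_init(struct table *t, size_t capacity_pow2)
{
    t->slots = malloc(capacity_pow2 * sizeof *t->slots);
    if (!t->slots)
        return -1;
    memset(t->slots, 0, capacity_pow2 * sizeof *t->slots);
    t->mask = capacity_pow2 - 1;
    return 0;
}

/* splitmix64 finalizer, used here only to scatter keys across slots. */
static uint64_t mix(uint64_t x)
{
    x ^= x >> 30;
    x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27;
    x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Insert a nonzero key. Returns 0 on success, -1 if the table is full. */
static int table_insert(struct table *t, uint64_t key)
{
    assert(key != 0);
    size_t start = (size_t)mix(key) & t->mask;
    for (size_t n = 0; n <= t->mask; n++) {
        size_t idx = (start + n) & t->mask;   /* linear probing */
        if (t->slots[idx] == 0 || t->slots[idx] == key) {
            t->slots[idx] = key;
            return 0;
        }
    }
    return -1;
}

/* Returns 1 if key is present, 0 otherwise. */
static int table_contains(const struct table *t, uint64_t key)
{
    size_t start = (size_t)mix(key) & t->mask;
    for (size_t n = 0; n <= t->mask; n++) {
        uint64_t v = t->slots[(start + n) & t->mask];
        if (v == key)
            return 1;
        if (v == 0)
            return 0;
    }
    return 0;
}
```

Dropping the memset here is not an option anyway, since empty slots have to read as 0. The point is only that the clearing pass doubles as the prefetch pass for the random accesses that follow.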