By: anon2 (anon.delete@this.anon.com), October 3, 2021 7:58 pm
Room: Moderated Discussions
Mark Roulo (nothanks.delete@this.xxx.com) on October 3, 2021 12:41 pm wrote:
> Linus Torvalds (torvalds.delete@this.linux-foundation.org) on October 3, 2021 11:09 am wrote:
> > Doug S (foo.delete@this.bar.bar) on October 3, 2021 10:09 am wrote:
> > >
> > > Zeroing has room for optimization, both since you will often zero more than one page at a time and
> > > because zeroes are rarely read before they are overwritten - so you want that activity to occur outside
> > > of the cache.
> >
> > No you don't, actually.
> >
> > People have tried various pre-zeroing schemes over and over
> > and over again, and it's always been a loss in the end.
> >
> > Why? Caches work, and they grow over time. And basically every single time you zero
> > something, you are doing so because you're going to access much of the end result -
> > even if it's just to overwrite it with final data - in the not too distant future.
> >
> > Pre-zeroing and doing it at a DRAM or memory controller level is always going to be the wrong answer.
> > It's going to mean that when you access it, you're now going to take that very expensive cache miss.
> >
> > Yes, you can always find benchmarks where pre-zeroing is great, because you can
> > pick the benchmark where you have just the right working set size, and you can time
> > the memory operations to when they are most effective for that benchmark.
> >
> > And then on real loads it won't work at all. In fact, even on the benchmark it will be a loss on
> > other microarchitectures with bigger caches - so you're basically pessimising for the future.
> >
> > So what you want to do is to zero your memory basically as late as possible, just before it gets used. That
> > way the data will be close when it is accessed. Even if
> > it's accessed just for writing the actual new data on
> > top - a lot of zeroing is for initialization and security reasons, and to make for consistent behavior - it
> > will be at least already dirty and exclusive in your caches, which is exactly what you want for a write.
> >
> > So for big sparse arrays (or huge initial allocations), you may actually be much better
> > off allocating them with something like a "mmap()" interface for anonymous memory (pick
> > whatever non-unix equivalent), and just telling the system that you will need this much
> > memory, but then depend on demand-paging to zero the pages for you before use.
> >
> > Yes, you'll then take the page faults dynamically, but it might well end up
> > much better than pre-zeroing big buffers that you won't use for a while.
> >
> > As a rule of thumb, you never ever want to move memory accesses closer to DRAM,
> > unless you have been explicitly told "I don't want this data any more" (or you have
> > some really good detection of "this working set won't fit in any caches").
> >
> > DRAM is just too far away, and caches are too effective - and you
> > very seldom know how much cache you have on a software level.
> >
> > Side note: that detection of "this working set won't fit in any caches" may well be
> > about the CPU knowing the size of a memory copy or memory clear operation ahead of time,
> > and taking those kinds of very explicit hints into account. Which is just another reason
> > you should have memory copy support in hardware, and not do it in software.
> >
> > Linus
>
> How confident are you about this for languages such as Java? Especially
> Java running in a server context with sophisticated multi-threaded GC?
>
> This seems like the sort of environment where zeroing the data to be allocated later might be
> a win, especially if the data could be zeroed when nothing else needed the DRAM bandwidth.
>
>
Why? Zeroing at the point of use means you can skip the DRAM step entirely: if the zeroed lines get overwritten with real data while they are still in cache, the zeroes never have to reach DRAM at all. You save one store to DRAM, and possibly even one load from DRAM.
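
To make the contrast concrete, here's roughly what the two strategies look like in software on x86 (my own sketch, not anything from the thread; non-temporal stores are the closest software analogue of "zeroing outside of the cache", and the sketch assumes buf is 16-byte aligned and n is a multiple of 16):

    #include <emmintrin.h>   /* SSE2 intrinsics: _mm_stream_si128 etc. */
    #include <stddef.h>
    #include <string.h>

    /* Cached zeroing: the zeroed lines end up dirty and exclusive in the
       cache, so the real data written on top of them shortly afterwards
       hits in cache. This is the "zero as late as possible" approach. */
    static void zero_cached(void *buf, size_t n)
    {
        memset(buf, 0, n);
    }

    /* Cache-bypassing zeroing with non-temporal stores: the zeroes go to
       DRAM through write-combining buffers without allocating cache lines.
       The next access to buf then takes a full cache miss - which is
       exactly the objection to doing the zeroing "outside of the cache". */
    static void zero_streaming(void *buf, size_t n)
    {
        __m128i zero = _mm_setzero_si128();
        for (size_t i = 0; i < n; i += 16)
            _mm_stream_si128((__m128i *)((char *)buf + i), zero);
        _mm_sfence();   /* order the streaming stores before later use */
    }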
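
And the mmap() suggestion above is, on Linux, essentially this (again just a sketch of mine): reserve the address space up front, and let the kernel hand you a zeroed page at the moment you first touch it, so the zeroing happens right next to the first real use.

    #define _GNU_SOURCE      /* for MAP_ANONYMOUS on glibc */
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 1UL << 30;   /* "allocate" 1 GiB up front */

        /* Anonymous mapping: no physical pages are allocated yet,
           the kernel just records the reservation. */
        double *a = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (a == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        /* First touch of each page takes a minor fault, and the kernel
           maps in a zeroed page at that moment - demand paging does the
           zeroing "as late as possible", just before use. */
        a[0] = 42.0;

        munmap(a, len);
        return 0;
    }

The faults aren't free, of course - that's the trade-off mentioned above - but each page gets zeroed right before its first real use, so the zeroes are hot in cache when the actual data lands on top of them.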