By: anon2 (anon.delete@this.anon.com), October 4, 2021 2:23 pm
Room: Moderated Discussions
--- (---.delete@this.redheron.com) on October 4, 2021 9:41 am wrote:
> anon2 (anon.delete@this.anon.com) on October 3, 2021 7:58 pm wrote:
> > Mark Roulo (nothanks.delete@this.xxx.com) on October 3, 2021 12:41 pm wrote:
> > > Linus Torvalds (torvalds.delete@this.linux-foundation.org) on October 3, 2021 11:09 am wrote:
> > > > Doug S (foo.delete@this.bar.bar) on October 3, 2021 10:09 am wrote:
> > > > >
> > > > > Zeroing has room for optimization, both since you will often zero more than one page at a time and
> > > > > because zeroes are rarely read before they are overwritten - so you want that activity to occur outside
> > > > > of the cache.
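(Aside, not from Doug's post: "occur outside of the cache" on x86 generally means non-temporal stores. A minimal sketch of that kind of cache-bypassing zeroing, alignment handling omitted:)

    #include <emmintrin.h>   /* SSE2: _mm_stream_si128, _mm_sfence */
    #include <stddef.h>

    /* Zero a buffer with non-temporal stores: the zeroes go out through
     * write-combining buffers without allocating cache lines.
     * Assumes dst is 16-byte aligned and len is a multiple of 16. */
    static void zero_nontemporal(void *dst, size_t len)
    {
        __m128i zero = _mm_setzero_si128();
        for (size_t i = 0; i < len; i += 16)
            _mm_stream_si128((__m128i *)((char *)dst + i), zero);
        _mm_sfence();   /* make the streamed stores globally visible */
    }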
> > > >
> > > > No you don't, actually.
> > > >
> > > > People have tried various pre-zeroing schemes over and over
> > > > and over again, and it's always been a loss in the end.
> > > >
> > > > Why? Caches work, and they grow over time. And basically every single time you zero
> > > > something, you are doing so because you're going to access much of the end result -
> > > > even if it's just to overwrite it with final data - in the not too distant future.
> > > >
> > > > Pre-zeroing and doing it at a DRAM or memory controller level is always going to be the wrong answer.
> > > > It's going to mean that when you access it, you're now going to take that very expensive cache miss.
> > > >
> > > > Yes, you can always find benchmarks where pre-zeroing is great, because you can
> > > > pick the benchmark where you have just the right working set size, and you can time
> > > > the memory operations to when they are most effective for that benchmark.
> > > >
> > > > And then on real loads it won't work at all. In fact, even on the benchmark it will be a loss on
> > > > other microarchitectures with bigger caches - so you're basically pessimising for the future.
> > > >
> > > > So what you want to do is to zero your memory basically as late as possible, just before it gets used. That
> > > > way the data will be close when it is accessed. Even if
> > > > it's accessed just for writing the actual new data on
> > > > top - a lot of zeroing is for initialization and security reasons, and to make for consistent behavior - it
> > > > will be at least already dirty and exclusive in your caches, which is exactly what you want for a write.
> > > >
> > > > So for big sparse arrays (or huge initial allocations), you may actually be much better
> > > > off allocating them with something like a "mmap()" interface for anonymous memory (pick
> > > > whatever non-unix equivalent), and just telling the system that you will need this much
> > > > memory, but then depend on demand-paging to zero the pages for you before use.
> > > >
> > > > Yes, you'll then take the page faults dynamically, but it might well end up
> > > > much better than pre-zeroing big buffers that you won't use for a while.
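(Aside for concreteness, mine and not Linus's: on a POSIX system the approach he describes is roughly the below. The size and names are made up.)

    #include <stddef.h>
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = (size_t)1 << 30;   /* 1 GiB we may never fully touch */

        /* MAP_ANONYMOUS memory is demand-zeroed: the kernel allocates and
         * zeroes no physical page until it is first touched, so the zeroing
         * happens in the fault path, right before use, and the zeroed lines
         * are hot in cache when the real data lands on them. */
        char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        buf[0] = 1;     /* faults in (and zeroes) exactly one page */
        munmap(buf, len);
        return 0;
    }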
> > > >
> > > > As a rule of thumb, you never ever want to move memory accesses closer to DRAM,
> > > > unless you have been explicitly told "I don't want this data any more" (or you have
> > > > some really good detection of "this working set won't fit in any caches").
> > > >
> > > > DRAM is just too far away, and caches are too effective - and you
> > > > very seldom know how much cache you have on a software level.
> > > >
> > > > Side note: that detection of "this working set won't fit in any caches" may well be
> > > > about the CPU knowing the size of a memory copy or memory clear operation ahead of time,
> > > > and taking those kinds of very explicit hints into account. Which is just another reason
> > > > you should have memory copy support in hardware, and not do it in software.
> > > >
> > > > Linus
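(Another aside: x86 arguably already has one form of what that side note asks for. With rep stosb the CPU is handed the whole length up front in RCX, so the implementation can choose cacheable vs. streaming behaviour itself; glibc's memset already leans on this on ERMS parts. A minimal sketch, GCC-style inline asm:)

    #include <stddef.h>

    /* Zero a buffer with "rep stosb": one instruction, length known to
     * the hardware before the first byte is written. AL holds the fill
     * byte, RDI the destination, RCX the count; the instruction updates
     * RDI and RCX, hence the "+" read-write constraints. */
    static void zero_hw(void *dst, size_t len)
    {
        __asm__ volatile("rep stosb"
                         : "+D"(dst), "+c"(len)
                         : "a"((unsigned char)0)
                         : "memory");
    }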
> > >
> > > How confident are you about this for languages such as Java? Especially
> > > Java running in a server context with sophisticated multi-threaded GC?
> > >
> > > This seems like the sort of environment where zeroing the data to be allocated later might be
> > > a win, especially if the data could be zeroed when nothing else needed the DRAM bandwidth.
> > >
> > >
> >
> > Why? Zeroing at the point of use means you can skip the DRAM step entirely.
> > You save one store to DRAM, and possibly even one load from DRAM.
>
> True. But zeroing at the point of use is basically a security bet that "this time,
> trust us, there's absolutely no way anyone can break through our OS to construct a
> mechanism by which they can read pages on the free, but not-yet-laundered, list".
That's security ad absurdum (which, I admit, is wildly popular among security "experts" these days). You can talk your way into returning to the stone age like that, except this time with no cave paintings, much too risky!

I accept some amount of proactive "what if" security measures when the cost/benefit is good. But when the workaround doesn't even close the hypothetical vulnerability, or make it orders of magnitude harder to hit (anyone who can read pages on the free list through the OS can just as well read pages on the allocated list), and it comes with a significant cost, then the cost/benefit analysis needs to be a lot tighter IMO.
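To put the point about skipping the DRAM step in code: if you zero immediately before use, the zero-store and the real store hit the same already-dirty, already-exclusive cache lines, and only the final data is ever written back. A trivial sketch (hypothetical names):

    #include <string.h>

    /* Zero-at-point-of-use: memset pulls each line in exclusive and
     * dirties it; the memcpy right behind it overwrites the same warm
     * lines. One eventual writeback of final data, no extra DRAM trip.
     * Pre-zeroing this buffer minutes earlier would instead push zeroes
     * out to DRAM and then miss on every line here. */
    void emit_record(char *dst, size_t cap, const char *src, size_t n)
    {
        memset(dst, 0, cap);    /* initialization/consistency zeroing, done late */
        memcpy(dst, src, n);    /* real data lands on hot, dirty lines */
    }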