By: --- (---.delete@this.redheron.com), October 4, 2021 9:41 am
Room: Moderated Discussions
anon2 (anon.delete@this.anon.com) on October 3, 2021 7:58 pm wrote:
> Mark Roulo (nothanks.delete@this.xxx.com) on October 3, 2021 12:41 pm wrote:
> > Linus Torvalds (torvalds.delete@this.linux-foundation.org) on October 3, 2021 11:09 am wrote:
> > > Doug S (foo.delete@this.bar.bar) on October 3, 2021 10:09 am wrote:
> > > >
> > > > Zeroing has room for optimization, both since you will often zero more than one page at a time and
> > > > because zeroes are rarely read before they are overwritten - so you want that activity to occur outside
> > > > of the cache.
> > >
> > > No you don't, actually.
> > >
> > > People have tried various pre-zeroing schemes over and over
> > > and over again, and it's always been a loss in the end.
> > >
> > > Why? Caches work, and they grow over time. And basically every single time you zero
> > > something, you are doing so because you're going to access much of the end result -
> > > even if it's just to overwrite it with final data - in the not too distant future.
> > >
> > > Pre-zeroing and doing it at a DRAM or memory controller level is always going to be the wrong answer.
> > > It's going to mean that when you access it, you're now going to take that very expensive cache miss.
> > >
> > > Yes, you can always find benchmarks where pre-zeroing is great, because you can
> > > pick the benchmark where you have just the right working set size, and you can time
> > > the memory operations to when they are most effective for that benchmark.
> > >
> > > And then on real loads it won't work at all. In fact, even on the benchmark it will be a loss on
> > > other microarchitectures with bigger caches - so you're basically pessimising for the future.
> > >
> > > So what you want to do is to zero your memory basically as late as possible, just before it gets used. That
> > > way the data will be close when it is accessed. Even if
> > > it's accessed just for writing the actual new data on
> > > top - a lot of zeroing is for initialization and security reasons, and to make for consistent behavior - it
> > > will be at least already dirty and exclusive in your caches, which is exactly what you want for a write.
> > >
> > > So for big sparse arrays (or huge initial allocations), you may actually be much better
> > > off allocating them with something like a "mmap()" interface for anonymous memory (pick
> > > whatever non-unix equivalent), and just telling the system that you will need this much
> > > memory, but then depend on demand-paging to zero the pages for you before use.
> > >
> > > Yes, you'll then take the page faults dynamically, but it might well end up
> > > much better than pre-zeroing big buffers that you won't use for a while.
> > >
> > > As a rule of thumb, you never ever want to move memory accesses closer to DRAM,
> > > unless you have been explicitly told "I don't want this data any more" (or you have
> > > some really good detection of "this working set won't fit in any caches").
> > >
> > > DRAM is just too far away, and caches are too effective - and you
> > > very seldom know how much cache you have on a software level.
> > >
> > > Side note: that detection of "this working set won't fit in any caches" may well be
> > > about the CPU knowing the size of a memory copy or memory clear operation ahead of time,
> > > and taking those kinds of very explicit hints into account. Which is just another reason
> > > you should have memory copy support in hardware, and not do it in software.
> > >
> > > Linus
> >
> > How confident are you about this for languages such as Java? Especially
> > Java running in a server context with sophisticated multi-threaded GC?
> >
> > This seems like the sort of environment where zeroing the data to be allocated later might be
> > a win, especially if the data could be zeroed when nothing else needed the DRAM bandwidth.
> >
> >
>
> Why? Zeroing at the point of use means you can skip the DRAM step entirely.
> You save one store to DRAM, and possibly even one load from DRAM.
True. But zeroing at the point of use is basically a security bet that "this time, trust us, there's absolutely no way anyone can break through our OS to construct a mechanism by which they can read pages on the free, but not-yet-laundered, list".
Is that a good bet?
In principle (sure, in principle) it's no different from saying "trust us, there's absolutely no way anyone can read pages belonging to a different process", and if that fails, well, it's game over.
But in practice, it seems to me, one of these attack paths may be substantially harder than the other (by dint of long experience, and on the theoretical grounds that the pages are now "owned by the kernel -- and I, the hacker, am the kernel" rather than "owned by a process").
Is this worth worrying about? I'm not a hacker, and I know and care little about security. But my limited understanding is that, e.g., Apple's special page protection schemes, which go above and beyond traditional OS and traditional HW protections, would not protect such pages once they transition from being owned by a process to being owned by the kernel.