By: rwessel (rwessel.delete@this.yahoo.com), October 4, 2021 4:54 pm
Room: Moderated Discussions
--- (---.delete@this.redheron.com) on October 4, 2021 9:41 am wrote:
> anon2 (anon.delete@this.anon.com) on October 3, 2021 7:58 pm wrote:
> > Mark Roulo (nothanks.delete@this.xxx.com) on October 3, 2021 12:41 pm wrote:
> > > Linus Torvalds (torvalds.delete@this.linux-foundation.org) on October 3, 2021 11:09 am wrote:
> > > > Doug S (foo.delete@this.bar.bar) on October 3, 2021 10:09 am wrote:
> > > > >
> > > > > Zeroing has room for optimization, both since you will often zero more than one page at a time and
> > > > > because zeroes are rarely read before they are overwritten - so you want that activity to occur outside
> > > > > of the cache.
> > > >
> > > > No you don't, actually.
> > > >
> > > > People have tried various pre-zeroing schemes over and over
> > > > and over again, and it's always been a loss in the end.
> > > >
> > > > Why? Caches work, and they grow over time. And basically every single time you zero
> > > > something, you are doing so because you're going to access much of the end result -
> > > > even if it's just to overwrite it with final data - in the not too distant future.
> > > >
> > > > Pre-zeroing and doing it at a DRAM or memory controller level is always going to be the wrong answer.
> > > > It's going to mean that when you access it, you're now going to take that very expensive cache miss.
> > > >
> > > > Yes, you can always find benchmarks where pre-zeroing is great, because you can
> > > > pick the benchmark where you have just the right working set size, and you can time
> > > > the memory operations to when they are most effective for that benchmark.
> > > >
> > > > And then on real loads it won't work at all. In fact, even on the benchmark it will be a loss on
> > > > other microarchitectures with bigger caches - so you're basically pessimising for the future.
> > > >
> > > > So what you want to do is to zero your memory basically as late as possible, just before it gets used. That
> > > > way the data will be close when it is accessed. Even if
> > > > it's accessed just for writing the actual new data on
> > > > top - a lot of zeroing is for initialization and security reasons, and to make for consistent behavior - it
> > > > will be at least already dirty and exclusive in your caches, which is exactly what you want for a write.
> > > >
> > > > So for big sparse arrays (or huge initial allocations), you may actually be much better
> > > > off allocating them with something like a "mmap()" interface for anonymous memory (pick
> > > > whatever non-unix equivalent), and just telling the system that you will need this much
> > > > memory, but then depend on demand-paging to zero the pages for you before use.
> > > >
> > > > Yes, you'll then take the page faults dynamically, but it might well end up
> > > > much better than pre-zeroing big buffers that you won't use for a while.
> > > >
> > > > As a rule of thumb, you never ever want to move memory accesses closer to DRAM,
> > > > unless you have been explicitly told "I don't want this data any more" (or you have
> > > > some really good detection of "this working set won't fit in any caches").
> > > >
> > > > DRAM is just too far away, and caches are too effective - and you
> > > > very seldom know how much cache you have on a software level.
> > > >
> > > > Side note: that detection of "this working set won't fit in any caches" may well be
> > > > about the CPU knowing the size of a memory copy or memory clear operation ahead of time,
> > > > and taking those kinds of very explicit hints into account. Which is just another reason
> > > > you should have memory copy support in hardware, and not do it in software.
> > > >
> > > > Linus
> > >
> > > How confident are you about this for languages such as Java? Especially
> > > Java running in a server context with sophisticated multi-threaded GC?
> > >
> > > This seems like the sort of environment where zero-ing the data to be allocated later might be
> > > a win, especially if the data could be zero-d when nothing else needed the DRAM bandwidth.
> > >
> > >
> >
> > Why? Zeroing at the point of use means you can skip the DRAM step entirely.
> > You save one store to DRAM, and possibly even one load from DRAM.
>
> True. But zero-ing at the point of use is basically a security bet that "this time,
> trust us, there's absolutely no way anyone can break through our OS to construct a
> mechanism by which they can read pages on the free, but not yet-laundered, list".
> Is that a good bet?
>
> In principle (sure, in principle) it's no different from saying "trust us, there's absolutely no
> way anyone can read pages in a different process", and if that fails, well, it's game over.
>
> But in practice, it seems to me, one of these paths may be substantially harder (by dint of
> long experience; and on the theoretical grounds that the page settings are now some sort of
> "owned by kernel -- and I the hacker am the kernel -- rather than owned by process").
>
> Is this worth worrying about? I'm not a hacker, and I know and care little about security. But my limited understanding
> of, eg, Apple's special page protection schemes that go above and beyond traditional OS and traditional HW, would
> not protect such pages once they transitioned from owned by a process to owned by the kernel.
I think you could at least argue that zeroing during/after GC is one thing, and that zeroing pages before re-use in another process is sufficiently different that the two don't necessarily call for the same mechanism. For the latter, a zero-page instruction might make some sense (z/Architecture's Perform Frame Management Function, for example).