By: Doug S (foo.delete@this.bar.bar), October 3, 2021 10:09 am
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on October 3, 2021 1:40 am wrote:
> 1-page copy or 1-page zeroing is also small.
> Nearly all disadvantages of out-of-core copy/set of 1-500 byte apply to 4K. And to 8K if your L1D is 32KB.
How often do you need only one page zeroed? With COW, yes, unfortunately that is going to be one page at a time because of the way it works. Having the result of a COW in cache is likely helpful, because more writes to that page will usually follow. You shouldn't assume that working on a single page implies 4K, though. There is one ARM implementation with 1.5 billion devices in the wild using 16K pages, and even hardware still on 4K pages supports larger page sizes.
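On the software side, not assuming 4K mostly means asking the system for its page size at runtime. A minimal sketch, assuming a POSIX system where sysconf(_SC_PAGESIZE) is available; the helper names page_size and pages_for are hypothetical, and the 4096 fallback is an assumption for the error case only.

```c
#include <stddef.h>
#include <unistd.h>

/* Ask the OS for the page size instead of hard-coding 4096.
 * On a system using 16K pages this returns 16384. */
static size_t page_size(void) {
    long ps = sysconf(_SC_PAGESIZE);
    return ps > 0 ? (size_t)ps : 4096; /* assumed fallback if sysconf fails */
}

/* Number of pages needed to cover len bytes, rounding up. */
static size_t pages_for(size_t len) {
    size_t ps = page_size();
    return (len + ps - 1) / ps;
}
```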
Zeroing has room for optimization, both because you will often zero more than one page at a time and because zeroes are rarely read before they are overwritten, so you want that activity to occur outside of the cache. Zeroing is a perfect candidate to take place without the involvement of the CPU.
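Short of taking the CPU out of the loop, one existing way to keep zeroing from filling the cache is non-temporal (streaming) stores, which hint that the written lines should go to memory rather than displace useful cache contents. A minimal sketch, assuming x86-64 with SSE2 intrinsics, a 16-byte-aligned buffer, and a length that is a multiple of 16; the function name zero_nontemporal is hypothetical.

```c
#include <emmintrin.h> /* SSE2: _mm_setzero_si128, _mm_stream_si128; pulls in _mm_sfence */
#include <stddef.h>

/* Zero len bytes at buf with non-temporal stores.
 * Assumes buf is 16-byte aligned and len is a multiple of 16. */
static void zero_nontemporal(void *buf, size_t len) {
    __m128i zero = _mm_setzero_si128();
    __m128i *p = (__m128i *)buf;
    for (size_t i = 0; i < len / sizeof(__m128i); i++)
        _mm_stream_si128(&p[i], zero); /* streaming store, cache-bypass hint */
    _mm_sfence(); /* make the streaming stores globally visible before later stores */
}
```

The CPU still does the work here; the point is only that the zeroes do not evict data that is actually going to be read.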
Systems don't spend enough time zeroing pages to justify something crazy like building that capability into DRAM, but I could see building it into the memory controller, now that controllers are closely coupled with the CPU in all modern designs. When a page becomes free, the OS could add it to a list of "ready to be zeroed" memory ranges kept in the controller, which would clear them whenever it is otherwise idle and let the OS know as each page is cleared.
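The handoff described above can be modelled in software roughly as follows. This is a hypothetical sketch: zero_queue, zq_push, zq_idle_step, ZQ_CAP and the on_cleared callback are illustrative names, not any real controller interface, and memset stands in for whatever the hardware would actually do to the DRAM.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ZQ_CAP 64 /* hypothetical number of pending ranges the controller holds */

struct zero_range { void *base; size_t len; };

struct zero_queue {
    struct zero_range slots[ZQ_CAP]; /* ring buffer of ranges waiting to be zeroed */
    size_t head, tail, count;
    void (*on_cleared)(void *base, size_t len); /* notification back to the OS */
};

/* OS side: hand a freed range to the controller.
 * Returns false if the list is full. */
static bool zq_push(struct zero_queue *q, void *base, size_t len) {
    if (q->count == ZQ_CAP) return false;
    q->slots[q->tail] = (struct zero_range){ base, len };
    q->tail = (q->tail + 1) % ZQ_CAP;
    q->count++;
    return true;
}

/* Controller side: called when the memory bus is idle.
 * Clears one pending range and tells the OS it is ready for reuse. */
static bool zq_idle_step(struct zero_queue *q) {
    if (q->count == 0) return false;
    struct zero_range r = q->slots[q->head];
    q->head = (q->head + 1) % ZQ_CAP;
    q->count--;
    memset(r.base, 0, r.len); /* stand-in for the controller clearing DRAM */
    if (q->on_cleared) q->on_cleared(r.base, r.len);
    return true;
}
```

Returning false from a full queue is one possible design choice: the OS would then zero that page on the CPU as it does today instead of waiting on the controller.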
If there were an intersection between the page size/range of an OS and the row/column model of DRAM addressing, which obviously there is not, the dream solution would be to stop refreshing those pages until they get used, since you'd get zeroing "for free" along with a bit of power savings!