By: --- (---.delete@this.redheron.com), October 2, 2021 7:32 pm
Room: Moderated Discussions
Mark Roulo (nothanks.delete@this.xxx.com) on October 2, 2021 3:59 pm wrote:
> --- (---.delete@this.redheron.com) on October 2, 2021 3:12 pm wrote:
> ...snip ...
>
> > Oh, one more thing. Yet another way to solve the problem is not by DMA but by moving the task
> > (copy or flood-fill) up to the memory controller. Again much more feasible to the extent that
> > the memory controller has access to coherency tags. This allows you to do the job by routing
> > from DRAM to controller to DRAM, bypassing NoC and everything else, so lower power.
> >
> > The ultimate, of course is to do the job purely within the
> > DRAM. Onur Mutlu has published details of how this
> > could realistically be added to existing DRAM, but as far
> > as I know no-one has yet done so. Every year we have
> > some excitement about PIM around Hot Chips, then it all goes away and another year passes with no actually
> > purchasable PIM hardware. Even Apple, as far as we all
> > know, uses vanilla DRAM, and their temporary stake in
> > Toshiba Memory was apparently just a bit of financial engineering, not a prelude to bespoke DRAM :-(
>
> You are solving the problem of BULK memcpy.
>
(a)
Yes indeed. Because that was the subject Linus considered. As in, from my text that you snipped,
>>> I agree with your primary point, that obsessing over super-bulk transfers is not the first thing to worry
>>> about.
> If code is copying (or zero-ing) 5 - 100 bytes there is a very good chance that the copied or
> zero-d memory is going to be used immediately (for some values of immediately). Pushing the copy
> or zero to the DRAM is pretty much the wrong thing to do for either performance or power.
>
> NOTE: Bulk memory zero-ing (as might be useful for managed languages such as Java)
> might make a lot of sense. You could set up for 'free' memory ahead of time.
>
> Or maybe not. In theory the data could be zero-d as it was read
> into the caches so the bulk zero-ing in DRAM might be pointless.
(b)
Most of this discussion seems to assume (for aesthetic reasons, nothing else) that only a single solution should exist, and that that solution should be judged only in light of its use by a programming language.
But there are multiple use cases, many of which are outside the domain of a programming language. These include, eg,
- wiping a page by the OS (eg for security reasons), AND
- the copy part of copy-on-write of a page,
both of which may lend themselves to unorthodox mechanisms.
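To make the two use cases concrete, here is a minimal user-space sketch of their semantics (real kernels operate on physical pages via their own mapping machinery; the page size and function names here are just illustrative). The point is that in neither case does the CPU need the bytes in its cache afterwards, which is exactly why offload to a DMA engine, the memory controller, or the DRAM itself is plausible:

```c
#include <string.h>

#define PAGE_SIZE 4096

/* Wipe a page, as an OS might before handing it to a new process.
 * No CPU read follows, so pulling the page into cache is pure waste. */
void wipe_page(void *page) {
    memset(page, 0, PAGE_SIZE);
}

/* The copy half of copy-on-write: duplicate the shared page so the
 * writing process gets its own private copy. Again, the kernel itself
 * never inspects the copied bytes. */
void cow_copy(void *dst, const void *src) {
    memcpy(dst, src, PAGE_SIZE);
}
```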
OF COURSE most copies (within a language) are small, most such copies probably want the copied data present in cache, and such copies are optimally handled either by existing instructions (with nice alignment and known sizes) or by *very simple* augmenting instructions.
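For the common small-copy case, existing instructions already do fine; a compiler seeing a copy of known small size typically emits a couple of inline loads and stores rather than a library call, as in this sketch:

```c
#include <string.h>

struct pair { int a, b; };

/* With a constant size this small, mainstream compilers lower the
 * memcpy to one or two register moves inline -- no call, and the data
 * lands in cache, which is exactly what you want here. */
void copy_pair(struct pair *dst, const struct pair *src) {
    memcpy(dst, src, sizeof *dst);
}
```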
But if your mind is wandering into the space of "let's do it via DMA", the issue is not that that's an empty space; it's that such a copy is much less likely to be a job you want the compiler to generate automatically via weird instructions.
Rather, it's a task you will invoke through an API.
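Such an API might look like the following. To be clear, dma_copy() is a hypothetical interface, not any real platform's; a real implementation would build a descriptor, queue it to the copy engine, and wait for (or poll) completion, whereas this self-contained sketch simply falls back to memcpy:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical offloaded-copy API. The caller asks for the copy
 * explicitly; nothing here is generated behind the programmer's back
 * by the compiler. Returns 0 on successful completion. */
static int dma_copy(void *dst, const void *src, size_t len) {
    /* Stand-in for: fill descriptor, submit to engine, await completion. */
    memcpy(dst, src, len);
    return 0;
}
```

The shape matters more than the body: an explicit call with a completion status is something an OS or runtime invokes deliberately for bulk work, which is a very different contract from instructions a compiler sprinkles into ordinary code.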