By: --- (---.delete@this.redheron.com), October 6, 2021 9:07 am
Room: Moderated Discussions
Adrian (a.delete@this.acm.org) on October 5, 2021 11:28 pm wrote:
> David Hess (davidwhess.delete@this.gmail.com) on October 5, 2021 5:19 pm wrote:
> > Yuhong Bao (yuhongbao_386.delete@this.hotmail.com) on October 3, 2021 1:30 am wrote:
> > >
> > > Even on the IBM PC the 8237 memory-to-memory copy mode was never used.
> >
> > Embedded systems running on bare metal might have used it, and I think I remember seeing examples
> > in application notes, but didn't Dr. Dobb's Journal include an example on the PC in one issue?
> >
>
>
> The 8237 DMA was very seldom used for anything in IBM PCs (except in its legacy application in the floppy
> disk driver), not because it was a DMA controller, but because it was a very bad DMA controller.
>
>
> The 8237 was not really designed as a companion for CPUs with large address spaces. It had just
> minimal improvements over the 8257, which was designed as a companion for the Intel 8080.
>
>
> Using the 8237 for large or unaligned transfers was tedious, and it was very slow.
>
> Starting with the IBM PC/AT, i.e. with 80286 or better CPUs, the CPU had the instructions
> INS & OUTS, which could transfer data much faster and much more simply than the 8237.
>
>
> So the fact that one of the worst DMA controllers known in history, the 8237, was not used in IBM PCs does
> not prove anything about whether it is better or worse to use DMA or the CPU for memcpy/memset.
>
> With many embedded CPUs, using a DMA controller for the larger transfers is certainly better.
>
> Whether this might also be worthwhile on a high-performance CPU with multi-level caches is hard to estimate
> without some realistic simulations, as it depends on how frequent the larger transfers are and on which
> effects would dominate: those that would increase the performance or those that would decrease it.
>
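For anyone who never had to program the thing, here is a rough sketch of why "large or unaligned" was tedious. The 8237 has only 16-bit address and count registers, and on the PC the upper address bits come from a separate page register that does not increment when the low 16 bits wrap, so software has to split any transfer that crosses a 64 KiB boundary. The struct and function names below are my own illustration, not anything from a real driver, and this only covers the 8-bit channel case.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: one piece that a single 8237 programming can cover. */
struct dma_chunk {
    uint8_t  page;    /* external page register value (address bits 16 and up) */
    uint16_t offset;  /* 8237 16-bit address register value */
    uint16_t count;   /* 8237 16-bit count register value: bytes minus one */
};

/*
 * Split [phys, phys + len) into pieces that never cross a 64 KiB boundary.
 * Writes at most max_chunks entries to out and returns how many were written.
 * If max_chunks is too small, the tail of the transfer is simply not described.
 */
size_t split_for_8237(uint32_t phys, uint32_t len,
                      struct dma_chunk *out, size_t max_chunks)
{
    size_t n = 0;

    while (len > 0 && n < max_chunks) {
        /* Bytes left before the low 16 address bits would wrap. */
        uint32_t room = 0x10000u - (phys & 0xFFFFu);
        uint32_t take = (len < room) ? len : room;

        out[n].page   = (uint8_t)(phys >> 16);
        out[n].offset = (uint16_t)(phys & 0xFFFFu);
        out[n].count  = (uint16_t)(take - 1u);  /* controller counts N-1 */
        n++;

        phys += take;
        len  -= take;
    }
    return n;
}
```

A 32-byte buffer that happens to straddle a 64 KiB boundary comes out as two separately programmed transfers, each needing its own setup and completion wait, which is a lot of ceremony next to a single REP-prefixed string instruction.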
I'd add to this the reminder that the CPU is no longer the only kid on the SoC.
It's fine to talk about "zero in the cache and now the data is right there where you want it", except that the page you just zeroed may actually be allocated on behalf of the GPU, or NPU, or ISP, or media decoder, or any of the dozens of helper cores on the SoC...
Like I said -- the time of one-size-fits-all solutions for the floodfill/data movement task is over in mobile, and one part of the desktop is heading that way. Will the rest of desktop follow?