By: Andrey (andrey.semashev.delete@this.gmail.com), October 6, 2021 2:59 pm
Room: Moderated Discussions
--- (---.delete@this.redheron.com) on October 6, 2021 9:07 am wrote:
> Adrian (a.delete@this.acm.org) on October 5, 2021 11:28 pm wrote:
> > David Hess (davidwhess.delete@this.gmail.com) on October 5, 2021 5:19 pm wrote:
> > > Yuhong Bao (yuhongbao_386.delete@this.hotmail.com) on October 3, 2021 1:30 am wrote:
> > > >
> > > > Even on the IBM PC, the 8237 memory-to-memory copy mode was never used.
> > >
> > > Embedded systems running on bare metal might have used it, and I think I remember seeing examples
> > > in application notes, but didn't Dr. Dobb's Journal include an example on the PC in one issue?
> > >
> >
> >
> > The 8237 DMA was very seldom used for anything in IBM PCs (except in its legacy application in the floppy
> > disk driver), not because it was a DMA controller, but because it was a very bad DMA controller.
> >
> >
> > The 8237 was not really designed as a companion for CPUs with large address spaces. It had just
> > minimal improvements over the 8257, which was designed as a companion for the Intel 8080.
> >
> >
> > Using the 8237 for large or unaligned transfers was tedious, and it was very slow.
> >
> > Starting with the IBM PC/AT, i.e. with 80286 or better CPUs, the CPU had the instructions
> > INS & OUTS, which could transfer data much faster and much more simply than the 8237.
> >
> >
> > So the fact that one of the worst DMA controllers known in history, the 8237, was not used in IBM PCs does
> > not prove anything about whether it is better or worse to use DMA or the CPU for memcpy/memset.
> >
> > With many embedded CPUs, using a DMA controller for the larger transfers is certainly better.
> >
> > Whether this might also be worthwhile on a high-performance
> > CPU with multi-level caches is hard to estimate
> > without some realistic simulations, as it depends on how frequent the larger transfers are and on which
> > effects would dominate: those that would increase or those that would decrease the performance.
> >
>
> I'd add to this the reminder that the CPU is no longer the only kid on the SoC.
> It's fine to talk about "zero in the cache and now the data is right there where you want it"
> except that maybe this page you just zero'd is actually being allocated on behalf of the GPU,
> or NPU, or ISP, or media decoder, or any of the dozens of helper cores on the SoC...
As I understand it, that's what the different memory caching modes are for. For WB (write-back) memory, your non-CPU agents (or, similarly, other CPUs on a different socket) will have to go through the cache hierarchy anyway.