By: Brendan (btrotter.delete@this.gmail.com), May 22, 2022 8:41 pm
Room: Moderated Discussions
Hi,
Andrey (andrey.semashev.delete@this.gmail.com) on May 22, 2022 4:18 pm wrote:
> Brendan (btrotter.delete@this.gmail.com) on May 22, 2022 11:18 am wrote:
> > Linus Torvalds (torvalds.delete@this.linux-foundation.org) on May 21, 2022 4:58 pm wrote:
> > > Brendan (btrotter.delete@this.gmail.com) on May 21, 2022 12:58 pm wrote:
> > > >
> > > > No. New software (designed to use the new CPUID leaves) would be aware of that problem
> > > > and would avoid it - e.g. maybe using something like "sched_setaffinity()" to lock the
> > > > thread to a specific CPU type before using CPUID (and maybe using "sched_setaffinity()"
> > > > again later to restore the original CPU affinity and allow migration again).
> > >
> > > That's the "it works in an embedded environment where you control everything" model.
> >
> > Erm, no?
> >
> > It's the run-time dispatch (e.g. choosing which version of functions to use based on CPUID results)
> > that some compilers (ICC) have been doing for ages; but slightly modified to work with dissimilar
> > cores by making it more fine grained (not once at program startup, but "anytime where needed")
> > and preventing scheduler from migrating to a different CPU type at the wrong time.
>
> That's not how libraries work.
Libraries? I was mostly talking about normal processes ("generic app"). For (shared) libraries you're already in a world of suckage because the compiler can't optimize anything across the caller/callee boundary (even with link-time optimization); so continuing to just plain suck (no "P core" or "E core" specific optimization, just generic code that runs on both) is fine for most things.
Of course there would be a few "processing heavy" library functions where the benefits of better optimization (e.g. using AVX-512) justify the cost of temporarily preventing migration to another type of CPU; so giving library developers the ability to choose which approach to take for their library isn't inherently worse.
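As a rough sketch of that pattern (Linux, C, GCC builtins; get_p_core_mask(), transform_avx512() and transform_generic() are made-up placeholders for "enumerate the P cores" and the two code paths, not a real API):

#define _GNU_SOURCE
#include <sched.h>      /* sched_getaffinity(), sched_setaffinity(), cpu_set_t */
#include <stddef.h>

/* Hypothetical helpers - the real P-core list would come from OS/topology enumeration. */
extern void get_p_core_mask(cpu_set_t *out);            /* fills in the P-core CPUs */
extern void transform_avx512(float *data, size_t n);    /* P-core optimized path */
extern void transform_generic(float *data, size_t n);   /* works on any core */

void transform(float *data, size_t n)
{
    cpu_set_t original, p_cores;

    /* Remember where the scheduler was allowed to put us. */
    if (sched_getaffinity(0, sizeof(original), &original) != 0) {
        transform_generic(data, n);
        return;
    }

    get_p_core_mask(&p_cores);
    CPU_AND(&p_cores, &p_cores, &original);

    /* Pin to P cores only, so the feature check and the AVX-512 code
       can't be migrated onto an E core halfway through. */
    if (CPU_COUNT(&p_cores) == 0 ||
        sched_setaffinity(0, sizeof(p_cores), &p_cores) != 0) {
        transform_generic(data, n);
        return;
    }

    if (__builtin_cpu_supports("avx512f"))
        transform_avx512(data, n);
    else
        transform_generic(data, n);

    /* Allow migration again. */
    sched_setaffinity(0, sizeof(original), &original);
}

The expensive part (two affinity syscalls) is only paid by the handful of functions that actually want the P-core-only path; everything else keeps the normal "runs anywhere" behavior.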
> Your typical library will test CPU features once on load, initialization or the
> first call and save the pointer(s) to the selected implementation. After that the library can be called multiple
> times, from any threads running on any cores, and the library will use the saved pointers. So (a) even if you
> lock the affinity while running CPU detection, that doesn't help because the library will be used on any cores,
> and (b) locking the affinity permanently (not just for the duration of CPU detection) most of the time is not
> expected by the caller and is not an acceptable behavior of the library. Doing CPU detection on every use is
> also not acceptable because doing this is slow - especially, if adjusting thread affinity is involved.
>
> > It's the "developer controls nothing, generic app designed for any/all 80x86
> > CPUs adapts to whatever it happens to find itself running on" model (the opposite
> > of the "embedded environment where you control everything" model).
>
> Per the above, this approach cannot work in general libraries. It may work in an application that is tightly
> coupled with the libraries it uses, and is able to compartmentalize threads and libraries to specific cores.
> Most applications don't do that and cannot reasonably do that because performance implications of such work
> distribution are unclear and unpredictable unless you're the only application running on the system.
No. Processes that consume enough CPU time to matter can and do assume they're the only process running, and most of the time the assumption is correct because other processes are blocked/not running and/or "insignificant noise"; and when the assumption is incorrect (multiple "CPU heavy" processes at the same time) we just let the scheduler handle it.
The only real consequence of allowing this "less bad" optimization for programs (not shared libraries) by temporarily disabling migration to a different CPU type (e.g. maybe once every 1/60th of a second for a game) is that it limits the scheduler's ability to balance load between P cores and E cores; but when you're looking at chips with 8 P cores (or 8 E cores) there's plenty of scope to migrate threads between CPUs of the same type, so it's difficult to find a valid reason to care.
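For the game case, "once every 1/60th of a second" just means pinning at the start of a frame and unpinning at the end; something like this (game_running(), simulate_and_render_frame() and get_p_core_mask() are made-up placeholders again):

#define _GNU_SOURCE
#include <sched.h>
#include <stdbool.h>

/* Hypothetical engine functions - placeholders, not a real API. */
extern bool game_running(void);
extern void simulate_and_render_frame(void);   /* may use AVX-512 paths internally */
extern void get_p_core_mask(cpu_set_t *out);   /* as in the earlier sketch */

void game_main_loop(void)
{
    cpu_set_t original, p_cores;

    sched_getaffinity(0, sizeof(original), &original);
    get_p_core_mask(&p_cores);

    while (game_running()) {
        /* Pin for the duration of one frame, so any per-frame CPU feature
           decision stays valid until the frame ends. */
        sched_setaffinity(0, sizeof(p_cores), &p_cores);

        simulate_and_render_frame();

        /* Let the scheduler migrate/balance again between frames. */
        sched_setaffinity(0, sizeof(original), &original);
    }
}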
- Brendan