By: Brendan (btrotter.delete@this.gmail.com), May 23, 2022 11:44 am
Room: Moderated Discussions
Hi,
Doug S (foo.delete@this.bar.bar) on May 23, 2022 8:35 am wrote:
> ⚛ (0xe2.0x9a.0x9b.delete@this.gmail.com) on May 23, 2022 6:03 am wrote:
> > Doug S (foo.delete@this.bar.bar) on May 22, 2022 8:52 pm wrote:
> > > ⚛ (0xe2.0x9a.0x9b.delete@this.gmail.com) on May 22, 2022 11:51 am wrote:
> > > > You are overly protective of what the Linux kernel currently is. There is no vision of a future
> > > > of heterogeneous CPUs in your posts .... if heterogeneous desktop/notebook CPUs are inevitable
> > > > then you should have a plan for it or make a plan for it. (An example reason why heterogeneous
> > > > CPUs are inevitable in those markets is that endowing _all_ cores in a future desktop machine
> > > > with the ability to predict 4 branches per cycle would be problematic.)
> > >
> > >
> > > That already exists, Linux handles stuff like "big cores" that are wider and "small cores" that are
> > > narrower just fine, so long as they both execute the same ISA. What Linus doesn't want to see are
> > > cores that don't support the same ISA, with certain instructions only present on some cores.
> > >
> > > If the hardware people give us that Linux will support it, but that doesn't mean he shouldn't
> > > go around telling people why he thinks that would be a bad idea. You can't really "plan"
> > > for something like this until it appears and you know what you're working with.
> >
> > I strongly disagree that it is impossible to plan for hetero-ISA multicore CPUs. The kernel is
> > [mostly] in control of the CPU, not the other way round (not CPU in control of the kernel). Thus,
> > approximately 90% of the complexity related to enabling efficient use of such CPUs is a software
> > design problem, not a hardware design problem. In other words, it is about software design (aka:
> > plan), and hardware is "just" providing execution resources for use by the kernel&apps.
> >
> > If the Linux kernel was designed/prepared to take hetero-ISA CPUs seriously, then it would already
> > be providing a clean and efficient interface between the kernel-space and the applications running
> > in user-space related to communicating/negotiating information necessary to run applications
> > on hetero-ISA CPUs efficiently. ---- By "communicating/negotiating information" I mean both directions
> > (not just one direction): kernel -> userspace and userspace -> kernel.
> >
> > In my opinion, it is a mistake to believe that hetero-ISA CPUs can be utilized efficiently without
> > a bidirectional communication channel between the OS and the applications. But unfortunately, this
> > is exactly the belief/viewpoint that Linus is advocating in relation to the Linux kernel.
> >
> > Another OS component related to hetero-ISA CPUs in Linux
> > is the ELF executable file format, which is, again,
> > a software design problem, not a hardware design problem. It would be possible for ELF to be extended in
> > ways that would make binary translation (=BT) much more
> > efficient if the Linux kernel supported BT natively,
> > for example by extending/virtualizing the x86 CALL instruction to support dispatch based on the CPU _core_
> > on which the CALL instruction is currently running (example: CALL strlen; and there exists an ELF section
> > specifying which version of 'strlen' to actually call based
> > on the CPU core's capabilities). ---- Then again,
> > Linus's thoughts about what should Linux do to enable efficient use of hetero-ISA CPUs are mostly beside
> > the point and it appears that he still believes that the enablement can be implemented by some "magical
> > single-line patch" somewhere in the Linux kernel or without involving the kernel at all.
> >
> > There exist multiple options of how to approach/solve this problem. I don't
> > know which option will prevail over time, but I do know that as long as software
> > developers believe that there exist zero options then it is unsolvable.
>
>
> Without knowing how hardware designers are attempting to solve the problem, trying to provide facilities
> in software for them in advance is doomed to failure. Either you will box them in too much by making assumptions
> they would rather you not make, or you make the facility so generalized it is almost useless.
This is a "chicken and egg" problem - it'd be equally valid to claim that without knowing how software designers are attempting to solve the problem, trying to provide facilities in hardware in advance is doomed to failure.
The reality is that software designers aren't attempting to solve the problem at all, so hardware designers are forced to do stupid things (like disabling existing AVX-512 silicon) because software gave them no other choice.
For example, Intel's MultiProcessor Specification (my copy is version 1.4 from 1997, but I'm fairly sure this text existed in prior versions) contained (emphasis mine):
"Some MP operating systems that exist today do not support processors of different types, speeds, or capabilities. However, as processor lifetimes increase and new generations of processors arrive, the potential for dissimilarity among processors increases. The MP specification addresses this potential by providing an MP configuration table to help the operating system configure itself. Operating system writers should factor in processor variations, such as processor type, family, model, and features, to arrive at a configuration that maximizes overall system performance. At a minimum, the MP operating system should remain operational and should support the common features of unequal processors."
In other words, if software developers had actually bothered to follow Intel's advice 25 years ago, Intel wouldn't have had to disable AVX-512 in Alder Lake's P cores.
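As a minimal sketch of what "support the common features of unequal processors" could look like in software: the feature bits and function names below are made up for illustration (they are not the real CPUID layout, and not taken from the MP specification or from any kernel). The idea is only that the common baseline is the AND of every CPU's feature mask, and that anything beyond the baseline becomes a question of which CPUs a thread may run on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only (not a real CPUID layout). */
enum {
    FEAT_SSE2   = 1u << 0,
    FEAT_AVX2   = 1u << 1,
    FEAT_AVX512 = 1u << 2
};

/* Features that every CPU supports: bitwise AND across all per-CPU masks.
 * Returns 0 when there are no CPUs. */
uint32_t common_features(const uint32_t *cpu_features, size_t ncpus)
{
    assert(cpu_features != NULL || ncpus == 0);
    if (ncpus == 0)
        return 0;

    uint32_t common = cpu_features[0];
    for (size_t i = 1; i < ncpus; i++)
        common &= cpu_features[i];
    return common;
}

/* Bitmask of CPUs (bit i set = CPU i) that support every bit in `wanted`.
 * Only the first 64 CPUs are considered in this sketch. */
uint64_t cpus_supporting(const uint32_t *cpu_features, size_t ncpus,
                         uint32_t wanted)
{
    assert(cpu_features != NULL || ncpus == 0);

    uint64_t mask = 0;
    for (size_t i = 0; i < ncpus && i < 64; i++) {
        if ((cpu_features[i] & wanted) == wanted)
            mask |= (uint64_t)1 << i;
    }
    return mask;
}
```

With two "big" CPUs that have all three bits and two "small" CPUs that lack FEAT_AVX512, common_features() gives SSE2 plus AVX2 as the baseline, and cpus_supporting() with FEAT_AVX512 gives a mask covering only the two big CPUs. A scheduler could, in principle, use that mask as the allowed-CPU set for a thread that has asked for AVX-512, instead of the feature being disabled everywhere.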
> Imagine if you had a single core OS and you thought "someday there will be multi core CPUs, we want
> to be ready for that" and you start adding code to make that possible without knowing how the hardware
> will implement locking. You can obviously assume this will have to be provided for in some way, so
> you write yourself some pseudocode with a generic 'lock' function to be filled in later. The problem
> is, the type of locking provided strongly influences how, where and how often it will be used.
>
> If your code assumes a very lightweight lock, you are effectively forcing the hardware guys
> to give you that even if it isn't something they can easily deliver. If you use it sparingly
> assuming it will be quite expensive, and they give you a very lightweight locking facility,
> you are not taking full advantage of what the hardware designers have given you.
This is a good analogy, but not in the way you think. For 80x86, the locking (or more specifically, the atomic instructions with the LOCK prefix, etc) existed in 8086 CPUs in the 1970s; so it would've been very easy to predict the future of 80x86's locking support relatively accurately 10 years before multi-CPU became "niche server" and 20 years before it became mainstream. In the same way, it would've been easy to "predict" what software/tools need to support heterogeneous CPUs ~20 years ago when people started shoving PowerPC accelerator boards into (Motorola 68K series) Amigas, or 10 years ago when ARM started doing "big.LITTLE".
Let's push this to an extreme. GPUs are getting more "CPU like" (with better support for scalars, speculative execution, branch prediction, etc), and GPUs are increasingly becoming integrated on the same chip as the CPU (with "unified" memory access, etc). At which point will people be able to use something like "pthread_create_ext()" to spawn GPGPU worker threads as first-class citizens in a multi-threaded process?
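To make the locking half of the analogy concrete, here is a minimal sketch of a test-and-set spinlock written against C11 atomics. The struct and function names are hypothetical, chosen for illustration. On x86, compilers typically lower the test-and-set to XCHG with a memory operand (which is implicitly locked), so this is roughly the kind of lock that the 8086-era primitives already made possible.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal spinlock built on a single atomic flag. */
struct spinlock {
    atomic_flag held;
};

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try to take the lock once; returns true if this caller now holds it. */
static inline bool spin_trylock(struct spinlock *l)
{
    assert(l != NULL);
    /* test_and_set returns the previous value: false means it was free. */
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Busy-wait until the lock is acquired. */
static inline void spin_lock(struct spinlock *l)
{
    while (!spin_trylock(l)) {
        /* spin; a real implementation would add a pause/backoff here */
    }
}

/* Release the lock so another caller can take it. */
static inline void spin_unlock(struct spinlock *l)
{
    assert(l != NULL);
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Nothing here depends on how many CPUs exist or how the hardware implements the atomic operation, which is the point: the interface could have been written long before multi-CPU 80x86 systems were common.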
- Brendan