By: rwessel (rwessel.delete@this.yahoo.com), May 21, 2022 5:05 am
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on May 21, 2022 12:06 am wrote:
> Brendan (btrotter.delete@this.gmail.com) on May 20, 2022 8:42 pm wrote:
> >
> > Just because a thread got migrated to a P core doesn't mean it has to stay there - you could migrate
> > a thread back to the E core for a while (until it uses the library again) if you want.
>
> That's the "hey, this can be fixed" thing.
>
> But the unfixable thing is much more fundamental: 'cpuid' is suddenly not reliable or meaningful.
>
> Basically, what does 'cpuid' mean?
>
> Does it mean "what are the capabilities of the CPU I happen to be running on right now"? But
> then it's useless in any system where the load can be migrated to another CPU at any time.
>
> Or does 'cpuid' mean "Ok, I now give you a set of capabilities that I guarantee"? So now any
> process that has ever run 'cpuid' will be tied to a core that matches what it was told?
>
> In other words, you are looking for an engineering solution to "oh, this core doesn't do instruction
> XYZ", but you are missing the much more fundamental issue. Intel by design has very much exposed
> that whole "query what the CPU supports" thing as a native instruction, and you cannot make
> that instruction work with sane semantics in a heterogeneous system.
>
> And 'cpuid' is not some small implementation detail. It's literally what any core system
> library would use to decide "How do I choose to implement functionality XYZ?". So 'cpuid' has
> to work, and it has to be reliable and meaningful, because it's literally how people will
> make the decision on whether to use the AVX512 version of the library or not.
>
> If you claim you do AVX512, then all processes end up getting pinned to
> big cores just because some random library goes "oh, then I'll use it".
>
> And if you claim not to do AVX512, then people won't be using
> it at all, and you would be better off not having it.
>
> And if you randomly return a value based on "right now you happen to be running on CPU X, so you do or do not
> have AVX512 based on that", you end up with random performance and the worst of both of the above worlds.
>
> And that is all assuming that the system software bent over backwards to make the whole thing work
> with auto-migration in the first place, so all these bad outcomes actually require a fair amount of
> engineering to even work at all (ok, except for the "never report AVX512" case, of course).
>
> End result: you can't win.
>
> So you're answering the wrong question entirely. The question was never "Can I auto-migrate
> a process that uses AVX512 to a big core that supports it, and maybe auto-demote it later?"
>
> No. The question was much more fundamental: 'what does cpuid report?'.
>
> And that question simply has no valid useful answer in the heterogeneous system.
>
> Ergo: the heterogeneous model is broken. Fundamentally and unfixably so.
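To make the quoted failure mode concrete, here is a toy sketch of the "query once at init, cache the answer" pattern that libraries tend to use. Everything in it is hypothetical: the two simulated cores, `query_wide_support` (a stand-in for 'cpuid' under the "report whatever core I'm on right now" policy), and `migrate_to` (a stand-in for the scheduler). It is not how any real library or kernel does it, just an illustration of why the cached answer goes stale.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated machine (hypothetical): core 0 is a "big" core with the wide
   vector unit, core 1 is a "small" core without it. */
static const bool core_has_wide[2] = { true, false };
static int current_core = 0;

/* Stand-in for 'cpuid' when it reports the core we are on right now. */
bool query_wide_support(void)
{
    return core_has_wide[current_core];
}

/* Stand-in for the scheduler moving the thread to another core. */
void migrate_to(int core)
{
    assert(core == 0 || core == 1);
    current_core = core;
}

typedef enum { IMPL_UNSET, IMPL_SCALAR, IMPL_WIDE } impl_t;
static impl_t chosen = IMPL_UNSET;

/* Typical library pattern: ask once, then cache the choice for good. */
impl_t library_impl(void)
{
    if (chosen == IMPL_UNSET)
        chosen = query_wide_support() ? IMPL_WIDE : IMPL_SCALAR;
    return chosen;
}

/* True when the cached choice needs an instruction the current core lacks,
   i.e. the next library call would trap. */
bool call_would_fault(void)
{
    return library_impl() == IMPL_WIDE && !core_has_wide[current_core];
}
```

If the first call to `library_impl()` happens on core 0, the library locks in the wide path; after `migrate_to(1)` the cached answer is unchanged and `call_would_fault()` becomes true. Started on core 1 instead, the same program would never use the wide unit at all.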
This problem presents itself at a higher level as well. If you're running in a cluster, different machines might implement different ISA options. If there is a chance your process can get migrated to a different machine in the cluster, or you might spawn a process that runs on a different machine, you get hit with not knowing what ISA you will actually be running on in the future.
IBM "fixed" Z's equivalent of CPUID to (largely) report the common ISA features across the cluster. The actual ISA features are still available (but to find them you need a hack like trying to execute an instruction, and seeing if it traps). You can also create partitions* on individual machines that are not part of a cluster containing "old" machines, and so report the "real" local features.
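Conceptually, "report the common features" is just an intersection of feature sets. A minimal sketch, assuming features are modeled as bits in a mask; the `FEAT_*` names and both functions are made up for illustration and do not correspond to real Z facility bits or to how IBM actually implements it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
enum {
    FEAT_BASE   = 1u << 0,
    FEAT_VECTOR = 1u << 1,
    FEAT_CRYPTO = 1u << 2,
    FEAT_NEWEST = 1u << 3
};

/* Features that every member of the cluster implements:
   the bitwise AND of all members' masks. */
uint32_t cluster_common_features(const uint32_t *members, size_t n)
{
    assert(n > 0);
    uint32_t common = members[0];
    for (size_t i = 1; i < n; i++)
        common &= members[i];
    return common;
}

/* What a partition would report: its real local features when it stands
   alone, the cluster-wide intersection when it is clustered. */
uint32_t reported_features(uint32_t local, const uint32_t *members,
                           size_t n, int clustered)
{
    return clustered ? cluster_common_features(members, n) : local;
}
```

With three members where only the newest has `FEAT_NEWEST` and the oldest lacks `FEAT_CRYPTO`, a clustered partition on the newest box reports just `FEAT_BASE | FEAT_VECTOR`, which is exactly the cost described here: the new features exist but are not advertised.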
IBM somewhat limits the problem by only supporting "N-2" machines in a single cluster, and certainly has many fewer versions of the ISA running around in any given generation.
Clearly that's a solution with significant negative impacts if you'd like to use the new features. OTOH, from IBM's perspective, it certainly does encourage users to replace the older boxes.
*On Z, it's partitions that are the unit of clustering, not the whole machine.