By: Jan Wassenberg (jan.wassenberg.delete@this.gmail.com), May 22, 2022 11:11 pm
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on May 22, 2022 9:49 pm wrote:
> And exactly like Andrey tried to explain to you, the actual library function that gets used tends
> to be picked either at library load time or at first use, and then it is fixed for the lifetime of
> the whole process (and fixed across threads). That isn't the only way to do it, no, but it's by far
> the common one, and it's one of the major uses of the cpuid instruction in modern programming.
I agree this is how it's currently done (including by us). But given that we are moving into a world with more heterogeneity (whether we like it or not), isn't that an occasion to perhaps adapt what we've been doing?
Here is a proposal for lightweight but adaptable dispatch that seems reasonable to me. Do you see any major flaws?
1) At startup the library runs existing (expensive) CPUID-based checks once per (enabled) logical processor, and stores an array of "which features can I use" bitfields.
2) The OS provides a lightweight "don't migrate me" flag, perhaps even mapped into user space to avoid kernel entry.
3) On entering the library, set the no_migrate flag, get the current logical processor index, and use it to look up our bitfield. ctz() on that bitfield gives the index of the function pointer to call from our pre-baked table. After we're done with SIMD, reset the no_migrate flag.
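To make the three steps concrete, here is a rough sketch in C. Nothing in it is an existing OS or library API. set_no_migrate() and current_cpu() are hypothetical stand-ins for the interface in step 2, detect_simulated() stands in for the per-processor CPUID checks in step 1, and the three Sum* kernels are placeholders that all run the same scalar loop. It assumes the lowest set bit denotes the best target and that a baseline bit is always set, so ctz() never sees zero. __builtin_ctz is the GCC/Clang builtin.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit i set means "implementation i is usable on this logical processor".
   Lower bits are better targets, so ctz() selects the best one available. */
enum { kAVX512 = 0, kAVX2 = 1, kScalar = 2, kNumTargets = 3 };
#define MAX_CPUS 64

typedef int (*SumFunc)(const int* data, size_t n);

/* Placeholder kernels: real ones would use SIMD. last_target records which
   one ran, purely so the dispatch decision can be observed. */
static int last_target = -1;
static int sum_loop(const int* data, size_t n) {
  int total = 0;
  for (size_t i = 0; i < n; ++i) total += data[i];
  return total;
}
static int SumAVX512(const int* d, size_t n) { last_target = kAVX512; return sum_loop(d, n); }
static int SumAVX2(const int* d, size_t n) { last_target = kAVX2; return sum_loop(d, n); }
static int SumScalar(const int* d, size_t n) { last_target = kScalar; return sum_loop(d, n); }

/* Pre-baked table, indexed by target bit position. */
static const SumFunc kTable[kNumTargets] = {SumAVX512, SumAVX2, SumScalar};

/* Step 1 result: one feature bitfield per logical processor. */
static uint32_t cpu_features[MAX_CPUS];

/* Hypothetical stand-in for CPUID checks executed on each processor:
   pretend CPUs 0-1 support AVX-512 and AVX2, all others only AVX2. */
static uint32_t detect_simulated(unsigned cpu) {
  return (cpu < 2) ? ((1u << kAVX512) | (1u << kAVX2)) : (1u << kAVX2);
}

static void init_features(unsigned num_cpus, uint32_t (*detect)(unsigned)) {
  const uint32_t valid = (1u << kNumTargets) - 1;
  if (num_cpus > MAX_CPUS) num_cpus = MAX_CPUS;
  for (unsigned cpu = 0; cpu < num_cpus; ++cpu) {
    /* Always set the baseline bit so the bitfield is never zero. */
    cpu_features[cpu] = (detect(cpu) & valid) | (1u << kScalar);
  }
}

/* Step 2, simulated: a real version would be a flag shared with the OS. */
static bool no_migrate_flag = false;
static unsigned simulated_cpu = 0;
static void set_no_migrate(bool on) { no_migrate_flag = on; }
static unsigned current_cpu(void) { return simulated_cpu; }

/* Step 3: pin, look up, dispatch, unpin. */
int Sum(const int* data, size_t n) {
  set_no_migrate(true);
  const uint32_t bits = cpu_features[current_cpu()];
  const SumFunc func = kTable[__builtin_ctz(bits)];
  const int result = func(data, n);
  set_no_migrate(false);
  return result;
}
```

After init_features(4, detect_simulated), a call to Sum() while "running" on CPU 0 goes through SumAVX512, and the same call on CPU 2 goes through SumAVX2. The per-call overhead is two flag writes, one array load, a ctz and an indirect call.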