By: Jörn Engel (joern.delete@this.purestorage.com), May 22, 2022 11:51 pm
Room: Moderated Discussions
Jan Wassenberg (jan.wassenberg.delete@this.gmail.com) on May 22, 2022 11:11 pm wrote:
>
> Here is a proposal for lightweight but adaptable dispatch that seems reasonable, do you see any major flaws?
> 1) At startup the library runs existing (expensive) CPUID-based checks once per (enabled)
> logical processor, and stores an array of "which features can I use" bitfields.
> 2) The OS provides a lightweight "don't migrate me" flag, perhaps
> even mapped into user space to avoid kernel entry.
> 3) On entering the library, set the no_migrate flag, get the current logical processor index,
> use it to look up our bitfield, ctz() on that to get the index of the function pointer to
> call from our pre-baked table. After we're done with SIMD, reset the no_migrate.
That will introduce high latency. Like it or not, bad device drivers with 1s busy-loops are common. Those events are rare for any particular machine, but quite common for a large population. And fixing all the bugs is both hard and somewhat futile. Next year there will be a new device driver that has recycled all the bugs you just fixed.
If you can migrate all other threads off the affected CPU, the problem isn't too bad. But threads that cannot migrate will have high latency. And this isn't an application programmer pinning threads and suffering the consequences; this is an application programmer suffering the bad choices made by some library.
You would need a "don't migrate me to a different kind of CPU" flag or something like that, broad enough to allow migrations in general, but specific enough to avoid CPUs with different CPUID flags. And now you have to be very careful how you define things.
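To make the definitional problem concrete, here is a minimal sketch of what "same kind of CPU" could mean, assuming the library already holds the per-CPU feature bitfields from step 1 of the quoted proposal. The function name `same_kind_mask`, the 32-bit feature word, and the 64-CPU limit are all made up for illustration; no OS currently offers a flag that consumes such a mask in the lightweight way the proposal wants.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: given one feature bitfield per logical CPU
 * (as collected at library startup), return a bitmask in which bit i
 * is set if CPU i has exactly the same feature bits as CPU `cur`.
 *
 * Assumptions for this sketch: at most 64 CPUs are considered, and
 * "same kind" means bit-for-bit identical feature words.
 */
uint64_t same_kind_mask(const uint32_t *features, size_t ncpus, size_t cur)
{
    uint64_t mask = 0;

    if (ncpus > 64)
        ncpus = 64;          /* only the first 64 CPUs fit in the mask */
    if (cur >= ncpus)
        return 0;            /* unknown CPU: no safe migration targets */

    for (size_t i = 0; i < ncpus; i++) {
        if (features[i] == features[cur])
            mask |= (uint64_t)1 << i;
    }
    return mask;
}
```

Even this toy version has to pick a definition: it treats CPUs as interchangeable only when their feature words are identical. Allowing migration to any CPU with a superset of the current features would be a different rule with different consequences, which is the kind of care the definition requires.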
I'm with Linus on this one. Having fast and slow cores is mostly fine. But having cores with fundamentally different behavior is too much pain to be worth it, with embedded systems as a possible exception.