By: --- (---.delete@this.redheron.com), September 13, 2021 2:32 pm
Room: Moderated Discussions
Daniel B (fejenagy.delete@this.gmail.com) on September 13, 2021 1:57 pm wrote:
> David Hess (davidwhess.delete@this.gmail.com) on September 13, 2021 1:00 pm wrote:
> > Daniel B (fejenagy.delete@this.gmail.com) on September 13, 2021 5:20 am wrote:
> > >
> > > Either the scheduler needs to learn the application behaviour
> > > or the application should come tagged, but then the ISV
> > > is expected to understand hardware and system energy
> > > efficiency, which they don't.
> >
> > Is there any reason not to expect applications to lie? Of course every
> > thread needs to operate on the fastest available core. Ask them.
> >
> > Alternatively why would developers spend time to distinguish which threads should operate on which cores
> > when the result can only make the perceived performance worse? And why risk getting it wrong?
> >
>
> I have no answer to this. Even with good intentions, I don't see how it is possible to know which application
> should run where at any given time without deep performance analysis and an understanding of what the target
> QoS is. I tried a quick Google search, but not much comes up. Qualcomm says they support affinity settings,
> but in the end it all depends on the scheduler. Nothing on Apple iOS so far. Some developers asked the
> question of pinning threads to specific cores but so far the answer seems to be nope. I suspect (most)
> applications do not set this and it is all magic happening in the scheduler black box.
Here are the Apple APIs:
https://developer.apple.com/library/archive/documentation/Performance/Conceptual/power_efficiency_guidelines_osx/PrioritizeWorkAtTheTaskLevel.html
Unfortunately the deadline scheduling stuff (the concept you need is "Work Interval Object"), like so much produced by Apple (and MS and Google...) these days, is some combination of terribly documented and half public, half private.
I only see one public API that documents it (for manipulating audio), but the generic idea can be read about here:
https://opensource.apple.com/source/xnu/xnu-4570.41.2/bsd/sys/work_interval.h.auto.html
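To make the work interval idea concrete, here is a toy sketch of my own. It assumes only that a work interval is a repeatable unit of work described by a start, a finish and a deadline. The `WorkInterval` class, the `next_perf_level` function, the performance levels and the thresholds are all invented for illustration; none of this is the XNU API or Apple's actual controller.

```python
from dataclasses import dataclass


@dataclass
class WorkInterval:
    """One repetition of deadline-sensitive work (times in milliseconds)."""
    start: float
    finish: float
    deadline: float

    @property
    def slack(self) -> float:
        # Positive: finished early. Negative: missed the deadline.
        return self.deadline - self.finish


def next_perf_level(level: int, interval: WorkInterval,
                    min_level: int = 0, max_level: int = 10) -> int:
    """Hypothetical feedback rule: step performance up after a missed
    deadline, step it down when more than half the time budget went unused."""
    budget = interval.deadline - interval.start
    if interval.slack < 0:
        return min(level + 1, max_level)
    if interval.slack > 0.5 * budget:
        return max(level - 1, min_level)
    return level
```

The point is only that the application reports when its work started, when it finished and when it was due, and the OS decides what to do about it. The application never asks for a particular core or frequency.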
Many of the concepts I mention are more Darwin than iOS/macOS concepts. The expectation, for better or worse, seems to be that either you use them at the Darwin level (with all the fun that implies in terms of reading header files and man pages to figure out how it all works), or you use the higher-level AppleOS primitives and just accept what they give you in terms of scheduling.
The basic Apple patent describing how to use much of the Thread Director equivalent info is here:
https://patents.google.com/patent/US20170357302A1/en
That's the easy one! The much more complicated one is a year later, here:
https://patents.google.com/patent/US10884811B2
with subsidiary patents like
https://patents.google.com/patent/US20180293102A1/en
These describe how the OS uses the low level HW info, the API and OS-internal concepts it uses, and how to deal with the really big problem (the one you are all ignoring) -- optimal clustering of threads into a thread scheduling group given the constraint (I think this is the governing concern) that all items in a CPU cluster (cores and L2) have to run at the same frequency.
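A toy model of why that shared-frequency constraint makes grouping matter. This is a sketch under my own assumptions, not anything taken from the patents: each thread is reduced to a single hypothetical "frequency it needs" number in MHz, and a cluster simply runs at the highest frequency any of its threads needs.

```python
def cluster_frequency(demands):
    """A cluster runs at the highest frequency any of its threads needs."""
    return max(demands) if demands else 0


def wasted_frequency(assignment):
    """Sum, over all threads, of (cluster frequency - what the thread needed)."""
    waste = 0
    for demands in assignment:
        f = cluster_frequency(demands)
        waste += sum(f - d for d in demands)
    return waste


# Hypothetical per-thread frequency demands in MHz, two clusters of two cores.
mixed = [[3000, 800], [2800, 600]]     # fast and slow threads share clusters
grouped = [[3000, 2800], [800, 600]]   # similar demands share a cluster

print(wasted_frequency(mixed))    # 4400
print(wasted_frequency(grouped))  # 400
```

With the mixed assignment both clusters are dragged up to a high frequency by one demanding thread each. Grouping threads with similar demands lets one cluster run slowly. The real problem is much harder, since demands change over time and threads that share data want to share an L2.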
Honestly I can only pick out big ideas in the AMP Scheduler patent; I don't know enough about OS internals to really understand it, but others might learn from it.