By: Andrei F (andrei.delete@this.anandtech.com), September 12, 2021 1:09 am
Room: Moderated Discussions
--- (---.delete@this.redheron.com) on September 10, 2021 6:24 pm wrote:
> Andrei F (andrei.delete@this.anandtech.com) on September 10, 2021 1:49 pm wrote:
> > Andrey (andrey.semashev.delete@this.gmail.com) on September 10, 2021 10:38 am wrote:
> > > Andrei F (andrei.delete@this.anandtech.com) on September 10, 2021 10:31 am wrote:
> > > > inteluser (inteluser.delete@this.sharklasers.com) on September 10, 2021 2:52 am wrote:
> > > > > on alder lake, how will the separation instructions of small cores or big cores?
> > > > > will there be a dispatcher or scheduler from HW or the OS to take priority?
> > > >
> > > > The same way we've had heterogeneous cores in mobile SoCs for the better part of a decade.
> > > >
> > > > The OS scheduler just sees another core, and there's extra load and utilisation heuristics
> > > > to schedule workloads around the various cores to make best use of perf or efficiency.
> > > >
> > > > There is no hardware involved.
> > >
> > > I thought Thread Director is the hardware that is involved. It's supposed
> > > to hint the software as to where the workload is better scheduled.
> > >
> >
> > It's just a glorified microcontroller that collects performance counter data and writes into some
> > structs into memory that the OS then uses to make scheduler migration decisions. The same thing could
> > be done totally in software, though with a little more overhead due to the finer granularity.
>
> In PRINCIPLE the HW can do more. Whether you call this "scheduling" is an uninteresting question IMHO.
>
> Obviously (even pre-AMP) the HW can use the indicators it tracks to vary the DVFS
> of a CPU. I thought Intel already does this as one of the versions of Turboboost.
>
> Next you can use the indicators to dynamically vary more than just CPU performance; for example code with an aggressive
> DRAM profile can have the DRAM frequency boosted to maximum even as the CPU frequency is reduced.
There is no "HW" in the proper sense of the term; it's still software (firmware), it's just running on a smaller low-power microcontroller instead of on the CPU cores themselves.
DRAM is already a completely separate domain that is, for the most part, completely transparent to the OS, and any serious SoC has had independent, microcontroller-driven DVFS based on NoC/MC traffic. We've had this for years and years.
>
> Finally (and this is AMP-relevant) if a large core has a relationship with a small core
> (eg producer-consumer, ...) the small core can be appropriately boosted in speed.
> The one version of this (there are probably more) that we know of is a large core
> tracking that some fraction of its cache misses are sourced from a small core:
> https://patents.google.com/patent/US10942850B2
(This isn't related to that patent, but in general:)
One of the biggest issues with the traditional companies is that they have not understood power-efficient DVFS. Years ago, Intel engineers lambasted schemes like big.LITTLE because they were "not hardware controlled" - but ultra-fine-grained DVFS like that is precisely what you do not want, for several reasons. In battery-powered devices the whole point of DVFS is to avoid the higher performance states and voltages as much as possible, and what matters is delivering the required performance within a unit of user experience, essentially a 16ms or 8ms frame, which is AGES. The act of changing frequency and voltage itself takes quite a bit of energy, and you literally do not want to do it that fast: it is actually more efficient to smooth out performance over the duration of the frame at a lower state, or to clock/power-gate during the shorter idle periods, than to DVFS down. The same applies to things like DRAM frequency changes - you don't want to track and change this too finely because you'll be wasting a ton of energy.
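To put rough numbers on the argument, here is a toy model, a minimal sketch rather than a measurement of any real SoC. It assumes dynamic energy scales as C·V² per cycle, ignores leakage/static power entirely (which in reality is what makes racing to idle attractive at all), and charges a fixed energy cost per voltage/frequency switch. Every value in it (the two operating points, the switched capacitance, the 20 µJ per transition, the 12M cycles of work per 16ms frame) is a made-up illustrative assumption, and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    freq_hz: float
    volts: float


def schedule_energy_j(segments, frame_s, cap_farads, transition_j):
    """Energy of one frame given (OperatingPoint, cycles) segments run back to back.

    Toy model: dynamic energy = C * V^2 per cycle, plus a fixed cost for each
    change of operating point. Leakage and idle power are ignored.
    """
    busy_s = sum(cycles / point.freq_hz for point, cycles in segments)
    if busy_s > frame_s:
        raise ValueError("schedule misses the frame deadline")
    compute_j = sum(cap_farads * point.volts ** 2 * cycles
                    for point, cycles in segments)
    # Count a transition whenever two consecutive segments use different points.
    switches = sum(1 for (a, _), (b, _) in zip(segments, segments[1:]) if a != b)
    return compute_j + switches * transition_j


# All numbers below are illustrative assumptions, not data for any real chip.
FRAME_S = 16e-3        # one 60 Hz frame
WORK_CYCLES = 12e6     # work that must complete within the frame
CAP_F = 1e-9           # effective switched capacitance
TRANSITION_J = 20e-6   # assumed energy per voltage/frequency change
LOW = OperatingPoint(freq_hz=1.0e9, volts=0.7)
HIGH = OperatingPoint(freq_hz=2.5e9, volts=1.0)

# 1) Stay at the low state and spread the work over the frame.
steady = schedule_energy_j([(LOW, WORK_CYCLES)], FRAME_S, CAP_F, TRANSITION_J)

# 2) Ramp up from the low state, sprint, ramp back down (two transitions).
sprint = schedule_energy_j([(LOW, 0.0), (HIGH, WORK_CYCLES), (LOW, 0.0)],
                           FRAME_S, CAP_F, TRANSITION_J)

# 3) A very fine-grained governor that flips state 199 times within the frame.
chunk = WORK_CYCLES / 200
fine_grained = schedule_energy_j([(LOW, chunk), (HIGH, chunk)] * 100,
                                 FRAME_S, CAP_F, TRANSITION_J)

for name, joules in [("steady low", steady), ("sprint", sprint),
                     ("fine-grained", fine_grained)]:
    print(f"{name:>12}: {joules * 1e3:.2f} mJ")
```

With these assumed numbers the steady low state is cheapest and the constantly switching governor is the most expensive, purely because of the V² term and the per-switch cost. A real comparison would need leakage, idle-state entry/exit costs and measured transition energy, any of which could move the result.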
As for that Apple patent, I'd question whether the energy investment is worth it given that it's only a "fraction" of cache misses - you're not just affecting the one small core but the whole frequency and voltage domain of the small-core cluster.
Alder Lake is a bit special here in that it will go into both battery-powered and AC-powered devices. The latter is a performance scenario we haven't seen before in heterogeneous CPU designs, so maybe a lot of these assumptions get thrown out the window, but I have doubts that Intel will implement two completely different operating modes - we'll see.