By: --- (---.delete@this.redheron.com), September 13, 2021 10:19 am
Room: Moderated Discussions
Andrei F (andrei.delete@this.anandtech.com) on September 13, 2021 2:02 am wrote:
> David Kanter (dkanter.delete@this.realworldtech.com) on September 12, 2021 10:58 pm wrote:
> > > One of the biggest issues that the traditional companies have is that they have not understood power efficient
> > > DVFS. Years ago, Intel engineers lambasted schemes like big.LITTLE because it was "not hardware controlled"
> > > - but you precisely do not want ultra-fine grained DVFS like that for several reasons.
> >
> > > In battery powered
> > > devices the whole point of DVFS was to avoid the higher
> > > performance states and voltages as much as possible,
> > > and what matters here is the delivery of performance within a unit of user experience, essentially a 16ms
> > > or 8ms frame, which is AGES.
> >
> > Isn't the point to deliver max perf with min energy?
> >
> > > The act of frequency and voltage change itself takes up quite a bit of energy
> > > and you literally do not want to do it that fast because it actually would be more efficient to smooth out
> > > performance over the duration of your frame at a lower state, or clock/power-gate at smaller idle periods
> > > rather than to DVFS down.
> >
> > That depends on several things:
> >
> > 1. Latency of adjusting voltage
> > 2. Latency of adjusting the clock
> > 3. Penalties associated with changing V or F
> >
> > For a system with a 120MHz FIVR, you can adjust voltage pretty quickly compared to that 8ms period.
> >
> > It is quite possible to change clocks in a small number of cycles, depending on your clocking architecture.
> >
> > Again - if you look at an 8ms period, that's 24M clock cycles.
> > I think burning around 1-2% of those on voltage
> > and frequency transition shouldn't be an issue compared to possible gains, although this is a guess.
> >
> > One issue is that I suspect many designs impose long penalties for
> > voltage/clock transitions. But that's a choice, not a limitation.
> >
> > David
> > Isn't the point to deliver max perf with min energy?
> The point is that your window of user experience is 16/8ms. If the workload completes
> interactively to fill that "QoS" at the current frequency without going over a utilisation
> threshold in that sliding window, you *do not want to go any higher*.
> So your 120MHz FIVR is completely and utterly pointless. You would be wasting energy at higher
> voltages for no gain in user performance. Fmax should only ever be reached and triggered after
> continuous load of 2-3x of user experience window - current mobile phones do that in around 40-50ms,
> anything faster than that is waste of energy and battery life. These are not HW limitations, but
> learning the hard way what the most efficient way to design battery powered DVFS logic.

I'm not sure what you are arguing, Andrei.
Is your claim
- that a mobile core CANNOT (or at least cannot rapidly) substantially change its DVFS OR
- that a mobile core should not WANT TO rapidly change its DVFS OR
- that a mobile core DOES NOT KNOW if it should rapidly change its DVFS?

We know that you have graphs like
and I assume you are arguing from those graphs, but I don't know what you are arguing for or against.

I think we can agree that *frequent* changes in DVFS are probably undesirable (lower performance at higher energy) and you seem to be arguing against that. And Apple has hysteresis all over the place in the CPU design to prevent such rapid back and forth transitions, so I would expect they likewise want to prevent it in DVFS scheduling.
But that's different from *rapid* changes in DVFS if the OS/CPU has reason to believe a phase change has occurred.
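
To make the frequent-versus-rapid distinction concrete, here is a minimal sketch of a governor that steps only after sustained load (hysteresis) but jumps straight to the top state when it is told a phase change has occurred. Everything in it is an assumption for illustration: the operating points, the thresholds, the `patience` count and the `phase_change` flag are hypothetical and not taken from any shipping design.

```python
from dataclasses import dataclass


@dataclass
class HysteresisGovernor:
    """Toy DVFS governor: slow steps under hysteresis, fast jump on a phase change."""
    levels_mhz: tuple = (600, 1200, 1800, 2400, 3000)  # hypothetical operating points
    up_threshold: float = 0.85    # utilisation at or above this counts toward a step up
    down_threshold: float = 0.40  # utilisation at or below this counts toward a step down
    patience: int = 3             # consecutive samples needed before an ordinary step
    index: int = 0                # current position in levels_mhz
    _streak: int = 0              # >0: consecutive high samples, <0: consecutive low samples

    def update(self, util: float, phase_change: bool = False) -> int:
        """Feed one utilisation sample (0.0-1.0); return the chosen frequency in MHz."""
        top = len(self.levels_mhz) - 1
        if phase_change:
            # External hint (OS, API, predictor): skip the hysteresis entirely.
            self.index = top
            self._streak = 0
        elif util >= self.up_threshold:
            self._streak = max(self._streak, 0) + 1
            if self._streak >= self.patience and self.index < top:
                self.index += 1
                self._streak = 0
        elif util <= self.down_threshold:
            self._streak = min(self._streak, 0) - 1
            if -self._streak >= self.patience and self.index > 0:
                self.index -= 1
                self._streak = 0
        else:
            # Load is in the dead band: forget any partial streak.
            self._streak = 0
        return self.levels_mhz[self.index]
```

A load that flips between busy and idle every sample never builds a streak, so the frequency stays put; a single call with `phase_change=True` moves to the top state immediately. How the hint is produced is exactly the open question, and the sketch deliberately leaves it outside the governor.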

In particular this whole discussion seems based on an idea of "what can be done autonomously, without APIs describing exactly what code wants". And, especially for Intel, I guess that's important. But it's not the whole story.
Apple have (and others could add) multiple APIs to improve the situation. These are not just QoS APIs and heuristics (though those are a good start), but APIs with the general structure that
- a thread knows the task it wants to perform
- knows a deadline time
- is able to provide a non-useless estimate of the fraction of the task performed.
Putting these together the OS (informed by the HW) can keep slightly shifting DVFS to run at the slowest possible rate while still hitting the target.
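
As a rough sketch of how those three pieces of information could be combined (this is not Apple's API, and every name here is hypothetical): the thread reports a deadline and a progress fraction, the hardware reports cycles consumed so far, and the OS picks the lowest operating point that still finishes in time. It assumes the remaining work costs the same cycles per unit of progress as the work already done, which ignores memory-bound phases.

```python
def pick_frequency(levels_mhz, cycles_used, fraction_done, now_ms, deadline_ms):
    """Return the lowest frequency (MHz) expected to finish the task by the deadline.

    cycles_used   : cycles the task has consumed so far (e.g. from a perf counter)
    fraction_done : the thread's own estimate of progress, in (0, 1]
    now_ms, deadline_ms : times measured from the start of the task
    """
    if fraction_done >= 1.0:
        return min(levels_mhz)          # nothing left to do
    time_left_ms = deadline_ms - now_ms
    if fraction_done <= 0.0 or time_left_ms <= 0.0:
        return max(levels_mhz)          # no estimate yet, or already late: be safe

    # Extrapolate: remaining cycles scale with the remaining fraction of the task.
    remaining_cycles = cycles_used * (1.0 - fraction_done) / fraction_done
    # 1 MHz is one cycle per microsecond, and 1 ms is 1000 microseconds.
    required_mhz = remaining_cycles / (time_left_ms * 1000.0)

    for f in sorted(levels_mhz):
        if f >= required_mhz:
            return f
    return max(levels_mhz)              # even the top state misses the deadline
```

For an 8ms frame sampled at 2ms with 2.4M cycles spent: a task that is half done needs only about 400MHz and gets the lowest state, while one that is 20% done needs about 1600MHz and gets the next state up. Calling this every millisecond or so is the "slightly shifting" behaviour, and it is only as good as the thread's progress estimate.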

Some points that may not be captured by your curve:
- does that curve reflect "non-informed" compute? i.e. maybe the code is wrapped in a QoS wrapper, but it's not telling the OS of a deadline and a fraction of task completed?
If those APIs were used, we might see just how fast the CPU/OS is willing to ramp up and down, as opposed to the choices made in the absence of better info.

- one thing Apple do (I don't know about other designs) is slide voltage up and down while keeping frequency unchanged. Obviously changing frequency is somewhat disruptive in a way that changing voltage is not.
Apple have some flexibility to do this given the digital power estimators, a knowledge of the capacitance in the system, and an ability to prevent catastrophe if the system is oversubscribed by having instruction issue paused by the DPE for a cycle or two. Maybe also the fact that every SRAM is decoupled from logic with a voltage shifter between the two, so you have flexibility to down-voltage logic while not losing SRAM retention.
(At least they have a long sequence of patents about this, so you'd hope it's implemented!)
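
For a sense of why voltage-only tweaks are worth the trouble, here is a back-of-envelope sketch using the standard CMOS dynamic power relation P = C * V^2 * f. The capacitance, voltages and frequency below are made-up round numbers, not Apple figures, and leakage is ignored.

```python
def dynamic_power_w(c_eff_farads, volts, freq_hz):
    """Dynamic (switching) power of CMOS logic: P = C * V^2 * f."""
    return c_eff_farads * volts * volts * freq_hz


def relative_saving(v_old, v_new):
    """Fraction of dynamic power saved by moving v_old -> v_new at fixed frequency."""
    return 1.0 - (v_new / v_old) ** 2


# Hypothetical core: 1 nF effective switched capacitance at 3 GHz.
p_nominal = dynamic_power_w(1e-9, 1.00, 3e9)   # about 3.0 W
p_trimmed = dynamic_power_w(1e-9, 0.95, 3e9)   # about 2.7 W
print(f"{p_nominal:.2f} W -> {p_trimmed:.2f} W, "
      f"saving {relative_saving(1.00, 0.95):.1%}")
```

Because power goes with the square of voltage, trimming 5% of the voltage at an unchanged clock saves a little under 10% of dynamic power, with no clock change to disrupt anything.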

In other words the game is not *just* about large changes in DVFS that may be forced to occur over multiple screen refreshes; it's also about smaller, more common tweaks in frequency around a stable "sorta average" frequency, and about small, very frequent tweaks in voltage at a particular stable frequency.

Of course these are more "per cluster" issues, not "cross-cluster" interaction. So maybe they are not what you had in mind. But I honestly can't tell quite what's being argued here, so I thought there's some value in pointing out what's already possible and being done.