Heterogeneous CPU Cores With OpenMP

By: --- (---.delete@this.redheron.com), February 3, 2023 12:35 pm
Room: Moderated Discussions
Anne O. Nymous (not.delete@this.real.address) on February 2, 2023 11:57 pm wrote:
> —- (-.delete@this.redheron.com) on February 2, 2023 3:35 pm wrote:
> > Mark Heath (none.delete@this.none.none) on February 1, 2023 3:45 pm wrote:
> > > Freddie (freddie.delete@this.witherden.org) on February 1, 2023 8:54 am wrote:
> > > > Also, I'll note that with static scheduling it is not necessarily true
> > > > that P cores will operate at the performance level of E cores.
> > > > So long as OMP_PROC_BIND (and related variables) are not set,
> > > > the OS scheduler is free to move threads around. Hence,
> > > > when the P cores finish (and go to sleep) the scheduler can shift a task over from the E cores to them.
> > > >
> > > > Contrived example. We have 32 work items on a system with
> > > > 4P and 4E cores with no SMT. Let's assume a P core can do
> > > > 2 items a second and an E core can do 1 item a second. With
> > > > a static schedule each core gets 32 / 8 = 4 items.
> > > >
> > > > After t = 2 seconds the P cores are done, and the E cores have
> > > > 2 items remaining. Scheduler sees this, and shifts the E core
> > > > threads to the P cores. At t = 3, we're done (maybe sooner as the
> > > > E cores being idle may allow the P cores to boost higher).
> > > > In contrast, a system with 8E cores would take until t = 4 to finish.
> > >
> > > Thank you for your interesting educational example. You’re right about static scheduling not being as
> > > bad as I said because the OS scheduler can move threads from E cores to P cores. Suppose there were 48 work
> > > items in your example. With a static schedule, each core would be assigned 48/8 = 6 work items. The P cores
> > > would finish at t=3 and each E core would have 3 work items
> > > left to do at that point. The OS scheduler would
> > > move the E core threads to the P cores and each P core would complete those 3 items at t=4.5.
> > >
> > > Now suppose the programmer used OpenMP’s auto schedule policy and the OpenMP runtime was smart enough
> > > to notice, after the E cores complete one work item, that the E cores are taking twice as long as the
> > > P cores per work item. Since the auto schedule policy allows the OpenMP runtime to figure out the best
> > > schedule, the runtime could, in theory, assign 8 work items to each P core and 4 work items to each E
> > > core. In this case, the loop would complete at t=4 instead of t=4.5. Does it seem practical for an OpenMP
> > > runtime to do this when the auto schedule policy is used? Is there any way for a programmer to manually
> > > give the P cores twice as many iterations as the E cores so the loop completes at t=4?
> > >
> > > Regarding Heikki Kultala’s comment: Apple’s hardware does not have SMT so
> > > splitting a physical P core into two virtual P cores is not possible as a way
> > > of making the performance of all the threads in the system more uniform.
> >
> > Apple, so far anyway, don’t see E cores as throughput cores but as “helper” cores, like the dedicated
> > cores on some other many-core designs (I think Fugaku does this). Apple is not substantially scaling
> > up the E core count as the SoC grows; overall it’s a very different design approach from Intel’s.
> >
> > So Apple’s answer to the OpenMP question would probably be to put the code
> > on P only and let the E cores handle whatever OS/IO work arises as they would
> > naturally; don’t bother trying to squeeze out an extra few percent using them.
>
> Interesting observation.
> How much smaller are the E cores compared to the P cores and how much less power do they draw? It might be
> a question of say having 4 concurrent somewhat slower CPUs versus 1.5 faster one; for lowish priority background
> jobs higher concurrency might be more useful than higher ST speed, but I am merely speculating.
>

The numbers vary from design to design, but order of magnitude:
- E cores are a quarter the size of P cores
- E cores provide 1/4 to 1/3 the performance of P cores
- E cores use about 1/10th the power (at peak power level, which may be misleading in terms of actual usage...), so about 1/3 the energy for a specific task, since the task takes about 3x as long
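
As a rough check of that last bullet: energy is power times time, and time scales inversely with performance, so relative energy is relative power divided by relative performance. A minimal Python sketch using only the ballpark ratios above (the function name is made up for illustration):

```python
from fractions import Fraction

def relative_energy(rel_power, rel_perf):
    """Energy of an E core relative to a P core for the same task.

    Energy = power * time, and time scales as 1 / performance.
    """
    return rel_power / rel_perf

# Ballpark ratios from the list above; exact values vary by design.
rel_power = Fraction(1, 10)
for rel_perf in (Fraction(1, 3), Fraction(1, 4)):
    print(f"perf {rel_perf}: energy {relative_energy(rel_power, rel_perf)}")
```

At 1/3 the performance this gives 3/10 of the energy, and at 1/4 the performance it gives 2/5, so "about 1/3 the energy" holds across the quoted range.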

Essentially
- Apple optimizes their E-cores for energy-delay product (i.e. balanced between fast and low energy)
- ARM optimizes the E-cores for low area
- Intel optimizes their E-cores for high performance/area

Each is optimizing for a very different goal, so it's not surprising that the results are best used in very different ways.

For Apple (at least for now...) it doesn't make sense to run things like OpenMP, or other highly-threaded code, on E-cores unless you are chasing those last few percent of performance AND know something about your task lengths and how they balance. It could certainly be a poor choice when the set of tasks is variable but fairly short, each of unknown length, and with faster tasks having to wait for slower ones.
Of course there are some trivial (frequently dick-measuring) workloads like Cinebench or HandBrake where this is not a risk, because the tasks are so long-lived before hitting dependencies that even the simplest OS scheduler will balance everything out OK. But this is not representative of less trivially parallelizable code.
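
To make the balance question concrete, here is a small Python model of the 4P + 4E example quoted earlier in the thread (a P core does 2 items per second, an E core 1). It is only a sketch under idealized assumptions: no migration cost, no boost, perfectly divisible work, and at least as many P cores as E cores. The function names are invented for illustration and are not any OpenMP API.

```python
from fractions import Fraction

def static_pinned(items, n_p, n_e, p_rate, e_rate):
    """Equal split, threads pinned: the loop ends when the slowest core ends."""
    per_thread = Fraction(items, n_p + n_e)
    return per_thread / min(p_rate, e_rate)

def static_with_migration(items, n_p, n_e, p_rate, e_rate):
    """Equal split, but E-core threads move to P cores once those go idle.

    Assumes n_e <= n_p and p_rate >= e_rate, and a free, instant migration.
    """
    per_thread = Fraction(items, n_p + n_e)
    t_p_done = per_thread / p_rate              # P cores finish their share
    left_on_e = per_thread - e_rate * t_p_done  # items each E thread still has
    return t_p_done + left_on_e / p_rate        # remainder runs at P speed

def proportional_split(items, n_p, n_e, p_rate, e_rate):
    """Work divided in proportion to core speed, so all cores finish together."""
    return Fraction(items, n_p * p_rate + n_e * e_rate)

for items in (32, 48):
    args = (items, 4, 4, 2, 1)
    print(items, static_pinned(*args), static_with_migration(*args),
          proportional_split(*args))
```

For 48 items this reproduces the 4.5 s and 4 s figures from the quoted discussion (and gives 6 s if the threads are pinned); for 32 items the migrated static schedule finishes at 3 s. The gap between the migrated static schedule and the ideal split is the "last few percent" in question, and the model only gets there because it assumes uniform tasks of known length.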

For Intel, on the other hand, E-cores represent some part of their performance future, with many fairly high-end designs, of the sort targeting gamers, putting a substantial fraction of their area and performance into E-cores, and their ecosystem has a more difficult task trying to handle this...

BTW the truly energy-optimized Apple cores are the Chinook cores, which are basically very fancy ARM M-class cores speaking AArch64. These are used as controllers all over the chip (for the GPU, NPU, ISP, etc.) but are, of course, irrelevant to developers outside Apple.