Heterogeneous CPU Cores With OpenMP

By: Anne O. Nymous (not.delete@this.real.address), February 3, 2023 1:35 pm
Room: Moderated Discussions
--- (---.delete@this.redheron.com) on February 3, 2023 11:35 am wrote:
> Anne O. Nymous (not.delete@this.real.address) on February 2, 2023 11:57 pm wrote:
> > —- (-.delete@this.redheron.com) on February 2, 2023 3:35 pm wrote:
> > > Mark Heath (none.delete@this.none.none) on February 1, 2023 3:45 pm wrote:
> > > > Freddie (freddie.delete@this.witherden.org) on February 1, 2023 8:54 am wrote:
> > > > > Also, I'll note that with static scheduling it is not necessarily true
> > > > > that P cores will operate at the performance level of E cores.
> > > > > So long as OMP_PROC_BIND (and related variables) are not set,
> > > > > the OS scheduler is free to move threads around. Hence,
> > > > > when the P cores finish (and go to sleep) the scheduler can shift a task over from the E cores to them.
> > > > >
> > > > > Contrived example. We have 32 work items on a system with
> > > > > 4P and 4E cores with no SMT. Let's assume a P core can do
> > > > > 2 items a second and an E core can do 1 item a second. With
> > > > > a static schedule each core gets 32 / 8 = 4 items.
> > > > >
> > > > > After t = 2 seconds the P cores are done, and the E cores have
> > > > > 2 items remaining. Scheduler sees this, and shifts the E core
> > > > > threads to the P cores. At t = 3, we're done (maybe sooner as the
> > > > > E cores being idle may allow the P cores to boost higher).
> > > > > In contrast, a system with 8E cores would take until t = 4 to finish.
> > > >
> > > > Thank you for your interesting educational example. You’re right about static scheduling not being as
> > > > bad as I said because the OS scheduler can move threads from E cores to P cores. Suppose there were 48 work
> > > > items in your example. With a static schedule, each core would be assigned 48/8 = 6 work items. The P cores
> > > > would finish at t=3 and each E core would have 3 work items
> > > > left to do at that point. The OS scheduler would
> > > > move the E core threads to the P cores and each P core would complete those 3 items at t=4.5.
> > > >
> > > > Now suppose the programmer used OpenMP’s auto schedule policy and the OpenMP runtime was smart enough
> > > > to notice, after the E cores complete one work item, that the E cores are taking twice as long as the
> > > > P cores per work item. Since the auto schedule policy allows the OpenMP runtime to figure out the best
> > > > schedule, the runtime could, in theory, assign 8 work items to each P core and 4 work items to each E
> > > > core. In this case, the loop would complete at t=4 instead of t=4.5. Does it seem practical for an OpenMP
> > > > runtime to do this when the auto schedule policy is used? Is there any way for a programmer to manually
> > > > give the P cores twice as many iterations as the E cores so the loop completes at t=4?
> > > >
> > > > Regarding Heikki Kultala’s comment: Apple’s hardware does not have SMT so
> > > > splitting a physical P core into two virtual P cores is not possible as a way
> > > > of making the performance of all the threads in the system more uniform.
> > >
> > > Apple, so far anyway, doesn’t see E cores as throughput cores but as “helper” cores, like the dedicated
> > > cores on some other many-core designs (I think Fugaku does this). Apple is not substantially scaling
> > > up the E core count as the SoC grows; overall it’s very different design thinking from Intel’s.
> > >
> > > So Apple’s answer to the OpenMP question would probably be to put the code
> > > on P only and let the E cores handle whatever OS/IO work arises as they would
> > > naturally; don’t bother trying to squeeze out an extra few percent using them.
> >
> > Interesting observation.
> > How much smaller are the E cores compared to the P cores and how much less power do they draw? It might be
> > a question of say having 4 concurrent somewhat slower CPUs
> > versus 1.5 faster ones; for lowish priority background
> > jobs higher concurrency might be more useful than higher ST speed, but I am merely speculating.
> >
>
> The numbers vary from design to design, but order of magnitude:
> - E cores are a quarter the size of P cores
> - E cores provide 1/4 to 1/3 the performance of P cores
> - E cores use (at peak power level, which may be misleading in terms of actual usage...)
> about 1/10th the power (so about 1/3 the energy, taking 3x as long, for a specific task)
>
> Essentially
> - Apple optimizes their E-cores for energy-delay product (ie balanced between fast and low energy)
> - ARM optimizes the E-cores for low area
> - Intel optimizes their E-cores for high performance/area
>
> Each is optimizing for a very different goal, so it's not surprising
> that the results are best used in very different ways.
>
> For Apple (at least for now...) it doesn't make sense to run things like OMP, or other highly-threaded code,
> on E-cores unless you are chasing that last few percent of performance AND know something about your task
> lengths and how they balance. Certainly it might be dumb to do this when the set of tasks is variable but
> fairly short, each of unknown length, and with faster tasks having to wait for slower tasks.
> Of course there are some trivial (frequently dick-measuring) workloads like cinebench or handbrake where this
> is not a risk because the tasks are so long lived before dependencies that even the simplest OS scheduler
> will balance everything out OK. But this is not representative of less trivially parallelizable code.
>
> For Intel, on the other hand, E-cores represent some part of their performance future, with many kinda
> high-end designs of the sorts targeting gamers dumping a substantial fraction of their area and performance
> into E-cores, and their eco-system has a more difficult task trying to handle this...
>
> BTW the truly energy-optimized Apple cores are the Chinook cores which are basically very
> fancy ARM M cores speaking AArch64. These are used as controllers all over the chip (for
> the GPU, NPU, ISP, etc) but are, of course, irrelevant to developers outside Apple.
>

Thanks! Glad I asked, more food for thought.
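
For anyone who wants to check the arithmetic in the quoted 32-item and 48-item examples, here is a small Python sketch of the timing model. It is only a model, not OpenMP code: the function names, the default rates (2 items/s for a P core, 1 item/s for an E core) and the `weights` list are my own assumptions taken from the contrived example, and it assumes the OS migrates every E-core thread to an idle P core the instant the P cores finish. The last function shows one hypothetical way to hand each thread an iteration range proportional to a weight; in a real program the threads would also have to stay on a known core type, which this sketch does not address.

```python
def static_with_migration(items, n_p=4, n_e=4, p_rate=2.0, e_rate=1.0):
    """Finish time for an equal static split, assuming the OS moves each
    E-core thread onto a P core as soon as the P cores go idle (n_e <= n_p)."""
    share = items / (n_p + n_e)                 # items handed to every thread
    t_p_done = share / p_rate                   # when the P-core threads finish
    left = max(share - e_rate * t_p_done, 0.0)  # items each E-core thread still has
    return t_p_done + left / p_rate             # remainder runs at P-core speed


def weighted_split(items, n_p=4, n_e=4, p_rate=2.0, e_rate=1.0):
    """Finish time and per-core item counts when work is split in
    proportion to core speed, so all cores finish together."""
    total_rate = n_p * p_rate + n_e * e_rate
    t = items / total_rate
    return t, p_rate * t, e_rate * t


def weighted_ranges(n_items, weights):
    """Half-open (start, end) iteration ranges, one per thread,
    sized in proportion to the given integer weights."""
    total = sum(weights)
    ranges, cum, start = [], 0, 0
    for w in weights:
        cum += w
        end = n_items * cum // total  # integer bound, last one equals n_items
        ranges.append((start, end))
        start = end
    return ranges


if __name__ == "__main__":
    print(static_with_migration(32))   # 32-item example
    print(static_with_migration(48))   # 48-item example
    print(weighted_split(48))          # (time, items per P core, items per E core)
    print(weighted_ranges(48, [2, 2, 2, 2, 1, 1, 1, 1]))
```

Under these assumptions the model gives t = 3 for 32 items and t = 4.5 for 48 items with the static split, and t = 4 with 8 items per P core and 4 per E core for the weighted split, which matches the numbers quoted above.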