By: Mark Heath (none.delete@this.none.none), February 1, 2023 4:45 pm
Room: Moderated Discussions
Freddie (freddie.delete@this.witherden.org) on February 1, 2023 8:54 am wrote:
> Also, I'll note that with static scheduling it is not necessarily true that P cores will operate at the performance level of E cores.
> So long as OMP_PROC_BIND (and related variables) are not set, the OS scheduler is free to move threads around. Hence,
> when the P cores finish (and go to sleep) the scheduler can shift a task over from the E cores to them.
>
> Contrived example. We have 32 work items on a system with 4P and 4E cores with no SMT. Lets assume a P core can do
> 2 items a second and an E core can do 1 item a second. With a static schedule each core gets 32 / 8 = 4 items.
>
> After t = 2 seconds the P cores are done, and the E cores have 2 items remaining. Scheduler sees this, and shifts the E core
> threads to the P cores. At t = 3, we're done (maybe sooner as the E cores being idle may allow the P cores to boost higher).
> In contrast, a system with 8E cores would take until t = 4 to finish.
Thank you for your interesting educational example. You’re right about static scheduling not being as bad as I said because the OS scheduler can move threads from E cores to P cores. Suppose there were 48 work items in your example. With a static schedule, each core would be assigned 48/8 = 6 work items. The P cores would finish at t=3 and each E core would have 3 work items left to do at that point. The OS scheduler would move the E core threads to the P cores and each P core would complete those 3 items at t=4.5.
Now suppose the programmer used OpenMP’s auto schedule policy and the OpenMP runtime was smart enough to notice, after the E cores complete one work item, that the E cores are taking twice as long as the P cores per work item. Since the auto schedule policy allows the OpenMP runtime to figure out the best schedule, the runtime could, in theory, assign 8 work items to each P core and 4 work items to each E core. In this case, the loop would complete at t=4 instead of t=4.5. Does it seem practical for an OpenMP runtime to do this when the auto schedule policy is used? Is there any way for a programmer to manually give the P cores twice as many iterations as the E cores so the loop completes at t=4?
Regarding Heikki Kultala’s comment: Apple’s hardware does not have SMT, so splitting a physical P core into two virtual P cores is not available as a way of making the performance of all the threads in the system more uniform.