By: Simon Farnsworth (simon.delete@this.farnz.org.uk), May 24, 2022 1:56 pm
Room: Moderated Discussions
Jan Wassenberg (jan.wassenberg.delete@this.gmail.com) on May 24, 2022 11:23 am wrote:
> Mark Roulo (nothanks.delete@this.xxx.com) on May 24, 2022 7:41 am wrote:
> > The argument against heterogeneous cores isn't that SIMD isn't useful.
> I understand, but the outcome might be surprisingly similar: if the first step is, "we won't allow
> AVX-512 because it's not on the E cores", then developers might react with "AVX2 is not as good/easy
> to program so if that's all we get, perhaps we won't bother with SIMD (or this CPU) at all".
>
> > You seem to be saying something like: "It is useful enough to provide, but not so useful to provide
> > to all the cores. And having the developers deal with it is an acceptable solution."
> AVX-512 lets us produce 1 GB/s of sorted output on a single core. That
> seems useful. But is it feasible to put on all types of cores?
> I'd certainly prefer it be on all cores rather than worrying about
> heterogeneity. However, that's not under our control, is it?
>
> Perhaps the concept of heterogeneous cores (with same ISA) actually makes sense and branchy/background
> tasks can run on E cores. It's not clear to me that that's worth the trouble of hinting to
> the OS what should run where, and trading down to an 11 year old SIMD ISA.
>
> So if there's an either-or, how about
> 1) only homogeneous cores OR
> 2) proper heterogeneous without crippling the big cores: either a) all cores support
> AVX-512 or b) let software still use AVX-512, perhaps with an "I know how" opt-in.
>
> Given that we have been surprised once, it seems prudent to prepare
> for more heterogeneity rather than just hope for 1) or 2a).
>
> FYI I may be slow to reply for the rest of the week.
What makes me skeptical of the need for heterogeneous ISA support is that we have past examples of CPUs executing 256-bit vector instructions on a 128-bit vector ALU: instructions using the full 256-bit width simply take 2x the clock cycles of instructions that stick to 128-bit vectors. It doesn't seem unreasonable to do the same thing with AVX-512 on E cores: either give them 256-bit ALUs and take 2x the clock cycles for wide instructions, or keep 128-bit ALUs and take 4x.
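To make the arithmetic concrete, here is a toy sketch of that kind of multi-pumping. It is not how any real core is built: it just models a vector as a list of 32-bit lanes and counts how many passes through a narrower ALU a wide add needs. The names `pumped_add` and `make_vector` are made up for illustration, and "passes" is only a stand-in for cycles of ALU occupancy.

```python
# Toy model (hypothetical, not a real microarchitecture): a vector is a list
# of 32-bit lanes, and the ALU processes alu_bits worth of lanes per pass.
LANE_BITS = 32


def pumped_add(a, b, alu_bits):
    """Add two equal-length lane vectors, alu_bits at a time.

    Returns (result, passes); passes stands in for cycles of ALU occupancy.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of lanes")
    lanes_per_pass = alu_bits // LANE_BITS
    result = []
    passes = 0
    for start in range(0, len(a), lanes_per_pass):
        chunk_a = a[start:start + lanes_per_pass]
        chunk_b = b[start:start + lanes_per_pass]
        # Each lane wraps at 32 bits, as a packed integer add would.
        result.extend((x + y) & 0xFFFFFFFF for x, y in zip(chunk_a, chunk_b))
        passes += 1
    return result, passes


def make_vector(width_bits, base):
    """Build a vector of width_bits filled with base, base+1, ..."""
    return [base + i for i in range(width_bits // LANE_BITS)]


if __name__ == "__main__":
    a = make_vector(512, 1)
    b = make_vector(512, 100)
    for alu_bits in (128, 256, 512):
        _, passes = pumped_add(a, b, alu_bits)
        print(f"512-bit add on a {alu_bits}-bit ALU: {passes} pass(es)")
```

The result is the same whichever ALU width is used; only the pass count changes, which is the point: software sees one ISA and the narrower core just takes longer.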
Given this, I don't see why heterogeneous ISAs are a good idea. Alder Lake has them basically because Intel didn't think through combining its off-the-shelf Gracemont core, used as the E core, with its new AVX-512-capable P core, and then face-planted when it tried to make the mismatch a software problem (in both Windows and Linux support).
OSes are going to have to deal with heterogeneous performance, which does have a technical reason to exist, but that's not the same thing as a heterogeneous ISA. Work on heterogeneous performance is ongoing in the Linux scheduler, for example, so that it uses the most power-efficient set of cores without compromising total system performance.