By: Andrey (andrey.semashev.delete@this.gmail.com), September 25, 2021 4:46 am
Room: Moderated Discussions
-.- (blarg.delete@this.mailinator.com) on September 24, 2021 6:10 pm wrote:
> Andrey (andrey.semashev.delete@this.gmail.com) on September 24, 2021 8:29 am wrote:
> > Should SVE prohibit 136-bit
> > vectors? No, because your cache line and page sizes may be a multiple of 17. In such an implementation,
> > 136-bit vectors would be a reasonable choice. But in the power-of-2 world we live in, 136-bit vectors
> > are nonsensical (possibly, aside from some very special purpose hardware).
>
> Well, from what I can gather, it sounds like you're suggesting that non power-of-2 hardware may, in fact, eventuate.
Not in the mass market, I don't think. Some specialized controllers, perhaps, though I have no idea what workloads would require a non-power-of-2 design, and SVE to boot.
> > By default - nothing. I mean, so long as you're targeting an unknown abstract implementation,
> > you may as well forget about alignment, instruction choices and scheduling and other micro-optimizations
> > and just write the unaligned loop that maintains correctness.
>
> Which is what I've been trying to portray the whole time really. The way
> SVE is designed encourages developers to not bother with aligning at all.
Again, you're talking about the spec, and I'm talking about the actual hardware. If all or the vast majority of implementations don't care about alignment, then that's great and developers don't need to optimize for it. But we don't have that many SVE implementations, and for other kinds of instructions real hardware does care about alignment, so I'm going to assume SVE won't be an exception. Hardware designers will have to prove that alignment doesn't matter, and they haven't done that yet.
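To make the alignment point concrete, here is a minimal sketch in portable C (not SVE intrinsics) of the peeling arithmetic an alignment-aware loop needs. The names `peel_count`, `sum_bytes` and `vl_bytes` are hypothetical, made up for illustration. On real SVE hardware the vector length is only known at run time, and the head and tail would more likely be handled with predication than with scalar loops.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of leading bytes to process before `p`
   reaches an address that is a multiple of `vl_bytes`. Uses a modulo,
   so it also works for non-power-of-2 sizes such as 17 (136 bits). */
static size_t peel_count(const void *p, size_t vl_bytes)
{
    size_t misalign = (size_t)((uintptr_t)p % vl_bytes);
    return misalign == 0 ? 0 : vl_bytes - misalign;
}

/* Sums `n` bytes in three phases: a scalar head up to the alignment
   boundary, full blocks of `vl_bytes` (standing in for aligned vector
   loads), and a scalar tail for the remainder. */
static uint32_t sum_bytes(const uint8_t *data, size_t n, size_t vl_bytes)
{
    uint32_t sum = 0;
    size_t i = 0;
    size_t head = peel_count(data, vl_bytes);
    if (head > n)
        head = n;

    for (; i < head; ++i)               /* unaligned head */
        sum += data[i];
    for (; i + vl_bytes <= n; i += vl_bytes)
        for (size_t j = 0; j < vl_bytes; ++j)
            sum += data[i + j];         /* aligned full blocks */
    for (; i < n; ++i)                  /* tail */
        sum += data[i];
    return sum;
}
```

Whether the extra head and tail handling pays off depends entirely on how a given implementation treats unaligned accesses, which is the open question here.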
> > Yes, heterogeneous cores are a pain, and it looks like x86 will follow suit in the near future.
>
> So far, it's Intel only, and I feel the core types on Alderlake aren't wildly different enough to be
> as important (to use your example of MOVDQU/PSHUFB). Of course, this could change in the future.
Rumor has it that AMD is also working on a hybrid CPU, possibly in Zen 5. I think hybrid designs will eventually settle in x86 desktops and servers.
> On the other hand, ARM typically includes both in-order and OoO cores together.
> Whilst I've never really found a need to specifically target cores yet, my
> point was that it's generally more difficult to do in the ARM ecosystem.
>
> Considering the added difficulty of targeting specific processors on ARM, I get the feeling that support for
> such isn't a priority. I don't really think that's a bad idea - the vast majority of programs aren't going
> to optimise to such an extent - but it does feed into the idea of encouraging generic implementations.
ARM was not really present in the HPC domain, certainly not for as long as x86 was. Traditionally, because of low CPU performance, heavy lifting tasks like video encoding were done with specialized hardware in the ARM world, while they were mostly done in software in the x86 world. This is starting to change as ARM performance grows and ARM starts to appear on desktops and servers, so there will be more incentive to optimize for it.
> > Side note about hybrid designs where different cores are radically different, like x86+ARM that AMD did.
> > That sort of combination is a somewhat different story.
> > The cores in such a hybrid are inherently incompatible
>
> I've never heard of such a configuration. The closest thing I've heard of is their K12 core being designed alongside
> Zen, which was planned to be socket compatible, but that doesn't mean you can run the two together.
> Am I missing something?
I thought I'd seen news somewhere about a hybrid x86+ARM core from AMD, but I can't find that source now. Hmm, maybe I'm misremembering this, sorry.
AMD SkyBridge (https://www.extremetech.com/computing/181867-amds-project-skybridge-new-arm-and-x86-chips-that-are-pin-compatible) was in the works, though. I imagine that in multi-socket systems you could use both x86 and ARM cores together. The project is dead now, so we will probably never know.