By: dmcq (dmcq.delete@this.fano.co.uk), September 26, 2021 1:37 pm
Room: Moderated Discussions
Doug S (foo.delete@this.bar.bar) on September 26, 2021 9:41 am wrote:
> --- (---.delete@this.redheron.com) on September 25, 2021 9:56 am wrote:
> > Doug S (foo.delete@this.bar.bar) on September 24, 2021 11:46 pm wrote:
> > > --- (---.delete@this.redheron.com) on September 24, 2021 7:06 pm wrote:
> > > > Doug S (foo.delete@this.bar.bar) on September 24, 2021 2:12 pm wrote:
> > > > > dmcq (dmcq.delete@this.fano.co.uk) on September 24, 2021 1:05 pm wrote:
> > > > > > SVE is in multiples of 128 bits so not so bad! I guess
> > > > > > the first heterogeneous system with a size greater than
> > > > > > 128 bits will be an Apple one and I guess they'll go for having the same size in both, perhaps they'll share
> > > > > > an SVE unit amongst the small cores like ARM. But they haven't even announced a system with SVE yet.
> > > > >
> > > > >
> > > > > Considering Apple was able to ship millions of ARMv8 CPUs less than a year after ARM released the
> > > > > spec (a FAR more difficult accomplishment than adding SVE) if Apple was going to ship CPUs with SVE
> > > > > they probably would have. SVE was announced as an optional extension to ARMv8.2 over five years ago,
> > > > > and SVE2 over two years ago - they also submitted patches for SVE2 to LLVM in late 2019.
> > > > >
> > > > > Now it is possible that the ARM Mac effort with M1 and soon Jade-C took up too much engineering
> > > > > bandwidth and they put SVE2 on the back burner, but if they were planning on introducing
> > > > > it at all doing so with the very first ARM Macs (i.e. making that something developers
> > > > > could assume exist in every ARM Mac) would be the most logical course.
> > > >
> > > > Not necessarily.
> > > > Until now Apple has debuted new cores with iPhones; so inertia made people believe
> > > > this would always be the case. But nothing says it has to be this way!
> > > >
> > > > Going forward a more logical pattern would be
> > > > - new cores introduced via a Mac high-end product, which is a more logical place to ooh and ahh over all
> > > > the new whatsits and thingamajigs that have been added to make this core X% faster than its predecessor.
> > > >
> > > > - two year cadence on cores. This allows for deeper changes, and doesn't have
> > > > to mean two year cadence on SoC's, as we saw this year with, essentially,
> > > > last year's core but improved GPU, NPU, SLC and who knows what else.
> > > >
> > > > - this scheme also allows more flexibility in timing. Everyone expects iPhones in September; it
> > > > will be tough to break that. But high end Macs arrive when they arrive. The schedule can plan for
> > > > the core to be ready in January, but if it slips two months that won't be a catastrophe.
> > > >
> > > > - Apple already have a scheme of multiple cores and SoCs
> > > > of different ages across different products. Expanding
> > > > this from the current scheme of two SoC "levels" (good, A#; and better M#) to three including a best level
> > > > (new random letter#), and having a given year's products
> > > > spread over these three SoCs and two or three cores
> > > > is no serious change (look at either iPhones, or at iPads, right now using M1, A15, and A13)
> > > >
> > > > In other words, I'm not yet convinced that the A15 represents
> > > > any sort of intrinsic slow-down in core design,
> > > > more that it's just the first SoC of Apple's Phase 3, and
> > > > like any such transition it's hard to see the pattern
> > > > with only one example. I'd say let's wait for the high end
> > > > machines before getting excited. (And high-end means
> > > > high-end. I expect the same pattern of a minor upgrade
> > > > to the M1 -- pick up the new GPUs, perhaps get either
> > > > more RAM or LPDDR5 -- but essentially the A15 core. I'm referring to the iMac Pro/Mac Pro class machines.)
> > >
> > >
> > > Personally I think A15 has a completely unchanged big core from A14. The little cores may be different,
> > > or may have gained relatively more clock rate than the big cores, since the MT scores improved a lot
> > > more than the ST scores. It is almost impossible the A15 has a new big core - all evidence is that the
> > > IPC "gain" is exactly 0%. A new design would improve IPC or at least CHANGE it in various workloads,
> > > but the odds that changes in IPC across all workloads would cancel out to exactly 0% are pretty long.
> >
> > Those who look at this stuff via OS exploration claim at least
> > - AMXv3 (but no knowledge of what has changed there)
> > - more physical address bits
> > - nested virtualization
> >
> > https://twitter.com/never_released/status/1440286198178615305
> >
> > But that is not incompatible with a claim of "nothing but minimal"
> > changes or bug fixes, ie no changes relevant to performance.
> >
> > The MT performance changes (and the better battery life?) appear to be a consequence of overall energy
> > usage, which may be physical optimization. Point is, although MT performance looks higher, that's somewhat
> > misleading -- if you aggressively cool an A14 while running GB MT, you will get a number that's the
> > same sort of 10% or so lower than the A15 number, rather than the 20+% lower you get without that
> > aggressive cooling. That MT performance was always in the A14, just hidden by thermals.
> >
> > > Why reuse the A14 core? I think it is probably like you're
> > > saying - assuming the Jade-C rumors are true they
> > > will be releasing some higher end Macs later this year or
> > > early next year. i.e. the ones using a single Jade-C
> > > - the ones using multiple Jade-Cs as chiplets will be announced at WWDC next June if I had to guess.
> >
> > I think it's not exactly that the A14 core was reused, more that the lead engineers put all their effort
> > into the next (big Mac) core, while the more junior engineers and those learning the ropes put their
> > effort into low risk items that had been sitting on the to-do list, from energy optimization and minor
> > bug fixes to boring but deemed necessary work (for what purpose?...) like nested virtualization.
> >
> > > I think Jade-C gets the new core, which may also appear
> > > in the A16 in next year's iPhone, depending on whether
> > > that uses N4 or N3. There are rumors about Apple using N4 for Macs, if Apple targeted N4 for the new core
> > > that would explain why A15 got a recycled core since it
> > > is using N5P. N4 reportedly enters volume production
> > > next month, so depending on how many working Jade-C dies they could get from risk production they might be
> > > able to ship some new Macs for Christmas but by January
> > > for sure. With N3 not entering volume production until
> > > July it may not be feasible for A16, unless they are willing
> > > to delay their normal September launch or accept
> > > the potential for greater initial shortages of iPhones than they've had the last few years.
> >
> > > However I still think if Apple was going to add SVE2 it would be stupid to have missed adding
> > > it to the M1 when we know they easily could have based on their record with ARMv8. It makes
> > > too much sense to make SVE2 a guaranteed feature of every ARM Mac so developers could assume
> > > its existence. So I don't expect to see it in Jade-C, or the A16 for that matter. Had they
> > > put a 128 bit SVE2 in M1 they might put a wider one in Jade-C for the higher end stuff.
> >
> > They have made similar decisions before.
> > It certainly wasn't ideal that the first gen (and *only* the first gen) of Intel Macs used the 32-bit only
> > Core Duo, with a rapid transition to 64-bit Core 2 Duo. And yet who remembers, or cares about, this?
> >
> > Apple know that the people who buy the first gen of these changes are either
> > - very much non-technical users who simply do not care. They buy a Mac because it's a Mac, they
> > may (or may not, depending on how much their more technical friends nag them) upgrade their OS occasionally;
> > at some point after 5..7 years Apple stops updating that machine but they don't notice, and the
> > machine keeps chugging along until it physically dies. I know plenty of these people.
> > - they are very technical users/enthusiasts/developers. And
> > they will be replacing this Mac within two or three years.
> >
> > So either way, nothing really matters much that the first generation is not everything one
> > might want. Sure, it means some heterogeneity in the landscape; but that's always there.
> > Hell, this whole transition will be a lot less messy than the years surrounding the PPC
> > to Intel transition where Apple was juggling transition to 64-bit, multi-core, and Intel;
> > and there were multiple machines released with different subsets of these features!
>
>
> I'm not saying they should have / would have done SVE2 in the M1 because of the customers. It
> would have been for the developers. That way SVE2 would be a baseline for all ARM Macs and they
> wouldn't need to worry about checking for it, providing alternative code paths, etc.
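To make the "checking for it, providing alternative code paths" cost concrete, here is a minimal sketch of runtime dispatch in C. Everything named here is hypothetical (`cpu_has_sve2`, `pick_sum`, `sum_sve2_placeholder`), and the SVE2 path is just a stand-in that calls the baseline so the sketch builds anywhere. The only real interface used is Linux's `getauxval(AT_HWCAP2)` on AArch64; the `HWCAP2_SVE2` fallback value is my assumption of what the kernel header defines, and other operating systems are not covered at all.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#ifndef HWCAP2_SVE2
#define HWCAP2_SVE2 (1UL << 1)  /* assumed value, for older libc headers */
#endif
#endif

typedef float (*sum_fn)(const float *, size_t);

/* Baseline path: plain scalar code (a real build might use Neon here). */
static float sum_baseline(const float *x, size_t n)
{
    assert(x != NULL || n == 0);
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Hypothetical SVE2 path. A real one would use SVE2 intrinsics; this
 * placeholder just forwards to the baseline so the sketch runs anywhere. */
static float sum_sve2_placeholder(const float *x, size_t n)
{
    return sum_baseline(x, n);
}

/* Returns nonzero if the OS reports SVE2. Only Linux/AArch64 is probed;
 * everywhere else this conservatively reports "no". */
static int cpu_has_sve2(void)
{
#if defined(__linux__) && defined(__aarch64__)
    return (getauxval(AT_HWCAP2) & HWCAP2_SVE2) != 0;
#else
    return 0;
#endif
}

/* Pick the implementation once, then call through the pointer. */
static sum_fn pick_sum(int has_sve2)
{
    return has_sve2 ? sum_sve2_placeholder : sum_baseline;
}
```

A caller would do `sum_fn sum = pick_sum(cpu_has_sve2());` once at startup and use `sum(x, n)` afterwards. That is the extra code path and extra testing a guaranteed baseline would let developers skip.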
>
> I agree that releasing 32 bit x86 Macs was not ideal, but how long after Apple released the first one
> did it take them to make 64 bits available across the whole range of SKUs Apple needed - including
> low power stuff for laptops? They would have had to delay the transition off PPC by another couple
> years, when the performance (and increasingly power as well) gap was hurting them in the market.
>
> In the case of x86 Apple didn't control its own fate, it had to rely on Intel's schedule. With M1 they
> controlled whether SVE2 was implemented in M1 or not. I believe it is very unlikely they would have made
> the choice not to implement it in M1 if it was planned for any ARM Mac SoCs in the next few years.
>
> The problem is the whole SVE thing is mostly pointless for Apple, even on the Mac. The performance
> isn't going to be all that different from NEON unless you implement vectors wider than 128 bits,
> which really only makes sense on something like the Mac Pro. The only place it makes a difference
> is functionality that exists on SVE but not NEON (I assume they are no longer adding new functions
> to NEON, but don't know if that's actually the case or not) but that's mostly going to be niche
> type stuff especially when you have the Mac's GPU and NPU available.
They seem to be continuing to add things to Neon where it is straightforward to match what they add to SVE2, BFloat16 for example.
> I believe if ARM wanted SVE to succeed they should have deprecated NEON when they introduced it, then
> made SVE2 mandatory in ARMv9 and NEON optional. The guaranteed existence of NEON makes that the obvious
> target for developers, leaving SVE as the red headed stepchild that's only used in the custom HPC world
> where the wider vectors are a real win. ARM hasn't included SVE in its own cores, and given that it is
> still optional in ARMv9 I would be mildly surprised if they include it in their upcoming v9 cores.
SVE2 is going to succeed anyway and there was no need for Arm to be a PITA by deprecating Neon. Removing Neon would make a negligible difference to the hardware, and there are straightforward programs where Neon works fine without sticking in code for predicates. SVE2 needs a bit of extra control compared to Neon, but no extra computation capability that I know of. Things like scatter/gather, predication and page fault handling, though, would all require extra work in the CPU, and perhaps even some better cache handling to fully get the benefits.
Arm is including SVE2 in all its own new A series and server cores. AnandTech says it is baseline standard in ARMv9. I believe it is only optional for non user visible cores.
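To show what I mean by predicates being extra control rather than extra computation, here is a rough sketch in portable C that only emulates the two loop styles; it uses no real Neon or SVE intrinsics. `LANES`, `add_fixed` and `add_predicated` are made-up names, and the `active` flag stands in for the per-lane predicate an SVE loop would get from something like WHILELT.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4  /* assumed: 128-bit vector of 32-bit floats, as with Neon */

/* Neon style: fixed-width main loop, then a scalar tail for the leftovers. */
static void add_fixed(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + LANES <= n; i += LANES)
        for (size_t l = 0; l < LANES; l++)  /* stands in for one vector add */
            dst[i + l] = a[i + l] + b[i + l];
    for (; i < n; i++)                      /* scalar tail */
        dst[i] = a[i] + b[i];
}

/* SVE style: a single loop with no tail. vl is the vector length in lanes,
 * not known until run time; lanes at or beyond n are switched off. */
static void add_predicated(float *dst, const float *a, const float *b,
                           size_t n, size_t vl)
{
    assert(vl > 0);
    for (size_t i = 0; i < n; i += vl) {
        for (size_t l = 0; l < vl; l++) {
            int active = (i + l) < n;       /* emulated per-lane predicate */
            if (active)
                dst[i + l] = a[i + l] + b[i + l];
        }
    }
}
```

The arithmetic is the same add in both versions. The predicated one swaps the tail loop for a lane mask and works unchanged whether `vl` is 4 or 8, which is the part that needs the extra control logic in the CPU.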