By: Brendan (btrotter.delete@this.gmail.com), June 2, 2022 9:13 pm
Room: Moderated Discussions
Hi,
--- (---.delete@this.redheron.com) on June 2, 2022 1:06 pm wrote:
> Eric Fink (eric.delete.delete@this.this.anon.com) on June 2, 2022 5:43 am wrote:
> > Anon (no.delete@this.spam.com) on June 2, 2022 12:35 am wrote:
> >
> > > Apple is using TSMC 5nm while Intel is using 10nm (which they call 7nm)
> > > and AMD uses TSMC's 7nm; both Intel and AMD support SMT.
> > >
> > > So, yes, Apple achieves almost the same performance at much lower power, but that's because they have a process
> > > advantage, and because Intel and AMD are willing to use A LOT of extra power to get 20% more single-thread performance.
> > >
> > > Zen 4 will be a more apples-to-Apple comparison, at least on throughput, where perf per watt is what matters.
> >
> > That is the reply often given but I don't find it convincing. A14/M1 is not the only product at
> > 5nm and yet its peak performance and perf/watt so far are unmatched. Notebookcheck recently did
> > a series of benchmarks comparing the efficiency and performance of latest CPUs at locked TDP,
> > and a 5nm Firestorm at 4W outperformed a Zen 3+ at 9.5W — that's more than 2x difference in
> > efficiency, and this is in a benchmark that maximally favours x86 as it runs a suboptimal code
> > path on M1. I have a hard time believing that TSMC's 5nm has some kind of magical property that
> > allows a vendor to reduce power consumption by 2x at the same performance level.
> >
> > Your other argument — that x86 vendors trade some of the
> > inherent efficiency to get this extra 20% performance —
> > has more merit IMO, but still doesn't provide a satisfactory
> > explanation. First of all, AMD Zen3 isn't any faster
> > than Firestorm, except maybe in a handful of AVX2 SIMD
> > throughput tests where its higher clock lets it pull
> > ahead. Intel is a bit different, since they generally seem
> > fine with making power-hungry cores if they can get
> > ahead in performance (in the test linked above Golden Cove
> > is 20% faster than Firestorm — with a whopping 6x
> > higher power consumption!). But to offset this, Intel is now
> > adding throughput/efficiency cores, which do exactly
> > what you are talking about — trade peak performance for
> > much lower power usage. And yet, when you compare Alder
> > Lake E-cores to the P-cores, the former are around 40% slower
> > at roughly 2.5x lower power consumption (SPEC2017,
> > Anandtech). In contrast, Apple's Firestorm is around 10-15%
> > slower than Golden Cove at, on average, 5x lower power
> > consumption. I mean, there is a gap between Intel's 10nm
> > and TSMC's 5nm, but it just isn't that much of a gap.
> > If it were just "Apple trades peak performance for better
> > efficiency", I'd expect them to be 20-30% slower with
> > 2-3x lower power consumption, but they somehow do significantly better than that.
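To make the arithmetic behind those quoted figures explicit, here's a rough back-of-the-envelope sketch (the numbers are just the ones cited above, treated as illustrative, not new measurements):

    # Rough perf/W ratios implied by the figures quoted above (illustrative only).

    def perf_per_watt_ratio(rel_perf, rel_power):
        # Efficiency of part A vs part B, given A's performance and power
        # relative to B (rel_perf=0.85 means A is 15% slower than B).
        return rel_perf / rel_power

    # Firestorm @ 4W matching a Zen 3+ @ 9.5W: at equal performance the
    # efficiency ratio is just the power ratio.
    print(9.5 / 4)                             # ~2.4x - the ">2x" above

    # Alder Lake E-core vs P-core: ~40% slower at ~2.5x lower power.
    print(perf_per_watt_ratio(0.60, 1 / 2.5))  # ~1.5x better perf/W

    # Firestorm vs Golden Cove: ~10-15% slower at ~5x lower power.
    print(perf_per_watt_ratio(0.85, 1 / 5))    # ~4.25x better perf/W

So the E-cores' trade buys roughly 1.5x perf/W over the P-cores, while Firestorm buys roughly 4x over Golden Cove - which is exactly the quoted point that "trading peak performance for efficiency" alone doesn't explain the gap.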
>
> The problem is ill-posed because there is no abstract x86 ideal to be compared with an abstract ARMv9 ideal;
> there are only implementations. Implementations created by companies with very different incentives. Of
> course one dimension of incentives is prioritizing power over GHz, but even more important is the same
> fight that lies behind every RWT argument -- how much do you privilege the past over the future?
>
> The argument that's actually playing out is not x86 vs ARM, it is "reboil
> the ocean every two decades" vs "perpetual compatibility".
> At the end of the day, once Apple passes Intel, you'll see this in full force. It will no longer matter that
> Apple is 1% faster than Intel's best or 30% faster because "Intel runs the apps I want, and Apple doesn't". Which
> may even be true -- but it shows how silly the argument and the "evidence" provided for it are right now.
Yes; but it's more than that.
An 80x86 PC is a "mix and match, choose your own pieces" thing, where you can choose from multiple different operating systems, many different CPUs, many different motherboards, many different RAM providers, ...; and where interoperability between competitors' pieces is achieved through industry standards (UEFI, ACPI, PCI, NVMe, xHCI, ...). The competition between suppliers leads to huge benefits for consumers - lots of flexibility, lower prices, continual improvement of all the pieces, the ability to upgrade or replace pieces later (especially if parts fail), etc. - even when you're relying on a large system builder (e.g. HP, Dell, ...) to choose the pieces for you.
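As a concrete example of what those standards buy: PCI puts vendor and device IDs at fixed offsets in a standardized config space, so any OS can enumerate any vendor's devices the same way. A minimal sketch on Linux (assuming only the standard sysfs layout, nothing vendor-specific):

    # List PCI devices via Linux's standard sysfs interface. This works the
    # same no matter which vendor made the CPU, motherboard or device,
    # because PCI config space (vendor/device ID registers) is standardized.
    import os

    PCI_DIR = "/sys/bus/pci/devices"
    for dev in sorted(os.listdir(PCI_DIR)):
        with open(os.path.join(PCI_DIR, dev, "vendor")) as f:
            vendor = f.read().strip()
        with open(os.path.join(PCI_DIR, dev, "device")) as f:
            device = f.read().strip()
        print(f"{dev}: vendor={vendor} device={device}")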
Apple's systems are the opposite - a sealed black box of proprietary stuff with no standards and no flexibility. They don't even bother to document their proprietary CPU extensions so that software developers (e.g. Linux developers) can use them without reverse engineering; and if you want something even slightly different (e.g. multiple RAID controllers, or redundant/hot-plug 48V DC power supplies, or...) then Apple will tell you to go screw yourself.
For a generic laptop/desktop, Apple's way is tolerable (crappy, but marketing can turn "overpriced and inflexible" into "prestigious"). For servers, not so much.
There are other companies (e.g. Marvell, with their ThunderX2) that are a lot more tolerable in theory (following the Arm Server Base System Architecture, which almost literally recycles all of the existing 80x86 PC standards - UEFI, ACPI, etc.); but the performance gap (especially for single-thread performance) is too large, and it's reasonable to assume they'll never have the market share to generate the $$ needed to develop the high-end "M1-like" CPUs that would be needed to establish that market share in the first place (beyond a "flock of chickens" niche).
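To illustrate how literal that recycling is: on an SBSA-compliant AArch64 server, the firmware hands the OS the same ACPI tables, through the same interfaces, as an ordinary 80x86 PC. A minimal sketch (Linux, assuming the standard /sys/firmware/acpi layout; may need root depending on permissions):

    # List the ACPI tables exposed by firmware - the interface is identical
    # on an SBSA-compliant AArch64 server and on an ordinary 80x86 PC.
    import os

    TABLES = "/sys/firmware/acpi/tables"
    for name in sorted(os.listdir(TABLES)):
        path = os.path.join(TABLES, name)
        if os.path.isfile(path):
            print(name, os.path.getsize(path), "bytes")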
With that in mind, the biggest long-term threat to 80x86 on servers is possibly AMD. They have the ability to create a competitive AArch64 chip that could (e.g.) plug into the same AM5 socket as Zen 4, and they've attempted something similar (the Opteron A1100 series) in the past, when they had less $$ to throw at it. It's just that AMD doesn't have the incentive to switch from 80x86 while they're doing so well against Intel. Ironically, that could imply that if Intel fixes its manufacturing process problems - putting AMD's 80x86 business back under pressure - it increases the risk of a meaningful "80x86 vs. ARM" battle for servers.
> Intel could doubtless do somewhat better if they gave up some compatibility, and
> a lot better if they abandoned all compatibility. But they won't do that. So...
> And this means the whole package. It's not just ISA, it's memory model, it's IO model, it's cache protocol
> and locking primitives, it's socketed DRAMs, etc etc. Of COURSE all that stuff costs; if it didn't Apple
> (mostly free to use whatever they want) would be copying it instead of doing something different.
>
> Ultimately the question being asked is "could Intel produce faster chips if they
> changed everything while also keeping everything the same?" Well, uh, ???
>
> > I will also be very curious to see how Zen4 performs in comparison. Given the less than
> > perfect information we have available, it is really difficult to ascertain how much
> > influence can be attributed to the process, to the ISA, to the design philosophy or
> > maybe just the elusive "magic sauce" that individual vendors bring to the table.
- Brendan