By: Simon Farnsworth (simon.delete@this.farnz.org.uk), May 14, 2022 6:11 am
Room: Moderated Discussions
Matt Lomann (mlohmann.delete@this.noemail.com) on May 13, 2022 7:02 pm wrote:
> Thank you. That makes sense. The complexity of these modern processors is shocking. It’s a
> miracle they ever work at all. Hiding the AMX instructions behind an API allows Apple to fix
> some hardware bugs with software, such as by not using a particular sequence of instructions.
>
It also allows Apple to change the instruction mix over time - the operations are expected to be high latency anyway (a function call, not a single instruction), so it's OK if Apple shifts functionality between hardware and software over time.
It also lets Apple do the stunt that Transmeta is alleged to have done with CMS in the TM5800 series chips - you can have instruction sequences that are not allowed because they're too thermally expensive (and would thus induce surprise throttling), and Apple can simply make sure that the approved API doesn't run those sequences.
Moreover, those sequences can change from generation to generation, and even depending on the cooling system in use, and Apple can simply account for this in software.
> The AMX engine must run at the same clock frequency as the 4 P cores or 2 E cores it is connected to. If one
> P core starts doing a lot of AMX operations, I wonder if the clock frequency of the remaining 3 P cores gets
> reduced, sort of like the clock frequency gets reduced when using AVX512 instructions on a Xeon processor.