By: Maynard Handley (name99.delete@this.name99.org), December 20, 2020 8:46 pm
Room: Moderated Discussions
https://semiwiki.com/semiconductor-manufacturers/293615-apple-a14-die-annotation-and-analysis-terrifying-implications-for-the-industry/
I think the published story here is unlikely to be true (and could be validated by analyzing a Kirin 9000). I've already said my piece on this so I'll drop that; what's interesting is that this is the only analysis I've seen of the area of individual parts of the SoC.
How much can it be trusted? No idea.
On a separate thread of "best effort analyses with probably some, but limited, trustworthiness", various people including myself are trying to make progress on understanding AMX. A summary of the current state of knowledge can be found by reading downward from this tweet, through all the subthreads:
https://twitter.com/dougallj/status/1339934291929694210
It would be interesting if people who know how the same problem is solved on other platforms could offer points of comparison, for both performance and implementation details. The problem is large-ish dense matrix multiply; for AMX the basic unit is 32x32 FMAs of INT16 or FP16, 16x16 for FP32, and 8x8 for FP64, each performed over 4 cycles(?).
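To make the comparison concrete, here is a rough Python sketch of the arithmetic those tile shapes imply. The shapes and the 4-cycle figure are the tentative numbers from this post, not confirmed facts. The function names are made up for illustration, and treating the NxN grid of FMAs as an outer product of two N-element vectors accumulated into a tile is my assumption about what "32x32 FMAs" means, not something established here.

```python
# Sketch only: tile shapes and the 4-cycle figure are the post's tentative numbers.
TILE_SHAPES = {
    "INT16": (32, 32),
    "FP16": (32, 32),
    "FP32": (16, 16),
    "FP64": (8, 8),
}
ELEMENT_BYTES = {"INT16": 2, "FP16": 2, "FP32": 4, "FP64": 8}
CYCLES_PER_OP = 4  # assumption: "each performed over 4 cycles?"


def fmas_per_op(dtype):
    """Number of fused multiply-adds in one basic unit for this element type."""
    rows, cols = TILE_SHAPES[dtype]
    return rows * cols


def fmas_per_cycle(dtype, cycles=CYCLES_PER_OP):
    """FMAs per cycle if one basic unit really takes `cycles` cycles."""
    return fmas_per_op(dtype) / cycles


def operand_bytes(dtype):
    """Size in bytes of one N-element input vector for an NxN tile."""
    rows, _ = TILE_SHAPES[dtype]
    return rows * ELEMENT_BYTES[dtype]


def outer_product_accumulate(z, x, y):
    """Assumed semantics of the basic unit: z[i][j] += x[i] * y[j]."""
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            z[i][j] += xi * yj
    return z


if __name__ == "__main__":
    for dtype in TILE_SHAPES:
        print(dtype, fmas_per_op(dtype), fmas_per_cycle(dtype), operand_bytes(dtype))
```

Under those assumptions the per-cycle figures work out to 256 FMAs for INT16/FP16, 64 for FP32 and 16 for FP64, and in every case one input vector is 64 bytes. Those are the numbers to set against whatever the other platforms do.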
Topic | Posted By | Date |
---|---|---|
A14 die analysis | Maynard Handley | 2020/12/20 08:46 PM |
SRAM scaling | David Kanter | 2020/12/21 09:36 AM |
SRAM scaling | anon | 2020/12/21 01:40 PM |
SRAM scaling | anon2 | 2020/12/21 04:31 PM |
SRAM scaling | RIP purple | 2020/12/21 04:05 PM |