Computations on Big Integers in Apple AMX Units

By: Adrian (a.delete@this.acm.org), July 26, 2022 5:13 am
Room: Moderated Discussions
Bill G (bill.delete@this.g.com) on July 26, 2022 4:28 am wrote:
> > Aarch64 and any other ISA must also add IFMA instructions to be
> > able to have a big number performance comparable with x86-64,
> > because there is no way to have a comparable throughput when using scalar multiplications.
>
> This seems like a good thing to add to Apple’s AMX units. Each of the performance core AMX units
> already contains 64 53x53 multipliers for double precision floating point matrix multiplication.


That's right.

Adding IFMA to any CPU that already has vector or matrix FMA instructions has a negligible cost, because the computational hardware already exists, just some extra control signals are added, to disable exponent computations, and maybe an extra multiplexer is added to reroute the results to the correct position in the destination.


On many old CPUs, integer multiplication was implemented by reusing the floating-point multiplier, for cost reduction.

That always resulted in pathetic performance for integer multiplication and for big number computations (like in Intel Pentium 4 vs. AMD Opteron), because the integer multiplication implemented in this mode had a very large latency and there was no way to hide the latency when only one multiplication could be done at a time.

Now with IFMA that changes, because a very large number of multiplications are done simultaneously and also at the same time with many of the additions that are used for carry propagation. Even if big-number multiplication using IFMA remains slower than multiplying floating-point vectors, the achievable speed is many times higher than what can be achieved with any dedicated integer multiplier of acceptable cost.

So it seems that the best performance/cost ratio for future CPUs would be to have just simple 64 x 64 => 64 multipliers for the general-purpose registers, together with IFMA for big numbers, i.e. with many 52 x 52 => 112 multipliers for the wide registers, shared with the FMA units.





< Previous Post in ThreadNext Post in Thread >
TopicPosted ByDate
Yitian 710 anonymous22021/10/20 08:57 PM
  Yitian 710 Adrian2021/10/21 12:20 AM
  Yitian 710 Wilco2021/10/21 03:47 AM
    Yitian 710 Rayla2021/10/21 05:52 AM
      Yitian 710 Wilco2021/10/21 11:59 AM
        Yitian 710 anon22021/10/21 05:16 PM
        Yitian 710 Wilco2022/07/16 12:21 PM
          Yitian 710 Anon2022/07/16 08:22 PM
            Yitian 710 Rayla2022/07/17 09:10 AM
              Yitian 710 Anon2022/07/17 12:04 PM
                Yitian 710 Rayla2022/07/17 12:08 PM
                  Yitian 710 Wilco2022/07/17 01:16 PM
                    Yitian 710 Anon2022/07/17 01:32 PM
                      Yitian 710 Wilco2022/07/17 02:22 PM
                        Yitian 710 Anon2022/07/17 02:47 PM
                          Yitian 710 Wilco2022/07/17 03:50 PM
                            Yitian 710 Anon2022/07/17 08:46 PM
                              Yitian 710 Wilco2022/07/18 03:01 AM
                                Yitian 710 Anon2022/07/19 11:21 AM
                                  Yitian 710 Wilco2022/07/19 06:15 PM
                                    Yitian 710 Anon2022/07/21 01:25 AM
                                      Yitian 710 none2022/07/21 01:49 AM
                                        Yitian 710 Anon2022/07/21 03:03 AM
                                          Yitian 710 none2022/07/21 04:34 AM
                                      Yitian 710 James2022/07/21 02:29 AM
                                        Yitian 710 Anon2022/07/21 03:05 AM
                                      Yitian 710 Wilco2022/07/21 04:31 AM
                                        Yitian 710 Anon2022/07/21 05:17 AM
                                          Yitian 710 Wilco2022/07/21 05:33 AM
                                            Yitian 710 Anon2022/07/21 05:50 AM
                                              Yitian 710 Wilco2022/07/21 06:07 AM
                                                Yitian 710 Anon2022/07/21 06:20 AM
                                                  Yitian 710 Wilco2022/07/21 10:02 AM
                                                    Yitian 710 Anon2022/07/21 10:22 AM
                    Yitian 710 Adrian2022/07/17 11:09 PM
                      Yitian 710 Wilco2022/07/18 01:15 AM
                        Yitian 710 Adrian2022/07/18 02:35 AM
          Yitian 710 Adrian2022/07/16 11:19 PM
            Computations on Big IntegersBill G2022/07/25 10:06 PM
              Computations on Big Integersnone2022/07/25 11:35 PM
                x86 MUL 64x64 Eric Fink2022/07/26 01:06 AM
                  x86 MUL 64x64 Adrian2022/07/26 02:27 AM
                  x86 MUL 64x64 none2022/07/26 02:38 AM
                    x86 MUL 64x64 Jörn Engel2022/07/26 10:17 AM
                      x86 MUL 64x64 Linus Torvalds2022/07/27 10:13 AM
                        x86 MUL 64x64 2022/07/28 09:40 AM
                        x86 MUL 64x64 Jörn Engel2022/07/28 10:18 AM
                          More than 3 registers per instruction-.-2022/07/28 07:01 PM
                            More than 3 registers per instructionAnon2022/07/28 10:39 PM
                            More than 3 registers per instructionJörn Engel2022/07/28 10:42 PM
                              More than 3 registers per instruction-.-2022/07/29 04:31 AM
                Computations on Big IntegersBill G2022/07/26 01:40 AM
                  Computations on Big Integersnone2022/07/26 02:17 AM
                    Computations on Big IntegersBill G2022/07/26 03:52 AM
                    Computations on Big Integers---2022/07/26 09:57 AM
                  Computations on Big IntegersAdrian2022/07/26 02:53 AM
                    Computations on Big IntegersBill G2022/07/26 03:39 AM
                      Computations on Big IntegersAdrian2022/07/26 04:21 AM
                    Computations on Big Integers in Apple AMX UnitsBill G2022/07/26 04:28 AM
                      Computations on Big Integers in Apple AMX UnitsAdrian2022/07/26 05:13 AM
                        TypoAdrian2022/07/26 05:20 AM
                          IEEE binary64 is 53 bits rather than 52. (NT)Michael S2022/07/26 05:34 AM
                            IEEE binary64 is 53 bits rather than 52.Adrian2022/07/26 07:32 AM
                              IEEE binary64 is 53 bits rather than 52.Michael S2022/07/26 10:02 AM
                                IEEE binary64 is 53 bits rather than 52.Adrian2022/07/27 06:58 AM
                                  IEEE binary64 is 53 bits rather than 52.none2022/07/27 07:14 AM
                                    IEEE binary64 is 53 bits rather than 52.Adrian2022/07/27 07:55 AM
                                      Thanks a lot for the link to the article! (NT)none2022/07/27 08:09 AM
                          TypozArchJon2022/07/26 09:51 AM
                            TypoMichael S2022/07/26 10:25 AM
                              TypozArchJon2022/07/26 11:52 AM
                                TypoMichael S2022/07/26 01:02 PM
                    Computations on Big IntegersMichael S2022/07/26 05:55 AM
                      Computations on Big IntegersAdrian2022/07/26 07:59 AM
                        IFMA and DivisionBill G2022/07/26 04:25 PM
                          IFMA and Divisionrwessel2022/07/26 08:16 PM
                          IFMA and DivisionAdrian2022/07/27 07:25 AM
                      Computations on Big Integersnone2022/07/27 01:22 AM
                    Big integer multiplication with vector IFMABill G2022/07/29 01:06 AM
                      Big integer multiplication with vector IFMAAdrian2022/07/29 01:35 AM
                        Big integer multiplication with vector IFMA-.-2022/07/29 04:32 AM
                          Big integer multiplication with vector IFMAAdrian2022/07/29 09:47 PM
                            Big integer multiplication with vector IFMAAnon2022/07/30 08:12 AM
                              Big integer multiplication with vector IFMAAdrian2022/07/30 09:27 AM
                                AVX-512 unfriendly to heter-performance coresPaul A. Clayton2022/07/31 03:20 PM
                                  AVX-512 unfriendly to heter-performance coresAnon2022/07/31 03:33 PM
                                    AVX-512 unfriendly to heter-performance coresanonymou52022/07/31 05:03 PM
                                  AVX-512 unfriendly to heter-performance coresBrett2022/07/31 07:26 PM
                                  AVX-512 unfriendly to heter-performance coresAdrian2022/08/01 01:45 AM
                                    Why can't E-cores have narrow/slow AVX-512? (NT)anonymous22022/08/01 03:37 PM
                                      Why can't E-cores have narrow/slow AVX-512?Ivan2022/08/02 12:09 AM
                                        Why can't E-cores have narrow/slow AVX-512?anonymou52022/08/02 10:13 AM
                                        Why can't E-cores have narrow/slow AVX-512?Dummond D. Slow2022/08/02 03:02 PM
                                    AVX-512 unfriendly to heter-performance coresPaul A. Clayton2022/08/02 01:19 PM
                                      AVX-512 unfriendly to heter-performance coresAnon2022/08/02 09:09 PM
                                      AVX-512 unfriendly to heter-performance coresAdrian2022/08/03 12:50 AM
                                        AVX-512 unfriendly to heter-performance coresAnon2022/08/03 09:15 AM
                                          AVX-512 unfriendly to heter-performance cores-.-2022/08/03 08:17 PM
                                            AVX-512 unfriendly to heter-performance coresAnon2022/08/03 09:02 PM
                        IFMA: empty promises from Intel as usualKent R2022/07/29 07:15 PM
                          No hype lasts foreverAnon2022/07/30 08:06 AM
                        Big integer multiplication with vector IFMAme2022/07/30 09:15 AM
                Computations on Big Integers---2022/07/26 09:48 AM
                  Computations on Big Integersnone2022/07/27 01:10 AM
                    Computations on Big Integers---2022/07/28 11:43 AM
                      Computations on Big Integers---2022/07/28 06:44 PM
              Computations on Big Integersdmcq2022/07/26 02:27 PM
                Computations on Big IntegersAdrian2022/07/27 08:15 AM
                  Computations on Big IntegersBrett2022/07/27 11:07 AM
      Yitian 710 Wes Felter2021/10/21 12:51 PM
        Yitian 710 Adrian2021/10/21 01:25 PM
    Yitian 710 Anon2021/10/21 06:08 AM
      Strange definition of the word single. (NT)anon22021/10/21 05:00 PM
        AMD Epyc uses chiplets. This is why "strange"?Mark Roulo2021/10/21 05:08 PM
          AMD Epyc uses chiplets. This is why "strange"?anon22021/10/21 05:34 PM
            Yeah. Blame spec.org, too, though!Mark Roulo2021/10/21 05:58 PM
              Yeah. Blame spec.org, too, though!anon22021/10/21 08:07 PM
                Yeah. Blame spec.org, too, though!Björn Ragnar Björnsson2022/07/17 06:23 AM
              Yeah. Blame spec.org, too, though!Rayla2022/07/17 09:13 AM
                Yeah. Blame spec.org, too, though!Anon2022/07/17 12:01 PM
Reply to this Topic
Name:
Email:
Topic:
Body: No Text
How do you spell tangerine? 🍊