IEEE binary64 is 53 bits rather than 52.

By: Adrian (a.delete@this.acm.org), July 27, 2022 7:55 am
Room: Moderated Discussions
none (none.delete@this.none.com) on July 27, 2022 7:14 am wrote:
> Adrian (a.bocaniciu.delete@this.computer.org) on July 27, 2022 6:58 am wrote:
> [...]
> > Actually, even without IFMA, it is possible to implement the big-number computations
> > using the floating-point operations and 52-bit per 64-bit "limbs".
> >
> > However, a large number of extra instructions are needed to pre-process and post-process the
> > "limbs", so much of the advantage of doing many multiplications simultaneously is lost.
>
> This has been done years ago:
>
> http://www.acsel-lab.com/arithmetic/arith23/data/1616a032.pdf
> https://github.com/vkrasnov/vpmadd
>
> Too bad no real measurement has been done on real hardware.
>
> Note that specializing GMP routines for specific size can bring some interesting speedup, so
> direct comparisons as done by the above article should be considered with care.
>
> There's another article which I can't access:
> https://link.springer.com/chapter/10.1007/978-3-030-38991-8_5
>
> According to the intro they claim it's up to 3 times faster than GMP.
>


That inaccessible article can be found at https://sci-hub.se/10.1007/978-3-030-38991-8

There is a 50 MB PDF file, because is the first volume of the conference proceedings, not only that article.

TLDR:

Unlike in the older article, where the program was implemented on a simulator, here they have used real hardware, i.e. the famous Cannon Lake Intel NUC.

While that was the first device ever with IFMA, the performance on server CPUs with double FMA units should be double compared to what they have obtained on the NUC.

For their implementation, the crossover with GMP was at 1024-bit numbers, so it was useful only for numbers longer than that.



The overhead for short numbers can vary in very wide ranges depending on the quality of the implementation, so I have no idea if indeed IFMA needs numbers so large to be useful in general or that threshold was valid only for this specific implementation.

In any case even with a better implementation, IFMA is unlikely to be efficient for numbers shorter than 512 bits as for short numbers it is increasingly more difficult to keep all multipliers busy, though it might still be possible to use IFMA to increase the throughput of many concurrent multiplications of shorter numbers.





< Previous Post in ThreadNext Post in Thread >
TopicPosted ByDate
Yitian 710 anonymous22021/10/20 08:57 PM
  Yitian 710 Adrian2021/10/21 12:20 AM
  Yitian 710 Wilco2021/10/21 03:47 AM
    Yitian 710 Rayla2021/10/21 05:52 AM
      Yitian 710 Wilco2021/10/21 11:59 AM
        Yitian 710 anon22021/10/21 05:16 PM
        Yitian 710 Wilco2022/07/16 12:21 PM
          Yitian 710 Anon2022/07/16 08:22 PM
            Yitian 710 Rayla2022/07/17 09:10 AM
              Yitian 710 Anon2022/07/17 12:04 PM
                Yitian 710 Rayla2022/07/17 12:08 PM
                  Yitian 710 Wilco2022/07/17 01:16 PM
                    Yitian 710 Anon2022/07/17 01:32 PM
                      Yitian 710 Wilco2022/07/17 02:22 PM
                        Yitian 710 Anon2022/07/17 02:47 PM
                          Yitian 710 Wilco2022/07/17 03:50 PM
                            Yitian 710 Anon2022/07/17 08:46 PM
                              Yitian 710 Wilco2022/07/18 03:01 AM
                                Yitian 710 Anon2022/07/19 11:21 AM
                                  Yitian 710 Wilco2022/07/19 06:15 PM
                                    Yitian 710 Anon2022/07/21 01:25 AM
                                      Yitian 710 none2022/07/21 01:49 AM
                                        Yitian 710 Anon2022/07/21 03:03 AM
                                          Yitian 710 none2022/07/21 04:34 AM
                                      Yitian 710 James2022/07/21 02:29 AM
                                        Yitian 710 Anon2022/07/21 03:05 AM
                                      Yitian 710 Wilco2022/07/21 04:31 AM
                                        Yitian 710 Anon2022/07/21 05:17 AM
                                          Yitian 710 Wilco2022/07/21 05:33 AM
                                            Yitian 710 Anon2022/07/21 05:50 AM
                                              Yitian 710 Wilco2022/07/21 06:07 AM
                                                Yitian 710 Anon2022/07/21 06:20 AM
                                                  Yitian 710 Wilco2022/07/21 10:02 AM
                                                    Yitian 710 Anon2022/07/21 10:22 AM
                    Yitian 710 Adrian2022/07/17 11:09 PM
                      Yitian 710 Wilco2022/07/18 01:15 AM
                        Yitian 710 Adrian2022/07/18 02:35 AM
          Yitian 710 Adrian2022/07/16 11:19 PM
            Computations on Big IntegersBill G2022/07/25 10:06 PM
              Computations on Big Integersnone2022/07/25 11:35 PM
                x86 MUL 64x64 Eric Fink2022/07/26 01:06 AM
                  x86 MUL 64x64 Adrian2022/07/26 02:27 AM
                  x86 MUL 64x64 none2022/07/26 02:38 AM
                    x86 MUL 64x64 Jörn Engel2022/07/26 10:17 AM
                      x86 MUL 64x64 Linus Torvalds2022/07/27 10:13 AM
                        x86 MUL 64x64 2022/07/28 09:40 AM
                        x86 MUL 64x64 Jörn Engel2022/07/28 10:18 AM
                          More than 3 registers per instruction-.-2022/07/28 07:01 PM
                            More than 3 registers per instructionAnon2022/07/28 10:39 PM
                            More than 3 registers per instructionJörn Engel2022/07/28 10:42 PM
                              More than 3 registers per instruction-.-2022/07/29 04:31 AM
                Computations on Big IntegersBill G2022/07/26 01:40 AM
                  Computations on Big Integersnone2022/07/26 02:17 AM
                    Computations on Big IntegersBill G2022/07/26 03:52 AM
                    Computations on Big Integers---2022/07/26 09:57 AM
                  Computations on Big IntegersAdrian2022/07/26 02:53 AM
                    Computations on Big IntegersBill G2022/07/26 03:39 AM
                      Computations on Big IntegersAdrian2022/07/26 04:21 AM
                    Computations on Big Integers in Apple AMX UnitsBill G2022/07/26 04:28 AM
                      Computations on Big Integers in Apple AMX UnitsAdrian2022/07/26 05:13 AM
                        TypoAdrian2022/07/26 05:20 AM
                          IEEE binary64 is 53 bits rather than 52. (NT)Michael S2022/07/26 05:34 AM
                            IEEE binary64 is 53 bits rather than 52.Adrian2022/07/26 07:32 AM
                              IEEE binary64 is 53 bits rather than 52.Michael S2022/07/26 10:02 AM
                                IEEE binary64 is 53 bits rather than 52.Adrian2022/07/27 06:58 AM
                                  IEEE binary64 is 53 bits rather than 52.none2022/07/27 07:14 AM
                                    IEEE binary64 is 53 bits rather than 52.Adrian2022/07/27 07:55 AM
                                      Thanks a lot for the link to the article! (NT)none2022/07/27 08:09 AM
                          TypozArchJon2022/07/26 09:51 AM
                            TypoMichael S2022/07/26 10:25 AM
                              TypozArchJon2022/07/26 11:52 AM
                                TypoMichael S2022/07/26 01:02 PM
                    Computations on Big IntegersMichael S2022/07/26 05:55 AM
                      Computations on Big IntegersAdrian2022/07/26 07:59 AM
                        IFMA and DivisionBill G2022/07/26 04:25 PM
                          IFMA and Divisionrwessel2022/07/26 08:16 PM
                          IFMA and DivisionAdrian2022/07/27 07:25 AM
                      Computations on Big Integersnone2022/07/27 01:22 AM
                    Big integer multiplication with vector IFMABill G2022/07/29 01:06 AM
                      Big integer multiplication with vector IFMAAdrian2022/07/29 01:35 AM
                        Big integer multiplication with vector IFMA-.-2022/07/29 04:32 AM
                          Big integer multiplication with vector IFMAAdrian2022/07/29 09:47 PM
                            Big integer multiplication with vector IFMAAnon2022/07/30 08:12 AM
                              Big integer multiplication with vector IFMAAdrian2022/07/30 09:27 AM
                                AVX-512 unfriendly to heter-performance coresPaul A. Clayton2022/07/31 03:20 PM
                                  AVX-512 unfriendly to heter-performance coresAnon2022/07/31 03:33 PM
                                    AVX-512 unfriendly to heter-performance coresanonymou52022/07/31 05:03 PM
                                  AVX-512 unfriendly to heter-performance coresBrett2022/07/31 07:26 PM
                                  AVX-512 unfriendly to heter-performance coresAdrian2022/08/01 01:45 AM
                                    Why can't E-cores have narrow/slow AVX-512? (NT)anonymous22022/08/01 03:37 PM
                                      Why can't E-cores have narrow/slow AVX-512?Ivan2022/08/02 12:09 AM
                                        Why can't E-cores have narrow/slow AVX-512?anonymou52022/08/02 10:13 AM
                                        Why can't E-cores have narrow/slow AVX-512?Dummond D. Slow2022/08/02 03:02 PM
                                    AVX-512 unfriendly to heter-performance coresPaul A. Clayton2022/08/02 01:19 PM
                                      AVX-512 unfriendly to heter-performance coresAnon2022/08/02 09:09 PM
                                      AVX-512 unfriendly to heter-performance coresAdrian2022/08/03 12:50 AM
                                        AVX-512 unfriendly to heter-performance coresAnon2022/08/03 09:15 AM
                                          AVX-512 unfriendly to heter-performance cores-.-2022/08/03 08:17 PM
                                            AVX-512 unfriendly to heter-performance coresAnon2022/08/03 09:02 PM
                        IFMA: empty promises from Intel as usualKent R2022/07/29 07:15 PM
                          No hype lasts foreverAnon2022/07/30 08:06 AM
                        Big integer multiplication with vector IFMAme2022/07/30 09:15 AM
                Computations on Big Integers---2022/07/26 09:48 AM
                  Computations on Big Integersnone2022/07/27 01:10 AM
                    Computations on Big Integers---2022/07/28 11:43 AM
                      Computations on Big Integers---2022/07/28 06:44 PM
              Computations on Big Integersdmcq2022/07/26 02:27 PM
                Computations on Big IntegersAdrian2022/07/27 08:15 AM
                  Computations on Big IntegersBrett2022/07/27 11:07 AM
      Yitian 710 Wes Felter2021/10/21 12:51 PM
        Yitian 710 Adrian2021/10/21 01:25 PM
    Yitian 710 Anon2021/10/21 06:08 AM
      Strange definition of the word single. (NT)anon22021/10/21 05:00 PM
        AMD Epyc uses chiplets. This is why "strange"?Mark Roulo2021/10/21 05:08 PM
          AMD Epyc uses chiplets. This is why "strange"?anon22021/10/21 05:34 PM
            Yeah. Blame spec.org, too, though!Mark Roulo2021/10/21 05:58 PM
              Yeah. Blame spec.org, too, though!anon22021/10/21 08:07 PM
                Yeah. Blame spec.org, too, though!Björn Ragnar Björnsson2022/07/17 06:23 AM
              Yeah. Blame spec.org, too, though!Rayla2022/07/17 09:13 AM
                Yeah. Blame spec.org, too, though!Anon2022/07/17 12:01 PM
Reply to this Topic
Name:
Email:
Topic:
Body: No Text
How do you spell tangerine? 🍊