Expanded question about design points

By: Anon (no.delete@this.spam.com), November 8, 2022 6:31 pm
Room: Moderated Discussions
Adrian (a.delete@this.acm.org) on November 8, 2022 8:53 am wrote:
> No, it does not show this with certainty. More tests are necessary.
>
> The sequence FMA512, FMA256, FMA512, FMA256 ... could be reordered as FMA512, FMA512,
> FMA256, FMA256 ... and executed in 3 clock cycles by processing the halves of a 512-bit
> operand in the same cycle and two 256-bit operations also in a single cycle.
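As a rough sanity check of the reordering argument, here is a toy Python model. It assumes two fully pipelined 256-bit FMA pipes, treats each 512-bit FMA as two 256-bit halves, and assumes the four operations are independent. The function names are made up for illustration and say nothing about how Zen 4 actually schedules.

```python
import math

PIPE_WIDTH = 256  # bits per vector pipe (assumption)
NUM_PIPES = 2     # number of 256-bit pipes (assumption)

def min_cycles(op_widths):
    """Throughput lower bound: total 256-bit halves divided by the number of pipes."""
    halves = sum(w // PIPE_WIDTH for w in op_widths)
    return math.ceil(halves / NUM_PIPES)

def in_order_ganged_cycles(op_widths):
    """Cycles if ops issue strictly in program order and a 512-bit op needs both pipes at once."""
    cycles, free = 0, 0
    for w in op_widths:
        need = w // PIPE_WIDTH
        if need > free:        # not enough pipes left in this cycle: start a new one
            cycles += 1
            free = NUM_PIPES
        free -= need
    return cycles

def reordered_schedule(op_widths):
    """Issue the 512-bit ops first (both pipes each), then pair up the 256-bit ops.

    Returns one list of op indices per cycle.
    """
    wide = [i for i, w in enumerate(op_widths) if w == 512]
    narrow = [i for i, w in enumerate(op_widths) if w == 256]
    cycles = [[i] for i in wide]
    for j in range(0, len(narrow), NUM_PIPES):
        cycles.append(narrow[j:j + NUM_PIPES])
    return cycles

seq = [512, 256, 512, 256]  # FMA512, FMA256, FMA512, FMA256
print(min_cycles(seq), in_order_ganged_cycles(seq), len(reordered_schedule(seq)))  # prints: 3 4 3
```

The in-order figure is only there for contrast. With out-of-order issue and independent operations, a serial (one pipe for two cycles) design can also hit the 3-cycle bound, which is why this instruction mix alone cannot tell the two designs apart.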

Make the operations dependent. Each stream of dependent single-cycle AVX256 instructions would occupy one port every cycle. If AVX512 were executed by coupling both units, then adding a few AVX512 instructions would increase the latency of the AVX256 stream; if AVX512 is executed serially, the latency of the dependent AVX256 instructions wouldn't change. On Zen 3, VPAVGB has a throughput of 2 per clock and a latency of 1, so it would be perfect for this test if Zen 4 keeps the same throughput.
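A toy cycle-level model of the proposed test might look like the following. Everything in it is an assumption: two interchangeable 256-bit pipes, oldest-first issue, one dependent chain of 1-cycle (VPAVGB-like) ops, and a few independent 512-bit ops. "ganged" means a 512-bit op needs both pipes in the same cycle; "serial" means it holds one pipe for two cycles. Real Zen FP pipes are not interchangeable for every instruction, so this only illustrates why the chain's latency would separate the two hypotheses.

```python
NUM_PIPES = 2  # two interchangeable 256-bit pipes (assumption)

def chain_finish_cycle(program, mode):
    """Cycle at which the last 'chain' op's result is ready.

    program: list of "chain" (1-cycle op depending on the previous chain op)
             and "wide" (independent 512-bit op), in program order.
    mode:    "ganged" -> a wide op needs both pipes in the same cycle
             "serial" -> a wide op holds one pipe for two cycles
    """
    if mode not in ("ganged", "serial"):
        raise ValueError("mode must be 'ganged' or 'serial'")
    if any(op not in ("chain", "wide") for op in program):
        raise ValueError("ops must be 'chain' or 'wide'")
    busy_until = [0] * NUM_PIPES        # first cycle each pipe is free again
    pending = list(range(len(program))) # unissued ops, oldest first
    chain_ready = 0                     # earliest issue cycle for the next chain op
    chain_done = 0
    t = 0
    while pending:
        chain_seen = False              # only the oldest pending chain op can be ready
        for i in list(pending):         # oldest-first issue; blocked ops don't block younger ones
            free = [p for p in range(NUM_PIPES) if busy_until[p] <= t]
            if program[i] == "chain":
                if chain_seen:
                    continue
                chain_seen = True
                if t >= chain_ready and free:
                    busy_until[free[0]] = t + 1
                    chain_ready = chain_done = t + 1
                    pending.remove(i)
            elif mode == "ganged":
                if len(free) == NUM_PIPES:
                    for p in free:
                        busy_until[p] = t + 1
                    pending.remove(i)
            elif free:                  # serial: one pipe, two passes
                busy_until[free[0]] = t + 2
                pending.remove(i)
        t += 1
    return chain_done

program = (["chain"] * 3 + ["wide"]) * 4  # hypothetical mix: one 512-bit op after every 3 chain ops
print(chain_finish_cycle(program, "ganged"), chain_finish_cycle(program, "serial"))  # prints: 15 12
```

In this model the 12-op chain finishes at cycle 12 with no 512-bit ops in either mode. Adding the 512-bit ops leaves it at 12 in the serial model but stretches it to 15 in the ganged model, because each 512-bit op that is older than the waiting chain op takes both pipes for a cycle.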

> I am skeptical that AMD has chosen the variant with sequential processing of the halves, because
> that creates problems for the few instructions that need to access both halves. I would not have
> chosen this variant, because I do not believe that it has any advantage in cost or performance over
> the multiple alternatives that can process simultaneously the two halves of a 512-bit operand.

Which problems? Keep in mind that the units may keep intermediate results.
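To illustrate the point about intermediate results: an operation that needs both halves can still be processed serially if the unit saves some state from the first pass. The sketch below uses a hypothetical full-width 512-bit left shift (not a specific AVX-512 instruction) done as two 256-bit passes, where the bits shifted out of the low half are kept and merged into the high half on the second pass. It is a Python model of the idea, not a description of AMD's hardware.

```python
MASK256 = (1 << 256) - 1
MASK512 = (1 << 512) - 1

def shl512_serial(x, k):
    """Left-shift a 512-bit value by k bits (0 <= k < 256) in two 256-bit passes.

    Hypothetical operation, used only to show a dependency between the halves.
    """
    if not 0 <= k < 256:
        raise ValueError("k must be in [0, 256)")
    lo = x & MASK256
    hi = (x >> 256) & MASK256
    # Pass 1 (low half): keep the bits that fall out of the top as intermediate state.
    out_lo = (lo << k) & MASK256
    carry = lo >> (256 - k)           # k == 0 gives lo >> 256 == 0
    # Pass 2 (high half): merge the bits saved by pass 1.
    out_hi = ((hi << k) | carry) & MASK256
    return (out_hi << 256) | out_lo

x = (1 << 255) | 1                    # top bit of the low half set
assert shl512_serial(x, 1) == (x << 1) & MASK512
```

The same pattern, carrying state from the low pass into the high pass, would apply to other operations whose result depends on both halves.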

I think simultaneous execution is extremely unlikely. AMD does not use a unified scheduler (unlike Intel), so executing an instruction in both units would require touching both FPU schedulers; how would you implement this? I think serial execution is more likely because the implementation would be much simpler, and AMD has implemented things like that before; serially executing a vector is trivial compared to what they have already done.