Expanded question about design points

By: Adrian (a.delete@this.acm.org), November 7, 2022 4:38 am
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on November 6, 2022 1:18 pm wrote:
> Chester (lamchester.delete@this.gmail.com) on November 5, 2022 3:24 pm wrote:
> >
> > That was not concluded. Rather it seems like a 512-bit op is fed into a single 256-bit pipe, and execution
> > starts over two cycles. The result for each half is ready
> > as fast as it would be for a plain 256-bit op, meaning
> > no latency increase.
>
> So the upper 256 bits are always staggered by one cycle? Kind of like how the original P4 double-pumped ALU
> worked and made most integer ops have a latency of just 0.5 cycles? (Except in this case it's not double-pumped,
> but you end up with an effective latency of 1 cycle even if the "whole" operation takes two).
>
> I guess for any throughput loads that's basically unnoticeable and perfectly fine (and AVX512
> is pretty much about throughput), but I'd assume you end up seeing the extra cycle of latency
> whenever you had an operation that collapsed the whole value (things like masked compares?).
>
> Or do I misunderstand?
>
> Linus


You understand correctly, but I have not yet seen any test results proving that this is indeed how AMD implemented it.


It is certainly the most probable implementation choice, along with the alternative in which the second half of the operand is processed not in the next cycle in the same pipeline, but in the same cycle in the other pipeline of the same kind (the Zen 3/4 SIMD pipelines are grouped in pairs with the same properties).


A test that can expose the implementation method must be, as you say, one where sequential execution of the halves would cause an extra cycle of latency, i.e. it cannot be based on any of the operations that process the halves independently.
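
To make the distinction concrete, here is a toy timing model in Python. It is only a sketch under my own assumptions, not a description of the real hardware: every op has the same latency, a result half is ready `op_latency` cycles after that half starts, and in the staggered case the upper half starts at least one cycle after the lower half. The function name `chain_cycles` and its parameters are hypothetical, chosen just for this illustration.

```python
def chain_cycles(n_ops, crosses_halves, staggered, op_latency=1):
    """Cycles until both halves of the last result of a dependent chain are ready.

    crosses_halves: each output half needs BOTH input halves (e.g. a
                    cross-half shuffle or an op that collapses the value).
    staggered:      the upper half is issued one cycle after the lower half
                    in the same pipe; otherwise both halves issue together
                    on a pair of pipes.
    """
    ready_lo = ready_hi = 0  # cycle at which each half of the current value is available
    for _ in range(n_ops):
        if crosses_halves:
            # Both output halves must wait for the later of the two input halves.
            in_lo = in_hi = max(ready_lo, ready_hi)
        else:
            # Each output half depends only on the matching input half.
            in_lo, in_hi = ready_lo, ready_hi
        start_lo = in_lo
        # Assumption: in the staggered case the upper half cannot start
        # earlier than one cycle after the lower half.
        start_hi = max(in_hi, start_lo + 1) if staggered else in_hi
        ready_lo = start_lo + op_latency
        ready_hi = start_hi + op_latency
    return max(ready_lo, ready_hi)


if __name__ == "__main__":
    for crosses in (False, True):
        for stag in (True, False):
            kind = "cross-half" if crosses else "independent"
            impl = "staggered" if stag else "paired pipes"
            print(f"{kind:12s} {impl:12s} {chain_cycles(100, crosses, stag)} cycles")
```

In this model a chain of 100 half-independent ops takes 101 cycles when staggered and 100 cycles on paired pipes, which no measurement could tell apart, while a chain of 100 cross-half ops takes 200 cycles versus 100. That is why only a dependent chain of half-crossing operations could distinguish the two implementations.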

Besides the Zen 3 pipelines, Zen 4 is said to have a new shuffle unit, which enables it to do shuffles where the halves of a 512-bit operand are crossed. I do not know how this shuffle unit has been added to the existing pipelines, i.e. whether it is separate and an operation could be initiated on it simultaneously with the other pipelines, or more likely, it is attached to only one of the existing pipelines, making that pipeline behave differently than the others.

So if a test used shuffles in an instruction sequence meant to expose an extra clock cycle of latency, there might be additional complications, and more elaborate testing would be needed to determine which implementation AMD actually chose for Zen 4.















