Expanded question about design points

By: Jeffrey Bosboom (firstinitiallastname.delete@this.firstnamelastname.com), November 4, 2022 10:37 pm
Room: Moderated Discussions
Mark Roulo (nothanks.delete@this.xxx.com) on November 4, 2022 8:34 pm wrote:
> Is your question: Why would a CPU not allow two independent 256-bit vector instructions
> to execute simultaneously in the top and bottom halves of a 512-bit vector?

Sorry, my question was muddled because I'm a bit confused myself. Let me expand on it, and you can correct me where I'm wrong or explain what I'm not seeing.

I see (at least) four design points here:

0) One 256-bit unit. Crack 512-bit instructions into two 256-bit uops, executed sequentially. Minimizes execution unit area and register file port count and width while still supporting the 512-bit ISA for smaller code size or software compatibility.

1) Two 256-bit units. Crack 512-bit instructions into two 256-bit uops, which are scheduled freely alongside ordinary 256-bit uops. Increases execution unit utilization by allowing pairings like [256, first half of 512] followed by [second half of 512, 256]; requires more but narrower register file ports.

2) One 512-bit unit that can execute one 512-bit instruction or two 256-bit instructions at a time. A lone 256-bit instruction can block a 512-bit instruction, leaving half the unit idle (unless the scheduler stalls the 256-bit instruction until it can pair, which increases latency). On the other hand, not cracking 512-bit instructions means fewer uops through the pipeline and in the uop cache. Requires the same number of register file ports as 1) when executing 256-bit ops, but also needs wide ports for 512-bit ops.

3) One 512-bit unit that can execute one instruction regardless of width. Wastes half of the unit when executing 256-bit instructions. Requires fewer but wider register file ports.
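
To make the throughput side of these four points concrete, here is a toy Python model. It is only a sketch under strong assumptions that are mine, not anything from a real core: every instruction is independent, every op has single-cycle latency, register file ports never limit issue, and design 2 issues in order and can only pair two 256-bit instructions that are adjacent in the stream. The function names are made up for illustration.

```python
# Toy issue-cycle model for the four design points above.
# Assumptions (illustrative only): all instructions are independent,
# each op takes one cycle, and register ports never limit issue.
# A stream is a list of instruction widths in bits: 256 or 512.

def cycles_design0(stream):
    """One 256-bit unit: a 512-bit op is two uops executed back to back."""
    return sum(width // 256 for width in stream)

def cycles_design1(stream):
    """Two 256-bit units: crack into 256-bit uops, issue any two per cycle."""
    uops = sum(width // 256 for width in stream)
    return (uops + 1) // 2  # ceiling division

def cycles_design2(stream):
    """One 512-bit unit: one 512-bit op, or two adjacent 256-bit ops, per cycle.

    Assumes in-order issue, so a 256-bit op followed by a 512-bit op
    goes alone and leaves half the unit idle.
    """
    cycles = 0
    i = 0
    while i < len(stream):
        if stream[i] == 256 and i + 1 < len(stream) and stream[i + 1] == 256:
            i += 2  # pair two 256-bit ops in one cycle
        else:
            i += 1  # a 512-bit op, or an unpaired 256-bit op
        cycles += 1
    return cycles

def cycles_design3(stream):
    """One 512-bit unit: one instruction per cycle regardless of width."""
    return len(stream)

if __name__ == "__main__":
    streams = {
        "interleaved": [256, 512, 256, 512],
        "grouped": [256, 256, 512, 512],
    }
    for name, stream in streams.items():
        print(name,
              cycles_design0(stream), cycles_design1(stream),
              cycles_design2(stream), cycles_design3(stream))
```

In this model the interleaved stream takes 6, 3, 4, and 4 cycles on designs 0 through 3, while the grouped stream takes 6, 3, 3, and 4. Design 1 never loses a slot to ordering, and design 2 only matches it when the 256-bit instructions happen to sit next to each other. The model says nothing about uop count, register file ports, or area, which is where 2 and 3 might win.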

I can see why a designer would choose 0 for a low-perf, low-area design. Of the other three, 1 seems clearly better than 2 or 3. So my questions are:

- Why would a designer choose 2 over 1?
- Why would a designer choose 3 over 1 or 2?