full width general permute, single cycle throughput!

By: Adrian (a.delete@this.acm.org), September 26, 2022 11:10 am
Room: Moderated Discussions
Adrian (a.delete@this.acm.org) on September 26, 2022 12:00 pm wrote:
> Jörn Engel (joern.delete@this.purestorage.com) on September 26, 2022 10:56 am wrote:
> > hobold (hobold.delete@this.vectorizer.org) on September 26, 2022 10:33 am wrote:
> > >
> > > Single cycle throughput, byte granularity 512bits wide general permute?
> > > And it is the full one, able to mix bytes from two sources.
> > >
> > > That's a game changer. Dang it!
> >
> > But vpcompress is microcoded and awful. vpexpand probably
> > as well. That's a game changer I wasn't hoping for.
>
>
> It is said that vpexpand is fast, even with a memory operand.
>
> Only vpcompress is slow and only when the destination is in memory.
>


I want to add something puzzling: the microcoded vpcompress with a memory destination is very slow, both in comparison with the alternative instruction sequences that emulate it using vpcompress with a register destination, and in comparison with the other variants of vpcompress and vpexpand. That makes me think that the microcoded execution was not intentional.

Maybe the initial intention was to have fast vpcompress and vpexpand with both register and memory operands, but then a bug was discovered in vpcompress with a memory destination, and the instruction was patched with a microcode sequence that is suboptimal because of some unknown constraints on the patch.
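
To make concrete what kind of emulation sequence is meant, here is a minimal sketch in C. It is only one possible sequence, not necessarily the one anybody measured: compress into a register, then write the packed elements with an ordinary masked store whose mask covers the low popcount(mask) lanes. The function names `compress_store_u32` and `popcount16` are made up for this illustration, and the scalar branch is only a reference fallback for builds without AVX-512F. Whether this beats the memory-destination form on a given core has to be measured.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: number of set bits in a 16-bit lane mask. */
unsigned popcount16(uint16_t m)
{
    unsigned n = 0;
    while (m) {
        m &= (uint16_t)(m - 1);  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Hypothetical helper: writes the 32-bit elements of src[0..15] selected
 * by mask contiguously to dst and returns how many were written.
 * Like vpcompressd with a memory destination, it touches only the first
 * popcount(mask) elements of dst. */
size_t compress_store_u32(uint32_t *dst, const uint32_t *src, uint16_t mask)
{
    unsigned n = popcount16(mask);
#if defined(__AVX512F__)
    __m512i v = _mm512_loadu_si512(src);
    /* vpcompressd with a register destination: selected lanes are packed
     * into the low lanes, the remaining lanes are zeroed. */
    __m512i packed = _mm512_maskz_compress_epi32((__mmask16)mask, v);
    /* Ordinary masked store of the low n lanes (n can be 0..16). */
    __mmask16 low = (__mmask16)((1u << n) - 1u);
    _mm512_mask_storeu_epi32(dst, low, packed);
#else
    /* Scalar reference with the same observable result. */
    size_t k = 0;
    for (unsigned i = 0; i < 16; i++)
        if (mask & (1u << i))
            dst[k++] = src[i];
#endif
    return n;
}
```

A caller would advance the output pointer by the return value after each block of 16 inputs. If the output buffer has at least 16 elements of slack, the masked store could presumably be replaced by a plain full-width store, at the cost of writing zeros past the packed elements.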