Thanks!

By: anon (anon.delete@this.anon.com), January 22, 2020 10:20 pm
Room: Moderated Discussions
Travis Downs (travis.downs.delete@this.gmail.com) on January 22, 2020 7:16 pm wrote:
> anon (anon.delete@this.anon.com) on January 22, 2020 7:06 pm wrote:
> > Travis Downs (travis.downs.delete@this.gmail.com) on January 22, 2020 2:28 pm wrote:
> > > anon (anon.delete@this.anon.com) on January 17, 2020 8:12 pm wrote:
> > > > Travis Downs (travis.downs.delete@this.gmail.com) on January 17, 2020 10:55 am wrote:
> > > > > should say: (new LINK)
> > > >
> > > > Have you tested if using new registers (xmm16-31) is any different from old xmms?
> > >
> > > The upper 16 registers are different in the sense that dirtying
> > > them doesn't cause you to suffer from the implicit widening effect.
> > >
> > > That is, if you dirty the upper bits of zmm0 to zmm15, all future SIMD and FP instructions will
> > > be widened to 512 bits (yes, this means that 128-bit SIMD FP instructions will cause you to use
> > > the L2 license: they are just as heavy as 512-bit instructions despite calculating only 128 bits of result).
> > >
> > > However, if you dirty the uppers of zmm16 to zmm31, this effect doesn't happen: there is no
> > > implicit widening. This is probably because legacy instructions only access 0-15, and the
> > > whole vzeroupper mechanism and its associated tracking and merging scenarios apply only to 0-15.
> > >
> > > I believe the same is true for ymm16-31 too: if you dirty those there is no implicit
> > > widening to 256-bits for subsequent instructions. I haven't tested it though.
> > >
> > > Note that this applies to the dirtying instructions, not
> > > the subsequent "widened" instructions. If you dirty the
> > > uppers of 0-15, then use xmm16+ or ymm16+, implicit widening still occurs, since it is a CPU-wide state.
> > >
> > > Does it answer your question?
> > >
> > >
> >
> > Yes. I was just curious if there is any downside to using
> > extra registers, but it seems that this is a strict win.
>
> Well, one possible downside is that the EVEX-encoded instructions (needed to access xmm16+)
> are sometimes an extra byte or so longer than their VEX equivalents (but sometimes they are shorter).
>
> Not all AVX/AVX2 instructions are available for use with the new registers. E.g., in my test for
> this stuff I had vpcmpeqd xmm0, xmm0, xmm0 to set the register to all ones, but this instruction
> is not available for xmm16+ as there is no EVEX-encoded version (because all the EVEX comparisons
> write into a mask). So if you are primarily writing AVX/AVX2 that might be annoying.
>
The usual trick is to use vpternlogd. But I'm more interested in how compilers use the new features. Presumably avoiding spills outweighs slightly larger code (the usual trade-offs apply), so allocating the extra registers for intrinsics should be enabled by default.
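
To spell out the vpternlogd trick: the instruction takes an 8-bit immediate that acts as a truth table over the three input bits, so an immediate of 0xFF yields all ones whatever the inputs are (e.g. something like vpternlogd xmm16, xmm16, xmm16, 0xFF). Below is a rough Python emulation of a single 32-bit lane, only to show why the result does not depend on the register contents. The function name vpternlogd_lane is made up for this sketch; it is not a real API.

```python
MASK32 = 0xFFFFFFFF

def vpternlogd_lane(a: int, b: int, c: int, imm8: int) -> int:
    """Emulate one 32-bit lane of vpternlogd (hypothetical helper).

    a is the destination/first source, b and c are the other two sources.
    For every bit position, the three input bits form a 3-bit index
    (a is the high bit, c the low bit) that selects one bit of imm8.
    """
    result = 0
    for bit in range(32):
        idx = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1)
        result |= ((imm8 >> idx) & 1) << bit
    return result & MASK32

# imm8 = 0xFF: every truth-table entry is 1, so the lane becomes all ones
# regardless of what the inputs held.
all_ones = vpternlogd_lane(0x12345678, 0x0, 0xDEADBEEF, 0xFF)
print(hex(all_ones))
```

The same table view gives the other constants: 0x00 zeroes the lane, and 0xF0, 0xCC and 0xAA simply pass through the first, second and third input respectively.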
