Thanks!

By: Travis Downs (travis.downs.delete@this.gmail.com), January 23, 2020 1:51 am
Room: Moderated Discussions
anon (anon.delete@this.anon.com) on January 22, 2020 9:20 pm wrote:
> Travis Downs (travis.downs.delete@this.gmail.com) on January 22, 2020 7:16 pm wrote:
> > anon (anon.delete@this.anon.com) on January 22, 2020 7:06 pm wrote:
> > > Travis Downs (travis.downs.delete@this.gmail.com) on January 22, 2020 2:28 pm wrote:
> > > > anon (anon.delete@this.anon.com) on January 17, 2020 8:12 pm wrote:
> > > > > Travis Downs (travis.downs.delete@this.gmail.com) on January 17, 2020 10:55 am wrote:
> > > > > > should say: (new LINK)
> > > > >
> > > > > Have you tested whether using the new registers (xmm16-31) is any different from using the old xmms?
> > > >
> > > > The upper 16 registers are different in the sense that dirtying
> > > > them doesn't cause you to suffer from the implicit widening effect.
> > > >
> > > > That is, if you dirty the upper bits of zmm0 to zmm15, all future SIMD and FP instructions will
> > > > be widened to 512 bits (yes, this means that 128-bit SIMD FP instructions will put you in license
> > > > L2: they are just as heavy as 512-bit instructions despite calculating only 128 bits of result).
> > > >
> > > > However, if you dirty the uppers of zmm16 to zmm31, this effect doesn't happen: there is no
> > > > implicit widening. This is probably because legacy instructions only access 0-15, and the
> > > > whole vzeroupper mechanism, with its associated tracking and merging scenarios, applies only to 0-15.
> > > >
> > > > I believe the same is true for ymm16-31 too: if you dirty those there is no implicit
> > > > widening to 256-bits for subsequent instructions. I haven't tested it though.
> > > >
> > > > Note that this applies to the dirtying instructions, not
> > > > the subsequent "widened" instructions. If you dirty the
> > > > uppers of 0-15 and then use xmm16+ or ymm16+, implicit widening still occurs, since it is a CPU-wide state.
> > > >
> > > > Does it answer your question?
> > > >
> > > >
> > >
> > > Yes. I was just curious whether there is any downside to using the
> > > extra registers, but it seems that this is a strict win.
> >
> > Well, one possible downside is that the EVEX-encoded instructions (needed to access
> > xmm16+) are sometimes an extra byte or so longer (though sometimes shorter) than their VEX equivalents.
> >
> > Not all AVX/AVX2 instructions are available for use with the new registers. E.g., in my test for
> > this stuff I had vpcmpeqd xmm0, xmm0, xmm0 to set the register to all ones, but this instruction
> > is not available for xmm16+ as there is no EVEX-encoded version with a vector destination (all the
> > EVEX comparisons write into a mask register). So if you are primarily writing AVX/AVX2 that might be annoying.
> >
> Usual trick is to use vpternlogd.

Yup, vpternlogd is what I used.
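
For reference, vpternlogd treats its 8-bit immediate as a truth table: for each bit position, the three input bits form a 3-bit index (first operand as the high bit) that selects one bit of the immediate. With 0xFF every table entry is 1, so vpternlogd xmm16, xmm16, xmm16, 0xFF (intrinsic: _mm_ternarylogic_epi32(x, x, x, 0xFF), which needs AVX-512VL) produces all ones whatever values the register held. Below is a minimal scalar Python model of one 32-bit lane, purely illustrative; the function name ternlog32 is made up here and is not part of any API.

```python
def ternlog32(a: int, b: int, c: int, imm8: int) -> int:
    """Scalar model of one 32-bit lane of vpternlogd (illustrative only).

    For each bit position, index = (a_bit << 2) | (b_bit << 1) | c_bit
    selects bit `index` of imm8, which becomes the result bit.
    """
    result = 0
    for bit in range(32):
        index = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1)
        result |= ((imm8 >> index) & 1) << bit
    return result


# imm8 = 0xFF: every truth-table entry is 1, so the inputs don't matter.
print(hex(ternlog32(0x12345678, 0x9ABCDEF0, 0x0F0F0F0F, 0xFF)))
```

Other immediates follow the same rule: 0x96 gives a three-way XOR and 0xF0 simply returns the first operand.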

> But I'm more interested in compiler usage of new features: presumably avoiding
> spills is worth slightly larger code (usual trade-offs apply), so allocating the
> extra registers for intrinsics should be enabled by default.
>

Yeah, it is, see this example.
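
To put rough numbers on the code-size trade-off mentioned above: a register-only EVEX instruction needs a 4-byte prefix where VEX can often use its 2-byte form, while EVEX's compressed disp8*N displacement can make some memory operands shorter than the VEX disp32 equivalent. The byte sequences below were hand-assembled for illustration and are not taken from the linked example; check them with your own assembler or disassembler (e.g. objdump -d) before relying on them.

```python
# Hand-assembled x86-64 encodings (illustrative; verify with an assembler).
ENCODINGS = {
    # Register-only: the 4-byte EVEX prefix costs 2 bytes more than 2-byte VEX.
    "vpxor xmm0, xmm0, xmm0 (VEX)": bytes.fromhex("c5f9efc0"),
    "vpxord xmm16, xmm16, xmm16 (EVEX)": bytes.fromhex("62a17d00efc0"),
    # Memory operand at [rax+256]: VEX needs a 4-byte disp32, while EVEX
    # scales an 8-bit displacement by N = 32 (full 256-bit access): 8 * 32 = 256.
    "vmovdqu ymm0, [rax+256] (VEX)": bytes.fromhex("c5fe6f8000010000"),
    "vmovdqu32 ymm16, [rax+256] (EVEX)": bytes.fromhex("62e17e286f4008"),
}

# Print each encoding's length and bytes side by side.
for name, code in ENCODINGS.items():
    print(f"{len(code)} bytes  {code.hex(' ')}  {name}")
```

So the same instruction can come out two bytes longer or one byte shorter in EVEX form, depending on the operands.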