Alternatives Implementations

By: Travis Downs (travis.downs.delete@this.gmail.com), July 13, 2020 7:41 pm
Room: Moderated Discussions
Kyle Siefring (kylesiefring.delete@this.gmail.com) on July 13, 2020 6:02 pm wrote:
> We all have got to share the expense for those masks. My guess is that AVX-512 will become cheaper
> as time goes on. That being said, maybe there should be a little less sharing. This is already
> happening with AMD being competitive. You could make the case for gpus being part of this.
>
> It would be interesting to see a cpu with low latencies for the 128-bit path with
> higher latencies for 256-bit and 512-bit. Same throughput, different latencies.
> On a smaller core, this could be like knights landing with fewer threads.
>
> For reference.
> format: reciprocal throughput/latency
> knightslanding addps .5/6 mulps .5/6
> haswell addps 1/3 mulps .5/5
> broadwell addps 1/3 mulps .5/3
> skylake addps .5/4 mulps .5/4
>
> You can see that skylake regressed latencies compared to broadwell. Intel
> clearly didn't do this for giggles. These latencies aren't free.

Well, you left out FMA, which went from 5 to 4, so I'd say it's a wash: now all the latencies for the core FP ops are 4, rather than a 3|5 split. If you had to pick, FMA latency is probably more important than add and mul latency, as most code optimized enough to care about 1 cycle is probably using FMAs.
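
To put rough numbers on that, here is a minimal sketch of a latency-bound dependency chain (a Horner-style acc = acc*x + c update), using the latencies quoted above plus the FMA figure of 5 cycles before Skylake and 4 on Skylake. The function name, the step count, and the assumption that the chain is purely latency-bound (no overlap with other work) are mine, for illustration only.

```python
# Latencies in cycles. add/mul come from the table quoted in this thread;
# fma = 5 before Skylake and 4 on Skylake follows "FMA went from 5 to 4".
LATENCY = {
    "haswell":   {"add": 3, "mul": 5, "fma": 5},
    "broadwell": {"add": 3, "mul": 3, "fma": 5},
    "skylake":   {"add": 4, "mul": 4, "fma": 4},
}


def horner_chain_cycles(uarch, steps, use_fma):
    """Estimate cycles for `steps` dependent acc = acc*x + c updates.

    Hypothetical helper: assumes each step waits for the previous one,
    so the cost per step is the full latency of the ops on the chain.
    """
    lat = LATENCY[uarch]
    # Without FMA, the multiply and the add are both on the critical path.
    per_step = lat["fma"] if use_fma else lat["mul"] + lat["add"]
    return steps * per_step


if __name__ == "__main__":
    for uarch in LATENCY:
        separate = horner_chain_cycles(uarch, 8, use_fma=False)
        fused = horner_chain_cycles(uarch, 8, use_fma=True)
        print(f"{uarch}: mul+add chain {separate} cycles, fma chain {fused} cycles")
```

Under these assumptions the FMA version of the chain gets shorter on Skylake, while the separate mul + add version is no better than on Haswell and worse than on Broadwell, which is the trade-off being discussed.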

There are a few cases where larger vectors have longer latency. Load latency is 1 cycle higher for ymm and zmm vs xmm; this mostly reflects the layout of the vector lanes, where the first 128-bit lane is closer to the load/store ports than the second. 512-bit FP ops that go to port 5 take an extra cycle (again due to EU positioning, I believe). 128-bit or 256-bit FP ops never go to port 5, so they never get this extra latency.
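
The width-dependent cases above can be written down as a small lookup. This is only a sketch of the rules as stated in this post, expressed as extra cycles relative to the 128-bit case; the function name and the "load"/"fp" labels are hypothetical, and it does not model base latencies or any other port.

```python
def extra_latency_cycles(op, width_bits, port=None):
    """Extra cycles relative to the 128-bit (xmm) case, per the post above.

    Hypothetical helper: `op` is "load" or "fp", `port` is only
    meaningful for FP ops.
    """
    if width_bits not in (128, 256, 512):
        raise ValueError("width_bits must be 128, 256, or 512")
    if op == "load":
        # ymm and zmm loads are described as 1 cycle slower than xmm loads.
        return 0 if width_bits == 128 else 1
    if op == "fp":
        if port == 5:
            if width_bits != 512:
                # The post says 128/256-bit FP ops never go to port 5.
                raise ValueError("128/256-bit FP ops do not go to port 5")
            # 512-bit FP ops on port 5 take one extra cycle.
            return 1
        return 0
    raise ValueError("op must be 'load' or 'fp'")
```

For example, `extra_latency_cycles("fp", 512, port=5)` gives 1, while the same op on another port gives 0.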