gather latency

By: Travis Downs (travis.downs.delete@this.gmail.com), June 19, 2019 1:15 pm
Room: Moderated Discussions
Why is Intel gather latency so damn high on x86?

On Intel chips the latency depends only on the number of loaded elements, not on their width or the address width[1]. On SKL, the latency for 2, 4 or 8 elements is 18, 20 and 22 cycles respectively.

On SKX, the latency dropped by a cycle and you also get 16-element gathers: the latency is 17, 19, 21 and 25 cycles for 2, 4, 8 and 16 elements respectively.


All of these take 1 load uop per element, plus ~3 other uops in total on p015, probably to merge the results.

It looks like the latency differences between the various element counts are mostly explained by the additional load uops: starting 8 extra loads takes 4 cycles, so 21 + 4 = 25 for 8 vs 16 elements. The same holds for 4 vs 8 elements (4 extra loads take 2 cycles, so 19 + 2 = 21). 2 vs 4 is off by a cycle, though, which is not all that weird.
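To make that arithmetic concrete, here is a rough sketch in Python that takes the SKX numbers above and predicts each latency from the next smaller element count. The rate of 2 load uops started per cycle is my reading of "8 extra loads takes 4 cycles", not something measured here, and the function and constant names are made up for illustration.

```python
import math

# Measured SKX gather latencies quoted in this post, keyed by element count.
MEASURED_SKX = {2: 17, 4: 19, 8: 21, 16: 25}

# Assumption: two load uops can start per cycle, which is what
# "8 extra loads takes 4 cycles" implies.
LOADS_PER_CYCLE = 2


def predicted_latency(elements, ref_elements, ref_latency):
    """Add the time to start the extra load uops to a reference latency."""
    extra_loads = elements - ref_elements
    return ref_latency + math.ceil(extra_loads / LOADS_PER_CYCLE)


if __name__ == "__main__":
    # Compare each element count against the next smaller one.
    for small, large in [(2, 4), (4, 8), (8, 16)]:
        predicted = predicted_latency(large, small, MEASURED_SKX[small])
        measured = MEASURED_SKX[large]
        print(f"{small} -> {large} elements: predicted {predicted}, "
              f"measured {measured}, error {measured - predicted}")
```

The 4 -> 8 and 8 -> 16 steps come out exact under this model, and the 2 -> 4 step is the one that is a cycle short.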

So there is some kind of baseline latency of 17-18 cycles for any gather, in addition to the time to issue all the load uops. What could cause that?

On Zen, the latency is similar (actually a couple of cycles better in many cases), but the throughput is terrible: the cost in cycles per gather is about the same as the latency. Gathers use up to 65 uops there, so it's a crummy implementation, but at least everything makes sense: the latency is that long because the total amount of work is huge and you are limited by execution throughput when chewing through dozens of uops.

---

[1] That is, the QQ, DQ and QD forms for a given vector size load the same number of elements and have the same latency. Similarly, the QQ form for ymm registers loads 4 elements, just like the DD form for xmm regs, and they have the same latency.
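
The element counts in this footnote follow from a simple rule: the wider of the index and data element sizes decides how many elements fit. A small Python sketch of that rule, assuming the first letter of the form names the index width and the second the data width (as in vpgatherdq / vpgatherqd), and taking "vector size" to mean the widest register involved. The helper name is hypothetical.

```python
# Element widths in bits for the D (dword) and Q (qword) suffix letters.
WIDTH_BITS = {"D": 32, "Q": 64}


def gather_elements(form, vector_bits):
    """Number of elements loaded by a gather of the given form.

    form: two letters, index width then data width, e.g. "DQ".
    vector_bits: size of the widest register involved (128, 256 or 512).
    """
    index_bits = WIDTH_BITS[form[0]]
    data_bits = WIDTH_BITS[form[1]]
    # The wider of the two element types limits how many elements fit.
    return vector_bits // max(index_bits, data_bits)


if __name__ == "__main__":
    for form, bits in [("QQ", 256), ("DQ", 256), ("QD", 256), ("DD", 128)]:
        print(f"{form} with {bits}-bit vectors: "
              f"{gather_elements(form, bits)} elements")
```

All four cases in the loop come out to 4 elements, which is why they land in the same latency bucket.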