POWER9 instruction latency

By: Travis Downs (travis.downs.delete@this.gmail.com), June 5, 2019 8:23 am
Room: Moderated Discussions
I recently ported uarch-bench to POWER (well, really I just made it portable) to test store throughput, but I came across something weird first.

To calculate CPU frequency in a portable way, I use a series of dependent add instructions, since pretty much every modern CPU executes add in a single cycle, right?
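For anyone who wants to try the same idea, here is a minimal sketch of that kind of calibration in C. It is not the uarch-bench code: the function names, the chain length of 4, and the empty inline-asm barrier (a GCC/Clang extension, used only to stop the compiler from folding the adds together) are my own assumptions for illustration.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: one add whose result the compiler must keep in a
   register, so consecutive adds stay as separate dependent instructions. */
#define DEP_ADD(x) do { (x) += 1; __asm__ volatile("" : "+r"(x)); } while (0)

/* Runs `iters` iterations of a 4-long dependent add chain and returns the
   elapsed wall-clock time in nanoseconds. The final value is written to
   *sink so the chain cannot be discarded. */
static double time_dependent_adds(uint64_t iters, uint64_t *sink)
{
    struct timespec t0, t1;
    uint64_t x = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (uint64_t i = 0; i < iters; i++) {
        DEP_ADD(x);
        DEP_ADD(x);
        DEP_ADD(x);
        DEP_ADD(x);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *sink = x;
    return (double)(t1.tv_sec - t0.tv_sec) * 1e9
         + (double)(t1.tv_nsec - t0.tv_nsec);
}

/* Converts the timing into a frequency estimate in GHz, given the add
   latency (in cycles) that the caller ASSUMES the CPU has. */
static double estimate_ghz(uint64_t iters, double assumed_latency_cycles)
{
    uint64_t sink;
    double ns = time_dependent_adds(iters, &sink);
    return ((double)iters * 4.0 * assumed_latency_cycles) / ns;
}
```

Calling `estimate_ghz(iters, 1.0)` bakes in the single-cycle assumption, so the estimate is only as good as that assumption. The loop counter is independent of the chain, so on an out-of-order core it should overlap with the adds rather than lengthen them.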

Well, I was getting exactly half the expected speed from this calibration, and as best I can tell, POWER9 executes back-to-back adds with a latency of two cycles.

That is, this loop:

cc0: addi r3,r3,-1
cc4: addi r3,r3,-1
cc8: addi r3,r3,-1
ccc: addic. r3,r3,-1
cd0: bne cc0

Takes 8 cycles per iteration. It doesn't matter if you unroll it more or less: it takes 2 cycles per addi, so it's not an effect of the branch. It's not "about 2 cycles", it's "exactly 2 cycles", i.e., the measurement is very repeatable.
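The arithmetic behind those numbers, spelled out as a small C sketch. The helper names are hypothetical, and the 4.0 GHz figure in the usage note is a made-up round number, not a measured POWER9 clock.

```c
/* Per-instruction latency implied by a fully dependent chain:
   cycles per loop iteration divided by the chain length. */
static double latency_per_op(double cycles_per_iter, double chain_len)
{
    return cycles_per_iter / chain_len;
}

/* Frequency a calibration would report if it assumes one latency
   while the hardware actually has another. */
static double reported_ghz(double true_ghz,
                           double assumed_latency,
                           double actual_latency)
{
    return true_ghz * assumed_latency / actual_latency;
}
```

With the loop above, `latency_per_op(8.0, 4.0)` gives 2.0 cycles per add. A calibration that assumes 1 cycle on a core that really takes 2 reports half the true clock: `reported_ghz(4.0, 1.0, 2.0)` is 2.0, which matches the "exactly half" symptom.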

So does POWER9 really have a minimum latency of 2 cycles for any ALU instruction? There is a chance that other integer instructions have a latency of one cycle and addi is just slower, but that seems extremely unlikely: it has to be among the most common ALU ops, and there is nothing that should make it slow.

Isn't a two-cycle latency going to be devastating to a lot of code, especially when compiled without that expectation? I guess POWER9 is really all-in on SMT then, at the cost of ST performance?