RKL taken branch throughput

By: Chester (lamchester.delete@this.gmail.com), May 10, 2021 5:25 pm
Room: Moderated Discussions
Travis Downs (travis.downs.delete@this.gmail.com) on May 10, 2021 2:57 pm wrote:
> A worthwhile read on probing BTB behavior and size, including Intel, AMD and M1 chips:
>
> How many ifs are too many?
>
> One thing that caught my eye is that Marek measures better than one taken branch per
> cycle on Zen 3 (EPYC 7713), at least for code that fits in the L1 icache. That surprises
> me since I'm not aware of any mainstream uarch that can execute more than 1 taken branch
> per cycle (plenty can execute more than 1 untaken branches per cycle).
>
> Maybe it's just measurement error (e.g., due to turbo above
> the expected frequency), or can Zen 3 really do this?
>

Rocket Lake (11900K) actually goes beyond Zen 3 here. With 2 to 8 branches in the loop, it can do two taken jumps per cycle (around 0.1 ns per jump). With only one branch (the backward-taken branch at the end of the loop), or with more than 8 branches, it does 1 taken jump per cycle. Once there are more than 256 branches in the loop, it slows to 2 cycles per branch.

I wasn't able to replicate their Zen 3 result when I got someone to run my test on a 5950X. It did one branch per cycle up to ~1024 branches (0.2 ns per branch, which lines up with the 5.05 GHz max boost), after which it rises to ~3 cycles per branch at 2048 branches. I didn't see any other CPU get more than 1 taken branch per cycle either.
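To make the ns-to-cycles arithmetic explicit, here is a minimal sketch. The function name is made up for illustration, and the 3.4 GHz figure is only an assumed nominal clock used to show what happens when boost is ignored; it is not a measured value from the test.

```python
def cycles_per_branch(ns_per_branch: float, clock_ghz: float) -> float:
    """Convert a measured time per branch (ns) into core cycles.

    cycles = time * frequency; with ns and GHz the units cancel directly.
    """
    return ns_per_branch * clock_ghz


# Figures from the post: ~0.2 ns per branch on a 5950X boosting to 5.05 GHz.
at_boost = cycles_per_branch(0.2, 5.05)

# Same measurement, but divided through by an assumed 3.4 GHz nominal clock
# (hypothetical, only to illustrate the effect of ignoring boost).
at_nominal = cycles_per_branch(0.2, 3.4)

print(f"at boost clock:   {at_boost:.2f} cycles per branch")
print(f"at nominal clock: {at_nominal:.2f} cycles per branch")
```

Using a clock lower than what the core actually ran at makes the same timing look like fewer than 1 cycle per taken branch, which is the kind of error that would produce an apparent better-than-one-per-cycle result.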

I wrote my test well before this article came out, so it's not directly comparable. It's a simple test: only forward unconditional jumps spaced 16 bytes apart, except the loop branch, which is taken backward. But from their graph, the Zen 3 result doesn't seem far below 1 cycle per branch, so I suspect they didn't account for boost.
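For anyone who wants to try something similar, a rough sketch of a generator for that kind of loop follows. This is not the actual test code: the NASM-style syntax, the label names, the rdi loop counter, and the function name are all assumptions for illustration.

```python
def make_jump_chain(n_branches: int, spacing: int = 16) -> str:
    """Emit NASM-style assembly text for a loop with n_branches taken branches.

    The loop has n_branches - 1 forward unconditional jumps, each landing on
    the next `spacing`-byte boundary, plus one backward loop branch.
    Assumes the iteration count is passed in rdi (hypothetical convention).
    """
    if n_branches < 1:
        raise ValueError("need at least the loop branch")
    lines = [f"    align {spacing}", "loop_top:"]
    for i in range(n_branches - 1):
        lines.append(f"    jmp target_{i}")    # forward taken jump
        lines.append(f"    align {spacing}")   # pad so the target sits on the next boundary
        lines.append(f"target_{i}:")
    lines.append("    dec rdi")                # loop counter
    lines.append("    jnz loop_top")           # the one backward taken branch
    lines.append("    ret")
    return "\n".join(lines)


if __name__ == "__main__":
    print(make_jump_chain(4))
```

Timing the generated loop for a range of branch counts and dividing by iterations times branches gives the ns per branch figure; the clock speed still has to be checked separately to turn that into cycles.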