By: Travis Downs (travis.downs.delete@this.gmail.com), May 10, 2021 2:57 pm
Room: Moderated Discussions
A worthwhile read on probing BTB behavior and size, including Intel, AMD and M1 chips:
How many ifs are too many?
One thing that caught my eye is that Marek measures better than one taken branch per cycle on Zen 3 (EPYC 7713), at least for code that fits in the L1 icache. That surprises me, since I'm not aware of any mainstream uarch that can execute more than one taken branch per cycle (plenty can execute more than one untaken branch per cycle).
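For anyone who wants to poke at this directly, here is a minimal sketch of a taken-branch throughput kernel. It is not necessarily how Marek built his benchmark. It assumes x86-64 Linux (ELF) and GNU as in AT&T syntax. The symbol name `jump_chain`, the `.Lj` labels and the 64-byte default alignment are hypothetical choices. The generator emits a function made of n unconditional forward jumps, each landing on an aligned label, so every branch is taken and the alignment padding is never executed:

```python
def taken_jump_chain(n: int, align: int = 64, symbol: str = "jump_chain") -> str:
    """Return GNU as source (AT&T syntax, x86-64 ELF assumed) for a function
    that executes n unconditional jumps, all taken, and then returns.
    The symbol name and label scheme are hypothetical, chosen for illustration."""
    if n < 1:
        raise ValueError("n must be >= 1")
    if align <= 0 or align & (align - 1):
        raise ValueError("align must be a power of two")
    log2_align = align.bit_length() - 1  # .p2align takes log2 of the alignment
    out = [".text", f".globl {symbol}", f".type {symbol}, @function", f"{symbol}:"]
    for i in range(n):
        out.append(f"    jmp .Lj{i}")             # always taken: jumps forward to the next label
        out.append(f"    .p2align {log2_align}")  # padding is skipped, never executed
        out.append(f".Lj{i}:")
    out.append("    ret")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(taken_jump_chain(4), end="")
```

Assemble the output, call it from a timed loop as `void jump_chain(void)`, and divide n times the iteration count by the elapsed cycles. Growing n or the alignment grows the code footprint, which is one way to push the chain past L1i (or BTB) capacity and see where the throughput changes.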
Maybe it's just measurement error (e.g., due to turbo above the expected frequency), or can Zen 3 really do this?
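To make the turbo point concrete: if the benchmark converts wall-clock time to cycles using an assumed clock, any turbo above that clock inflates the apparent branches-per-cycle figure. The sketch below uses made-up numbers (not Marek's data) and hypothetical helper names. It also computes the lowest clock at which a result stays consistent with a limit of one taken branch per cycle:

```python
def branches_per_cycle(taken_branches: int, elapsed_s: float, assumed_hz: float) -> float:
    """Taken branches per cycle, assuming the core ran at assumed_hz the whole time."""
    return taken_branches / (elapsed_s * assumed_hz)


def min_freq_for_one_per_cycle(taken_branches: int, elapsed_s: float) -> float:
    """Lowest clock (Hz) at which the result is consistent with <= 1 taken branch/cycle."""
    return taken_branches / elapsed_s


# Hypothetical measurement: 1e9 taken branches executed in 0.25 s.
n, t = 1_000_000_000, 0.25
print(branches_per_cycle(n, t, 3.5e9))   # > 1 per cycle if we assume 3.5 GHz
print(branches_per_cycle(n, t, 4.0e9))   # 1.0 if the core actually ran at 4.0 GHz
print(min_freq_for_one_per_cycle(n, t))  # 4000000000.0
```

So a figure like ~1.14 branches/cycle at an assumed 3.5 GHz is exactly what a core limited to one taken branch per cycle would produce if it was really running at 4.0 GHz. Counting core cycles directly (for example with `perf stat -e cycles`) sidesteps the frequency assumption entirely.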
Topic | Posted By | Date |
---|---|---|
Post looking at BTB behavior and size | Travis Downs | 2021/05/10 02:57 PM |
Post looking at BTB behavior and size | Anon | 2021/05/10 04:43 PM |
Post looking at BTB behavior and size | Travis Downs | 2021/05/10 08:59 PM |
Post looking at BTB behavior and size | Linus Torvalds | 2021/05/11 10:13 AM |
RKL taken branch throughput | Chester | 2021/05/10 05:25 PM |
RKL taken branch throughput | Travis Downs | 2021/05/10 09:00 PM |
RKL taken branch throughput | Chester | 2021/05/11 10:04 PM |
RKL taken branch throughput | Travis Downs | 2021/05/14 10:34 PM |
RKL taken branch throughput | --- | 2021/05/15 10:07 AM |