or patents

By: anon2 (anon.delete@this.anon.com), August 30, 2022 1:35 am
Room: Moderated Discussions
Chester (lamchester.delete@this.gmail.com) on August 29, 2022 11:37 pm wrote:
> > POWER9 seems to do similar. I'm sure the PA6T PowerPC heritage
> > link is purely coincidental. From OpenPOWER user
> > manual: "As instructions are fetched, they are scanned
> > for branches. Up to eight branches are simultaneously
> > processed by the branch prediction logic that predicts both
> > the direction and/or target of the branches, depending
> > on the branch type." It does have some kind of L0 predictor
> > (BTAC) which does predict addresses as well though,
> > unclear how it works, but it is very fast, so not like the huge multi-level target buffers of x86 CPUs.
>
> Seems common to have both. For example, Cortex A72 seems to have a decoupled 64 entry L1 BTB.
> After that, it seems to have a second level BTB coupled to the L1i, that can track up to 4096
> branch targets as long as the branches don't spill out of the 48 KB L1i. So if there's a branch every
> 16 bytes, it can track 3072 targets. Or 768 if there's a branch every 64 bytes, and so on.
>
> If a branch target comes out of the L1 BTB, there's a one cycle penalty.
> If it comes out of the second level BTB/L1i, there's a 2 cycle penalty.
>
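The capacity figures quoted above can be checked with a short calculation. This is only a minimal sketch: the 48 KB L1i size, the 4096-target ceiling, and the branch spacings come from the quoted post, while the function name `trackable_targets` and the assumption of perfectly even branch spacing are mine.

```python
# Back-of-envelope check of the coupled second-level BTB capacity quoted above.
L1I_BYTES = 48 * 1024   # Cortex A72 L1i size, as stated in the post
MAX_TARGETS = 4096      # upper bound on tracked targets, as stated in the post


def trackable_targets(bytes_per_branch: int) -> int:
    """Branches that fit in L1i at an even spacing, capped at MAX_TARGETS.

    Hypothetical helper: assumes exactly one branch per `bytes_per_branch`.
    """
    return min(MAX_TARGETS, L1I_BYTES // bytes_per_branch)


for spacing in (8, 16, 64):
    print(f"one branch per {spacing} bytes: {trackable_targets(spacing)} targets")
```

Denser code than one branch per 12 bytes runs into the 4096-target ceiling; sparser code is limited by what fits in the L1i.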
> > It makes sense that Apple really likes a very large I$ if they are doing coupled fetch. And coupled
> > fetch has real benefits, you don't have to predict the presence of a branch, and you don't have
> > to predict target for direct branches (which should be the large majority).
>
> You still have to predict the presence of a branch and predict that it's going to be taken.

If you have some L0 structure to minimize the pipeline bubble on taken branches, yes (which presumably everyone has). Such a structure has to predict the address as well, so in terms of what it predicts, it's not really different from the BTB in a decoupled fetch pipeline.

> The advantage is you don't have to index into a separate BTB structure to fetch the branch target if you
> predict it's taken. You get the predicted branch target along with the L1i fetch, no other lookup needed.
>
> The downside is you have no clue where to go next once you miss L1i. Branches are pretty
> common, so if there is a taken branch coming up, you won't be able to follow it. With a
> decoupled BTB, you can still index into that even if your L1i fetch missed, and prefetch
> far enough to cover L2 latency - assuming your branch predictor is reasonably accurate.
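The difference quoted above can be illustrated with a toy lookup model. This is a sketch under loud assumptions: real BTBs are tagged, set-associative hardware arrays, not dictionaries, and every name, address, and the 64-byte line size here is hypothetical.

```python
LINE_BYTES = 64  # assumed cache line size


def line_of(pc: int) -> int:
    """Address of the cache line containing pc."""
    return pc - (pc % LINE_BYTES)


def coupled_target(l1i: dict, pc: int):
    """Coupled: targets are stored with the L1i line, so an L1i miss yields nothing."""
    line = l1i.get(line_of(pc))
    if line is None:
        return None  # L1i miss: no idea where a taken branch would go
    return line.get(pc)


def decoupled_target(btb: dict, pc: int):
    """Decoupled: the BTB is indexed by pc alone, regardless of L1i contents."""
    return btb.get(pc)


BRANCH_PC, TARGET = 0x1008, 0x4000
l1i_miss = {}                                    # line holding the branch was evicted
l1i_hit = {line_of(BRANCH_PC): {BRANCH_PC: TARGET}}
btb = {BRANCH_PC: TARGET}                        # separate structure, still has the branch

print(coupled_target(l1i_hit, BRANCH_PC))    # target comes along with the L1i fetch
print(coupled_target(l1i_miss, BRANCH_PC))   # nothing to follow on an L1i miss
print(decoupled_target(btb, BRANCH_PC))      # still available, can prefetch ahead
```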

Right. You could also accommodate that with an I$ prefetch prediction structure, which could be far cheaper than a branch predictor. You don't have to predict every single branch, only I$ misses, which might be fewer by an order of magnitude. So with an L2 latency roughly on par with the mispredict penalty, you could do nothing and already pay about the same penalty as a 90% accurate branch predictor. It might not take much to bring that up to the level of a state-of-the-art branch predictor, especially with a very big I$, a big fast L2, and close memory, as Apple has.
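One way to read that arithmetic, as a rough sketch rather than a measurement: count stall cycles per thousand instructions as events × fraction not covered × penalty. The branch density and cycle count below are made-up placeholders; only the 10:1 ratio of branches to I$ misses, the 90% accuracy, and "L2 latency about equal to mispredict penalty" come from the paragraph above.

```python
def stall_cycles_per_ki(events_per_ki: float, uncovered_fraction: float,
                        penalty_cycles: float) -> float:
    """Stall cycles per 1000 instructions for events that are not predicted/covered."""
    return events_per_ki * uncovered_fraction * penalty_cycles


BRANCHES_PER_KI = 200.0                        # assumed branch density (placeholder)
ICACHE_MISSES_PER_KI = BRANCHES_PER_KI / 10    # "fewer by an order of magnitude"
PENALTY = 14.0   # assumed cycles; mispredict penalty taken as equal to L2 latency

# Branch predictor that is 90% accurate: 10% of branches pay the penalty.
bp_90 = stall_cycles_per_ki(BRANCHES_PER_KI, 0.10, PENALTY)

# No I$ prefetch prediction at all: every I$ miss pays the L2 latency.
no_prefetch = stall_cycles_per_ki(ICACHE_MISSES_PER_KI, 1.0, PENALTY)

# A cheap prefetch predictor that covers half of the I$ misses.
half_covered = stall_cycles_per_ki(ICACHE_MISSES_PER_KI, 0.5, PENALTY)

print(bp_90, no_prefetch, half_covered)
```

Under these assumptions the "do nothing" case costs the same as the 90% accurate predictor, and any miss coverage at all improves on it. The model ignores overlap between stalls and everything else a real front end does.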

IOW, decoupled fetch with big BTBs may not *be* the next step for Apple.

>
> > This allows equivalent
> > accuracy with smaller structures. Downside being you don't get I$ prefetch prediction from the same
> > structure, and you couldn't avoid taken branch bubbles with high frequency. But that doesn't mean
> > you can't have a prefetch prediction from other structures, as perhaps POWER9 does.
>
> Yeah, branch targets stored alongside L1i (coupled BTB) tend to take more than 1 cycle latency to access.
