or patents

By: Chester (lamchester.delete@this.gmail.com), August 30, 2022 2:07 pm
Room: Moderated Discussions
> If you have some L0 structure to minimize pipeline bubble on taken, yes (which presumably
> everyone has). Such a thing has to predict address as well, it's not really different
> from the BTB in a decoupled fetch pipeline, in terms of what it predicts.

Yeah, agreed.

And not everyone: some older CPUs (e.g. AMD Athlon) cannot do taken branches back to back. The BTB attached to the L1i is used for every taken branch, and there's a 1 cycle penalty in the frontend for each taken branch.
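
To put a rough number on that bubble, here is a toy model. It assumes one fetch group per cycle and a fixed bubble per taken branch; the function name and the example counts are made up for illustration, not measurements of any real CPU.

```python
def frontend_cycles(fetch_groups: int, taken_branches: int, bubble_cycles: int) -> int:
    """Cycles to fetch a stream in a toy frontend.

    Assumes one fetch group per cycle, plus a fixed number of bubble
    cycles after every taken branch. Ignores cache misses and mispredicts.
    """
    return fetch_groups + taken_branches * bubble_cycles


# Hypothetical stream: 1000 fetch groups, 200 of which end in a taken branch.
with_bubble = frontend_cycles(1000, 200, bubble_cycles=1)
no_bubble = frontend_cycles(1000, 200, bubble_cycles=0)
print(with_bubble, no_bubble)
```

With those made-up counts, the 1 cycle bubble costs 20% more frontend cycles than a design that can do taken branches back to back.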

> > The advantage
> > is you don't have to index into a separate BTB structure to fetch the branch target if you predict it's
> > taken. You get the predicted branch target along with the L1i fetch, no other lookup needed.
> >
> > The downside is you have no clue where to go next once you miss L1i. Branches are pretty
> > common, so if there is a taken branch coming up, you won't be able to follow it. With a
> > decoupled BTB, you can still index into that even if your L1i fetch missed, and prefetch
> > far enough to cover L2 latency - assuming your branch predictor is reasonably accurate.
>
> Right. You could also accommodate that with an I$ prefetch prediction structure that could be
> far cheaper than a branch predictor. You don't have to predict every single branch, only I$ misses
> which might be fewer by an order of magnitude. So with an L2 latency about par with mispredict
> penalty, you could do nothing and that might be ~equivalent penalty as 90% accurate branch predictor
> already. Might not take too much to get it up high enough to a state of the art branch predictor.

Like a data-side prefetcher? As in, it looks at incoming cache lookup requests and generates additional lookups/fill requests based on the access pattern?

Probably not, because data-side prefetch tends to be pretty wasteful in terms of bandwidth, which suggests poor accuracy. Branch predictors are already typically well over 90% accurate. If you have that already, it makes sense to just use the predictor and get really accurate prefetch. And if you don't have a branch predictor with at least early 2000s accuracy levels, well, that needs to be fixed.
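
For what it's worth, one way to read the quoted back-of-envelope estimate is as events times uncovered fraction times penalty. The sketch below uses made-up inputs (200 branches and 20 L1i misses per 1000 instructions, 15 cycles for both the mispredict penalty and L2 latency); none of these numbers come from a real CPU.

```python
def stall_cycles_per_kilo_instr(events_per_kilo_instr: float,
                                uncovered_fraction: float,
                                penalty_cycles: float) -> float:
    """Stall cycles per 1000 instructions for one class of event.

    events_per_kilo_instr: how often the event occurs (hypothetical input)
    uncovered_fraction:    share of events that pay the full penalty
    penalty_cycles:        cycles lost per uncovered event
    """
    return events_per_kilo_instr * uncovered_fraction * penalty_cycles


# Assumed numbers: L1i misses are 10x rarer than branches, and
# L2 latency is on par with the mispredict penalty.
BRANCHES = 200.0
L1I_MISSES = BRANCHES / 10
PENALTY = 15.0

# No instruction prefetch at all: every L1i miss eats full L2 latency.
no_prefetch = stall_cycles_per_kilo_instr(L1I_MISSES, 1.0, PENALTY)
# A 90% accurate branch predictor: 10% of branches pay the mispredict penalty.
predictor_90 = stall_cycles_per_kilo_instr(BRANCHES, 0.10, PENALTY)
# A predictor that is 98% accurate, for comparison.
predictor_98 = stall_cycles_per_kilo_instr(BRANCHES, 0.02, PENALTY)
print(no_prefetch, predictor_90, predictor_98)
```

Under those assumptions, doing nothing about L1i misses costs about the same as the mispredicts of a 90% accurate predictor, which is the quoted point. The gap to a more accurate predictor is what any prefetch scheme would have to close.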

> Especially with very big I$ and big fast L2 and close memory as Apple has.
> IOW, decoupled fetch with big BTBs may not *be* the next step for Apple.

Agreed, decoupled fetch (or rather, decoupled PC generation) is far from the only way to go. Apple's L2 is still higher latency, in both absolute time and cycle counts, than most L2 caches on AMD and Intel CPUs, so an L1i miss is expensive for them. But their L1i is so gigantic that the L1i miss case should be pretty rare. If a miss is more expensive when it does happen, that's fine.
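
For anyone following along, here is a minimal sketch of what decoupled PC generation buys you on an L1i miss. It assumes a BTB indexed by fetch block address where every hit is predicted taken, 16-byte fetch blocks, and 64-byte cache lines. All names and addresses are hypothetical, and real designs are far more involved.

```python
from collections import deque

LINE_BYTES = 64  # assumed cache line size


def run_ahead(start_pc: int, btb: dict, depth: int, block_bytes: int = 16) -> deque:
    """Generate `depth` future fetch-block addresses using only the BTB.

    The walk never consults the L1i, so it keeps going even if the
    fetch for an earlier block missed in the cache.
    """
    queue = deque()
    pc = start_pc
    for _ in range(depth):
        queue.append(pc)
        # BTB hit: treat it as a predicted-taken branch and follow the target.
        # BTB miss: fall through to the next sequential block.
        pc = btb.get(pc, pc + block_bytes)
    return queue


def prefetch_lines(fetch_queue) -> list:
    """Distinct cache lines touched by the queued addresses, in first-use order."""
    seen = []
    for pc in fetch_queue:
        line = pc // LINE_BYTES * LINE_BYTES
        if line not in seen:
            seen.append(line)
    return seen


# Hypothetical loop: a branch in block 0x1010 jumps to 0x4000,
# and a branch in block 0x4010 jumps back to 0x1000.
btb = {0x1010: 0x4000, 0x4010: 0x1000}
queue = run_ahead(0x1000, btb, depth=5)
print([hex(pc) for pc in queue])
print([hex(line) for line in prefetch_lines(queue)])
```

The point is that the address queue follows the taken branch to 0x4000 without ever reading instruction bytes, so the line at 0x4000 can be requested from L2 while the frontend is still waiting on an earlier miss. A coupled BTB only yields the target once the L1i fetch itself returns.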

> > > This allow equivalent
> > > accuracy with smaller structures. Downside being you don't get I$ prefetch prediction from the same
> > > structure, and you couldn't avoid taken branch bubbles with high frequency. But that doesn't mean
> > > you can't have a prefetch prediction from other structures, as perhaps POWER9 does.
> >
> > Yeah, branch targets stored alongside L1i (coupled BTB) tend to take more than 1 cycle latency to access.