or patents

By: Chester (lamchester.delete@this.gmail.com), August 29, 2022 9:54 pm
Room: Moderated Discussions
> > You clearly have no clue the lengths Apple go to to save energy in Fetch.
>
> No it's more that you don't understand physical design. Building wider machinery so you
> can do more work in one cycle so you can "sleep" for a few cycles is not a good thing. The
> "race to idle" idea you might be basing it on operates on utterly different scales.
>

Or that reading patents is not the same as showing something is in use in a design. Companies patent things all the time just in case they might employ the strategy. That doesn't mean they're using it or will use it any time in the future, and doesn't mean anyone else will use it either. Also, patents are often extremely vague. That's partially so lawyers have a ton of room to claim patent infringement. It also makes them worthless for claiming a certain microarchitecture detail exists.

If you want to claim that "Apple Fetch prediction predicts the trace width, not just the trace address", your post needs to include more detail than just patents. Ditto for whether a loop buffer or L0 icache exists.

> BTW In spite of Apple Fetch being so advanced, there actually remains a lot they can still do!
> While they were early adopters of Decoupled Fetch (as has, I think, *everyone* nowadays, eventually),
> they only have "first stage" decoupling, with the pipeline looking like
> [Fetch Address Predict] -> [Fetch cache Access] -> Queue of Instructions -> [Decode].
> They have not adopted the next step (neither has anyone else yet?), as suggested in Glenn Reinman's thesis, of
> [Fetch Address Predict] -> Queue of predicted addresses -> [Fetch cache Access] -> Queue of Instructions -> [Decode].

Pretty sure the second is what everyone except Apple does in their high performance designs, and has done for the past decade or more. Apple actually seems to be unique in doing [fetch address predict] -> [fetch cache access] -> [fetch result drives next prediction] for their main BTB level, based on how branch latency jumps once the loop exceeds L1i size.

Even Sandy Bridge does [Fetch Address Predict] -> Queue of predicted addresses -> [Fetch cache Access] -> Queue of Instructions -> [Decode], as branch latency doesn't substantially increase when the loop exceeds L1i capacity. It so happens that Apple using the first method (no queue of predicted addresses) doesn't matter much: they have such a honking huge L1i that the miss case is likely substantially rarer than on other architectures.
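To make the difference concrete, here is a toy model of the two arrangements. It is only a sketch under assumptions of my own, not a description of any real core: every fetch block ends in a taken branch, the predictor can produce one address per cycle, blocks are delivered in order, and misses can overlap freely up to the queue depth. The function name, `queue_depth`, and the 14-cycle miss latency are all made up for illustration. A depth of 1 stands in for "fetch result drives next prediction"; a larger depth stands in for a queue of predicted addresses.

```python
def cycles_per_taken_branch(latencies, queue_depth):
    """Average cycles per fetch block in a toy front-end model.

    latencies[i]: assumed instruction-cache access latency (cycles) for
                  block i; each block ends in a taken branch.
    queue_depth:  hypothetical number of predicted addresses allowed in
                  flight. 1 models a coupled design where the fetch
                  result must return before the next prediction starts.
    """
    if queue_depth < 1:
        raise ValueError("queue_depth must be at least 1")
    if not latencies:
        raise ValueError("need at least one fetch block")

    predict = []  # cycle at which block i's address is predicted
    done = []     # cycle at which block i's instructions are delivered

    for i, latency in enumerate(latencies):
        # The predictor produces at most one address per cycle.
        start = predict[-1] + 1 if predict else 0
        # It stalls while the queue is full, i.e. until the block that is
        # queue_depth entries older has been delivered.
        if i >= queue_depth:
            start = max(start, done[i - queue_depth])
        predict.append(start)

        # In-order delivery, at most one block per cycle.
        finish = start + latency
        if done:
            finish = max(finish, done[-1] + 1)
        done.append(finish)

    return done[-1] / len(latencies)


if __name__ == "__main__":
    hits = [1] * 1000     # loop fits in L1i (assumed 1-cycle access)
    misses = [14] * 1000  # loop spills out of L1i (assumed 14-cycle access)
    for depth in (1, 4, 16):
        print(depth,
              cycles_per_taken_branch(hits, depth),
              cycles_per_taken_branch(misses, depth))
```

With a depth of 1, the model pays the full miss latency on every taken branch once the loop no longer hits in L1i, which is the kind of jump described above. With a deep enough queue, the predictor runs ahead and the misses overlap, so the cost per taken branch stays close to one cycle. Real hardware has limits this ignores (outstanding miss count, BTB capacity, L2 bandwidth), so treat the numbers as qualitative only.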