It's not about width in absolute bits. It's about duty cycle

By: Heikki Kultala (heikki.kultal.a.delete@this.gmail.com), August 29, 2022 2:28 pm
Room: Moderated Discussions
Paul A. Clayton (paaronclayton.delete@this.gmail.com) on August 28, 2022 5:50 pm wrote:
> anon2 (anon.delete@this.anon.com) on August 28, 2022 3:14 pm wrote:
> > --- (---.delete@this.redheron.com) on August 28, 2022 2:24 pm wrote:
> [snip]
> >> It wouldn't be absolutely crazy if you're trying to save energy (I wouldn't roll my eyes if
> >> I learned that Apple's small core likewise can Fetch up to 16 instructions a cycle -- might
> >> as well get as much useful as you can in one gulp, then sleep Fetch for two or three cycles);
> >
> > That seems like the opposite of good energy efficiency to me. I doubt the small
> > core would do that and also surprised about the big core if that is true of it.
>
> If I understand correctly, modern processes prefer 128-bit-wide SRAM arrays for area and
> power efficiency. This might allow a 4-instruction fetch to be energy efficient with two-wide
> decode compared to two 2-instruction fetches even with way memoization. Using partial tag
> way prediction without memoization would increase the per-access overhead.
>
> (A tiny core might even have a unified L1 cache, in which case wide fetch would free
> cycles for data accesses, though banking could provide parallel access. With a unified
> L1, full sub-array width accesses for instructions and data may be desirable.)
>
> Having fetch run ahead of decode may hide fetch glitches (e.g., two instructions may straddle
> sub-arrays, so one sub-array access per cycle for energy efficiency might only provide one instruction
> in one cycle; even some cache miss latency might be hidden by instruction buffering).
>
> If all of the instructions in a wide fetch were executed, a single
> wide fetch would (presumably) be more energy efficient.
>
> I would be skeptical that 64-byte fetch would be the most energy efficient or even have the best
> energy-delay; accessing multiple sub-arrays adds energy cost and there might be a smaller average
> fraction of used instructions with such wide fetch. (One might be able to use some of those instructions
> on branch mispredictions if they were stored in a small buffer, but branch mispredictions are relatively
> rare so I suspect such would not be worthwhile for energy efficiency.)
>
> It would be nice if someone with actual knowledge would chime in.

Alternating between used and unused cycles is the worst case for energy efficiency: going from an active state to some zero state toggles bits, and going back from the zero state to an active state toggles them again.

Assume totally random data bits (each bit is 1 with 50% probability) and a bus that is pulled to 0 when not in use. Using the bus every other cycle and driving it to zero in between then causes just as much toggling as using the bus every clock cycle, but gives only half the bandwidth.

If the old data is instead kept on the bus while it is idle, the situation is not as bad: the bus then toggles only when a new word arrives.
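
For anyone who wants to check the arithmetic, here is a rough Python sketch that counts bit flips on a simulated bus for the three cases above. The 128-bit width, the word count, the seed, and all function and key names are my own assumptions for illustration. It only counts toggles on the data wires and ignores everything else that costs energy (clocking, SRAM array access, leakage), so treat it as a sanity check of the toggle counting, not as a power model.

```python
import random


def count_toggles(bus_states):
    """Count bit flips between consecutive bus states (each state is an int)."""
    return sum(bin(a ^ b).count("1") for a, b in zip(bus_states, bus_states[1:]))


def simulate(width=128, words=10_000, seed=0):
    """Compare toggle counts for three bus usage patterns (illustrative only)."""
    rng = random.Random(seed)
    # Random data: each bit is 1 with 50% probability.
    data = [rng.getrandbits(width) for _ in range(words)]

    patterns = {
        # A new word on the bus every cycle.
        "every_cycle": data,
        # A word, then the bus driven to zero, then the next word, and so on.
        "every_other_zeroed": [v for w in data for v in (w, 0)],
        # A word, then the same word held during the idle cycle.
        "every_other_held": [v for w in data for v in (w, w)],
    }

    results = {}
    for name, states in patterns.items():
        toggles = count_toggles(states)
        transitions = len(states) - 1
        results[name] = {
            # Average flips per wire per clock cycle.
            "toggles_per_bit_cycle": toggles / (width * transitions),
            # Average flips paid for each word actually transferred.
            "toggles_per_word": toggles / words,
        }
    return results


if __name__ == "__main__":
    for name, stats in simulate().items():
        print(name, stats)
```

With random data, the zeroed case should come out at roughly the same toggles per wire per cycle as the every-cycle case while moving half as many words, so about twice the toggles per word. The held case should land at about half the per-cycle toggling and about the same toggles per word as the every-cycle case.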