Performance per single clock cycle

By: hobold (hobold.delete@this.vectorizer.org), June 16, 2022 7:42 am
Room: Moderated Discussions
dmcq (dmcq.delete@this.fano.co.uk) on June 16, 2022 6:59 am wrote:
> hobold (hobold.delete@this.vectorizer.org) on June 16, 2022 5:12 am wrote:

> > I was wondering for a while if maybe Apple designed a processor that can sometimes
> > execute two serially dependent instructions within one longer clock cycle.
> >
> > How much % of cycle time is latch overhead these days? What if instead of the usual beat
> > "latch work latch work latch" you built for "latch work work latch work work latch"?
> >
> > One probably wouldn't even try to make this work for arbitrary sequences of two dependent instructions.
> > But maybe a small subset of dependent pairs is statistically dominant enough to focus on?
>
> That's what fused ops are about. RISC-V is dependent on fused operations to get decent performance.

I am aware of those. But we don't build the pipeline around them; the parts still execute in distinct, consecutive clock cycles. The benefits of fused ops are different from the ones I was trying to describe.

Hmm, maybe some of you have heard of an ancient academic precedent called "weird machine"? Its goals were different again: it tried for some limited superscalar execution, back when the transistor budget was barely sufficient for a fully pipelined processor. But in that old design, operand forwarding was implicit between the first half and the second half of a dependent instruction pair. And the pipeline timing was still centered around executing one operation (of the two dependent ones) per clock cycle. IIRC, instruction pairs were specified in a long instruction word, i.e. identifying suitable dependent pairs was the compiler's job.

I was thinking along similar lines of trying to reduce overhead: latch overhead (though I don't know if that is even significant) as well as operand routing overhead. There is a price, of course, so I am not trying to sell a silver bullet.
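To put rough numbers on the latch overhead question, here is a minimal back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function name `chain_latency`, the time units, and especially the values of `T_WORK` and `T_LATCH`, which are made up and not measurements of any real process or core. It also assumes the best case, where every pair of dependent operations qualifies for the longer cycle.

```python
def chain_latency(n_ops, t_work, t_latch, ops_per_cycle=1):
    """Time to finish a chain of n_ops serially dependent operations.

    One cycle holds ops_per_cycle units of work followed by one latch.
    """
    cycles = -(-n_ops // ops_per_cycle)            # ceiling division
    cycle_time = ops_per_cycle * t_work + t_latch  # "work ... work latch"
    return cycles * cycle_time

# Hypothetical numbers in arbitrary time units (assumed, not measured).
T_WORK, T_LATCH = 8.0, 2.0

single = chain_latency(100, T_WORK, T_LATCH, ops_per_cycle=1)  # latch work latch
double = chain_latency(100, T_WORK, T_LATCH, ops_per_cycle=2)  # latch work work latch

print(single, double, single / double)
```

With these assumed values the latch is 20% of the short cycle, and the dependent chain finishes about 11% sooner with the long cycle. If the latch share is much smaller than that, the gain shrinks accordingly, and any pair that cannot be combined wastes most of a long cycle.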

I guess one could describe my half-baked suggestion as "executing traces of length two" in a single longer cycle. With such a vague characterization, the similarities to the old "weird machine" are more directly apparent. But the weird machine did it in two overlapping cycles, trying to approach the benefits of a second, parallel pipeline at the cost of adding only one extra pipeline stage.


Apple's ARM cores do get a suspiciously large amount of single-threaded integer spaghetti code executed per second, at two thirds the clock speed of contemporary x86 cores. Apple has a bit of a transistor advantage, too (as TSMC's trailblazing customer), so it seems clear that Apple's processor was intentionally designed for slower, longer clock cycles.
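As a quick sanity check on what "two thirds the clock speed" implies, the arithmetic can be sketched as follows. The helper name `required_ipc_ratio` is made up for this example, and the sketch assumes both cores execute the same number of instructions for the same program, which ignores differences between the instruction sets.

```python
from fractions import Fraction

def required_ipc_ratio(clock_ratio):
    """Per-cycle work ratio needed to match throughput at a given clock ratio."""
    return 1 / clock_ratio

# Clock ratio taken from the rough "two thirds" figure above.
ratio = required_ipc_ratio(Fraction(2, 3))
print(ratio)  # 3/2
```

So merely to draw level, the slower-clocked core has to get about 1.5 times as much work done per cycle, on exactly the kind of serially dependent code where that is hardest.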