5-6 wide core, why no mention from Intel?

By: Maynard Handley (name99.delete@this.name99.org), October 2, 2015 9:45 am
Room: Moderated Discussions
Exophase (exophase.delete@this.gmail.com) on October 2, 2015 7:43 am wrote:
> Maynard Handley (name99.delete@this.name99.org) on October 2, 2015 1:10 am wrote:
> > "the only catch is supporting the two register destinations."
> > And that may be necessary anyway depending on how you support the "S-suffix" instructions (those that
> > also set the zero/overflow/etc flags). You can crack those this PPC did, but if you've designed them
> > properly (as I assume ARM did for v8, learning from PPC's mistakes) the natural high performance thing
> > would be to have a pool of renamed 4-bit flag registers, use the normal rename channels, and just accept
> > that some largish fraction (20% or so?) of your instructions are going to be two destination. (Once
> > you have this machinery, you may also be able to use it to fuse instruction pairs that are common but
> > each generate a separate output if there are cases where that's worth the hassle.)
> >
>
> Why use the normal rename channels for a separate register file where there's little
> cross-access with the main register file? It's the same thing for FP/SIMD, you're generally
> better off renaming to a different register file with separate write ports.
>
> Flag setting instructions on ARMv8 AArch64 are limited to ADCS, SBCS, ADDS, SUBS, ANDS, BICS, CCMP, and CCMN.
> Definitely nothing like 20% or so. If AArch32 is supported that opens up a much larger pool. But either way,
> CMP/TST like instructions are common enough that you really don't want to have to crack them in two.

My 20% referred to the full set of instructions that generate two results.
This is not just the S-suffix instructions but also the load-pairs and the load+address+update instructions.

In principle this can be learned (for all the ARM-v8 CPUs). Start by seeing whether you can execute one vs 2 load pairs/cycle in a tight loop, then stuff in extra unrelated arithmetic to see whether you can reach the width you expect, or get blocked by not enough destination register write ports. Same thing with load-update instructions and S-suffix instructions.
The one thing I could find on this is A72 on average executes 1.08 micro-ops per instruction, which seems low if the entire set I've suggested above are all cracked; but maybe it's plausible if, eg, load-update (and some other instructions we haven't mentioned) are cracked but S-suffix and load/store pair are not.
< Previous Post in ThreadNext Post in Thread >
TopicPosted ByDate
Update to Intel Optimization ManualSHK2015/09/29 05:38 AM
  gather speedEric Bron2015/09/29 09:43 AM
    gather speedGabriele Svelto2015/09/29 12:00 PM
  Update to Intel Optimization ManualTim McCaffrey2015/09/29 11:18 AM
    Update to Intel Optimization ManualSHK2015/09/29 12:04 PM
      Update to Intel Optimization ManualAnon2015/09/29 02:23 PM
    Update to Intel Optimization Manualnone2015/09/29 10:31 PM
      Update to Intel Optimization ManualMichael S2015/09/30 04:24 AM
    Update to Intel Optimization ManualMichael S2015/09/30 04:30 AM
      Update to Intel Optimization ManualTim McCaffrey2015/09/30 10:01 AM
  5-6 wide core, why no mention from Intel?Wouter Tinus2015/09/30 02:14 PM
    5-6 wide core, why no mention from Intel?Maynard Handley2015/09/30 03:30 PM
      5-6 wide core, why no mention from Intel?Alberto2015/10/01 12:13 AM
        5-6 wide core, why no mention from Intel?anon2015/10/01 02:21 AM
          5-6 wide core, why no mention from Intel?Alberto2015/10/01 04:41 AM
            5-6 wide core, why no mention from Intel?anon2015/10/01 05:27 AM
              5-6 wide core, why no mention from Intel?Alberto2015/10/01 08:33 AM
                5-6 wide core, why no mention from Intel?juanrga2015/10/01 10:24 AM
        5-6 wide core, why no mention from Intel?Maynard Handley2015/10/01 08:57 AM
    5-6 wide core, why no mention from Intel?juanrga2015/10/01 03:59 AM
      5-6 wide core, why no mention from Intel?Wouter Tinus2015/10/01 02:48 PM
        5-6 wide core, why no mention from Intel?juanrga2015/10/03 03:17 AM
          5-6 wide core, why no mention from Intel?Wouter Tinus2015/10/03 11:19 AM
            Are you kidding? (NT)juanrga2015/10/04 05:30 AM
              Are you kidding?Wouter Tinus2015/10/04 03:18 PM
                Are you kidding?juanrga2015/10/05 09:46 AM
                  Are you kidding?David Kanter2015/10/05 11:24 AM
                    Are you kidding?anon2015/10/05 09:26 PM
                    Are you kidding?Linus Torvalds2015/10/07 04:49 AM
                      Are you kidding?juanrga2015/10/07 10:46 AM
                        Are you kidding?anon2015/10/07 06:21 PM
                  Are you kidding?Wouter Tinus2015/10/05 01:25 PM
                    Are you kidding?juanrga2015/10/06 10:17 AM
                      Are you kidding?Stubabe2015/10/07 12:17 AM
                        Are you kidding?juanrga2015/10/07 10:56 AM
                          Amazing...Wouter Tinus2015/10/07 11:31 AM
                            Amazing...juanrga2015/10/07 03:45 PM
                          Are you kidding?Stubabe2015/10/07 11:57 AM
                            Are you kidding?juanrga2015/10/07 03:59 PM
                          Are you kidding?Wilco2015/10/07 02:07 PM
                            Are you kidding?juanrga2015/10/07 04:33 PM
      5-6 wide core, why no mention from Intel?Eric Bron2015/10/04 04:18 AM
    5-6 wide core, why no mention from Intel?David Kanter2015/10/01 09:01 AM
      Optimal number and kind of execution unitsjuanrga2015/10/01 10:50 AM
        Optimal number and kind of execution unitsPatrick Chase2015/10/01 04:38 PM
          Optimal number and kind of execution unitsI.S.T.2015/10/01 05:10 PM
            Optimal number and kind of execution unitsPatrick Chase2015/10/01 11:39 PM
          Optimal number and kind of execution unitsExophase2015/10/01 10:11 PM
          Optimal number and kind of execution unitsjuanrga2015/10/02 05:14 AM
      LD/ST unitsSHK2015/10/01 11:11 AM
        LD/ST unitsDavid Kanter2015/10/01 12:54 PM
          LD/ST unitsSHK2015/10/02 04:55 AM
            LD/ST unitsJukka Larja2015/10/02 09:49 PM
        LD/ST unitsMaynard Handley2015/10/01 01:01 PM
          LD/ST unitsanon2015/10/01 09:54 PM
      5-6 wide core, why no mention from Intel?Maynard Handley2015/10/01 12:57 PM
        5-6 wide core, why no mention from Intel?David Kanter2015/10/01 03:49 PM
          5-6 wide core, why no mention from Intel?Maynard Handley2015/10/01 06:21 PM
          5-6 wide core, why no mention from Intel?Exophase2015/10/01 10:07 PM
            5-6 wide core, why no mention from Intel?Maynard Handley2015/10/02 12:10 AM
              5-6 wide core, why no mention from Intel?Megol2015/10/02 03:39 AM
                5-6 wide core, why no mention from Intel?Michael S2015/10/02 04:27 AM
                5-6 wide core, why no mention from Intel?Maynard Handley2015/10/02 09:37 AM
                  5-6 wide core, why no mention from Intel?noko2015/10/02 05:19 PM
              5-6 wide core, why no mention from Intel?Exophase2015/10/02 06:43 AM
                5-6 wide core, why no mention from Intel?Maynard Handley2015/10/02 09:45 AM
                  5-6 wide core, why no mention from Intel?Exophase2015/10/02 10:23 AM
          5-6 wide core, why no mention from Intel?Wilco2015/10/02 12:48 PM
            5-6 wide core, why no mention from Intel?Maynard Handley2015/10/02 01:25 PM
              5-6 wide core, why no mention from Intel?Wilco2015/10/02 02:26 PM
              5-6 wide core, why no mention from Intel?noko2015/10/02 05:45 PM
                5-6 wide core, why no mention from Intel?Maynard Handley2015/10/02 06:54 PM
            5-6 wide core, why no mention from Intel?David Kanter2015/10/02 01:59 PM
              5-6 wide core, why no mention from Intel?Wilco2015/10/02 02:59 PM
                5-6 wide core, why no mention from Intel?David Kanter2015/10/02 03:15 PM
                  5-6 wide core, why no mention from Intel?Wilco2015/10/02 04:06 PM
                    LDP/STP usage in AArch64 for 403.gccnone2015/10/03 01:04 AM
                      LDP/STP usage in AArch64 for 403.gccWilco2015/10/03 03:02 AM
                        LDP/STP usage in AArch64 for 403.gccnone2015/10/03 03:11 AM
                          LDP/STP usage in AArch64 for 403.gccWilco2015/10/03 03:37 AM
                            LDP/STP usage in AArch64 for 403.gccnone2015/10/03 04:37 AM
                              LDP/STP usage in AArch64 for 403.gccWilco2015/10/03 05:26 AM
                  5-6 wide core, why no mention from Intel?Maynard Handley2015/10/02 04:24 PM
              5-6 wide core, why no mention from Intel?Maynard Handley2015/10/02 03:07 PM
  Update to Intel Optimization Manualanon2015/09/30 04:43 PM
  Update to Intel Optimization ManualPatrick Chase2015/09/30 09:44 PM
    Update to Intel Optimization Manualanon2015/09/30 10:49 PM
    Update to Intel Optimization Manualnone2015/09/30 10:50 PM
    Update to Intel Optimization ManualDavid Kanter2015/10/01 12:52 PM
      Update to Intel Optimization ManualPatrick Chase2015/10/01 04:16 PM
        Update to Intel Optimization Manualanon2015/10/01 10:45 PM
Reply to this Topic
Name:
Email:
Topic:
Body: No Text
How do you spell avocado?