5-6 wide core, why no mention from Intel?

By: Megol (golem960.delete@this.gmail.com), October 2, 2015 3:39 am
Room: Moderated Discussions
Maynard Handley (name99.delete@this.name99.org) on October 2, 2015 1:10 am wrote:
> Exophase (exophase.delete@this.gmail.com) on October 1, 2015 11:07 pm wrote:
> > David Kanter (dkanter.delete@this.realworldtech.com) on October 1, 2015 4:49 pm wrote:
> > > For example, what happens if the pair loads target different pages?
> > > You'd need to do two separate translations through the TLB.
> >
> > You make this relatively unusual case take 2-3 cycles.
> >
> > Having a similar penalty for cacheline-crossing loads is still not a huge detriment. Even if you limited
> > single-cycle performance to naturally aligned boundaries you would still get significantly better bang
> > for your buck vs not having the instruction at all. And since you need decent 64/128-bit load support
> > for SIMD it's kind of a given, the only catch is supporting the two register destinations.
>
> "the only catch is supporting the two register destinations."
> And that may be necessary anyway depending on how you support the "S-suffix" instructions (those that
> also set the zero/overflow/etc flags). You can crack those this PPC did, but if you've designed them
> properly (as I assume ARM did for v8, learning from PPC's mistakes) the natural high performance thing
> would be to have a pool of renamed 4-bit flag registers, use the normal rename channels, and just accept
> that some largish fraction (20% or so?) of your instructions are going to be two destination. (Once
> you have this machinery, you may also be able to use it to fuse instruction pairs that are common but
> each generate a separate output if there are cases where that's worth the hassle.)

Why would your scheme be more "proper" than the PPC one?
Increasing register targets per instruction is expensive, not just with register write ports (not that a power optimized design is likely to support the theoretical register write peak, register port reduction techniques are common) but also in the bypass network etc.

I don't remember if AARCH64 support split conditions (parts of the flag result come from several instructions) but if they did their lesson and don't one of the most inexpensive ways to handle condition flags are attaching them to registers. Then the complications evaporate with very little extra state (n extra bits per physical register for flags, keeping track of the register storing the current condition in the renamer) and little overhead.

But for load pair instructions? The reasonable way to handle them is cracking at decode.
< Previous Post in ThreadNext Post in Thread >
TopicPosted ByDate
Update to Intel Optimization ManualSHK2015/09/29 05:38 AM
  gather speedEric Bron2015/09/29 09:43 AM
    gather speedGabriele Svelto2015/09/29 12:00 PM
  Update to Intel Optimization ManualTim McCaffrey2015/09/29 11:18 AM
    Update to Intel Optimization ManualSHK2015/09/29 12:04 PM
      Update to Intel Optimization ManualAnon2015/09/29 02:23 PM
    Update to Intel Optimization Manualnone2015/09/29 10:31 PM
      Update to Intel Optimization ManualMichael S2015/09/30 04:24 AM
    Update to Intel Optimization ManualMichael S2015/09/30 04:30 AM
      Update to Intel Optimization ManualTim McCaffrey2015/09/30 10:01 AM
  5-6 wide core, why no mention from Intel?Wouter Tinus2015/09/30 02:14 PM
    5-6 wide core, why no mention from Intel?Maynard Handley2015/09/30 03:30 PM
      5-6 wide core, why no mention from Intel?Alberto2015/10/01 12:13 AM
        5-6 wide core, why no mention from Intel?anon2015/10/01 02:21 AM
          5-6 wide core, why no mention from Intel?Alberto2015/10/01 04:41 AM
            5-6 wide core, why no mention from Intel?anon2015/10/01 05:27 AM
              5-6 wide core, why no mention from Intel?Alberto2015/10/01 08:33 AM
                5-6 wide core, why no mention from Intel?juanrga2015/10/01 10:24 AM
        5-6 wide core, why no mention from Intel?Maynard Handley2015/10/01 08:57 AM
    5-6 wide core, why no mention from Intel?juanrga2015/10/01 03:59 AM
      5-6 wide core, why no mention from Intel?Wouter Tinus2015/10/01 02:48 PM
        5-6 wide core, why no mention from Intel?juanrga2015/10/03 03:17 AM
          5-6 wide core, why no mention from Intel?Wouter Tinus2015/10/03 11:19 AM
            Are you kidding? (NT)juanrga2015/10/04 05:30 AM
              Are you kidding?Wouter Tinus2015/10/04 03:18 PM
                Are you kidding?juanrga2015/10/05 09:46 AM
                  Are you kidding?David Kanter2015/10/05 11:24 AM
                    Are you kidding?anon2015/10/05 09:26 PM
                    Are you kidding?Linus Torvalds2015/10/07 04:49 AM
                      Are you kidding?juanrga2015/10/07 10:46 AM
                        Are you kidding?anon2015/10/07 06:21 PM
                  Are you kidding?Wouter Tinus2015/10/05 01:25 PM
                    Are you kidding?juanrga2015/10/06 10:17 AM
                      Are you kidding?Stubabe2015/10/07 12:17 AM
                        Are you kidding?juanrga2015/10/07 10:56 AM
                          Amazing...Wouter Tinus2015/10/07 11:31 AM
                            Amazing...juanrga2015/10/07 03:45 PM
                          Are you kidding?Stubabe2015/10/07 11:57 AM
                            Are you kidding?juanrga2015/10/07 03:59 PM
                          Are you kidding?Wilco2015/10/07 02:07 PM
                            Are you kidding?juanrga2015/10/07 04:33 PM
      5-6 wide core, why no mention from Intel?Eric Bron2015/10/04 04:18 AM
    5-6 wide core, why no mention from Intel?David Kanter2015/10/01 09:01 AM
      Optimal number and kind of execution unitsjuanrga2015/10/01 10:50 AM
        Optimal number and kind of execution unitsPatrick Chase2015/10/01 04:38 PM
          Optimal number and kind of execution unitsI.S.T.2015/10/01 05:10 PM
            Optimal number and kind of execution unitsPatrick Chase2015/10/01 11:39 PM
          Optimal number and kind of execution unitsExophase2015/10/01 10:11 PM
          Optimal number and kind of execution unitsjuanrga2015/10/02 05:14 AM
      LD/ST unitsSHK2015/10/01 11:11 AM
        LD/ST unitsDavid Kanter2015/10/01 12:54 PM
          LD/ST unitsSHK2015/10/02 04:55 AM
            LD/ST unitsJukka Larja2015/10/02 09:49 PM
        LD/ST unitsMaynard Handley2015/10/01 01:01 PM
          LD/ST unitsanon2015/10/01 09:54 PM
      5-6 wide core, why no mention from Intel?Maynard Handley2015/10/01 12:57 PM
        5-6 wide core, why no mention from Intel?David Kanter2015/10/01 03:49 PM
          5-6 wide core, why no mention from Intel?Maynard Handley2015/10/01 06:21 PM
          5-6 wide core, why no mention from Intel?Exophase2015/10/01 10:07 PM
            5-6 wide core, why no mention from Intel?Maynard Handley2015/10/02 12:10 AM
              5-6 wide core, why no mention from Intel?Megol2015/10/02 03:39 AM
                5-6 wide core, why no mention from Intel?Michael S2015/10/02 04:27 AM
                5-6 wide core, why no mention from Intel?Maynard Handley2015/10/02 09:37 AM
                  5-6 wide core, why no mention from Intel?noko2015/10/02 05:19 PM
              5-6 wide core, why no mention from Intel?Exophase2015/10/02 06:43 AM
                5-6 wide core, why no mention from Intel?Maynard Handley2015/10/02 09:45 AM
                  5-6 wide core, why no mention from Intel?Exophase2015/10/02 10:23 AM
          5-6 wide core, why no mention from Intel?Wilco2015/10/02 12:48 PM
            5-6 wide core, why no mention from Intel?Maynard Handley2015/10/02 01:25 PM
              5-6 wide core, why no mention from Intel?Wilco2015/10/02 02:26 PM
              5-6 wide core, why no mention from Intel?noko2015/10/02 05:45 PM
                5-6 wide core, why no mention from Intel?Maynard Handley2015/10/02 06:54 PM
            5-6 wide core, why no mention from Intel?David Kanter2015/10/02 01:59 PM
              5-6 wide core, why no mention from Intel?Wilco2015/10/02 02:59 PM
                5-6 wide core, why no mention from Intel?David Kanter2015/10/02 03:15 PM
                  5-6 wide core, why no mention from Intel?Wilco2015/10/02 04:06 PM
                    LDP/STP usage in AArch64 for 403.gccnone2015/10/03 01:04 AM
                      LDP/STP usage in AArch64 for 403.gccWilco2015/10/03 03:02 AM
                        LDP/STP usage in AArch64 for 403.gccnone2015/10/03 03:11 AM
                          LDP/STP usage in AArch64 for 403.gccWilco2015/10/03 03:37 AM
                            LDP/STP usage in AArch64 for 403.gccnone2015/10/03 04:37 AM
                              LDP/STP usage in AArch64 for 403.gccWilco2015/10/03 05:26 AM
                  5-6 wide core, why no mention from Intel?Maynard Handley2015/10/02 04:24 PM
              5-6 wide core, why no mention from Intel?Maynard Handley2015/10/02 03:07 PM
  Update to Intel Optimization Manualanon2015/09/30 04:43 PM
  Update to Intel Optimization ManualPatrick Chase2015/09/30 09:44 PM
    Update to Intel Optimization Manualanon2015/09/30 10:49 PM
    Update to Intel Optimization Manualnone2015/09/30 10:50 PM
    Update to Intel Optimization ManualDavid Kanter2015/10/01 12:52 PM
      Update to Intel Optimization ManualPatrick Chase2015/10/01 04:16 PM
        Update to Intel Optimization Manualanon2015/10/01 10:45 PM
Reply to this Topic
Name:
Email:
Topic:
Body: No Text
How do you spell avocado?