LD/ST units

By: anon (anon.delete@this.anon.com), October 1, 2015 9:54 pm
Room: Moderated Discussions
Maynard Handley (name99.delete@this.name99.org) on October 1, 2015 2:01 pm wrote:
> SHK (no.delete@this.mail.com) on October 1, 2015 12:11 pm wrote:
> > David Kanter (dkanter.delete@this.realworldtech.com) on October 1, 2015 10:01 am wrote:
> > > Wouter Tinus (wouter.tinus.delete@this.gmail.com) on September 30, 2015 3:14 pm wrote:
> > > > It seems easy to argue that Skylake is a 5-wide or even 6-wide machine.
> > > >
> > > > - 5 wide decode
> > > > - 6 wide allocation/decoder queue
> > > > - 6 wide ROB
> > > > - 8 wide issue
> > > > - 8 wide retire (4/thread)
> > > >
> > > > Though Haswell already added two extra issue ports, this is the first real increase in width
> > > > since the introduction of Merom back in 2006. Yet they didn't even bother to mention it at IDF :(
> > >
> > > Actually, I think Sandy Bridge and Haswell were more significant.
> > >
> > > It's nice to have more ALUs, but what really matters are the load/store units. Having 10 ALUs with 1 LD/ST
> > > unit is really pointless, except on code with insanely high compute:memory ratios (which isn't most code).
> > >
> > > For a general purpose CPU, I'd focus on getting the load/store right first, then focus on the ALUs.
> > >
> > > David
> >
> > Agreed, I hope that in the Skylake Xeon the L1 latency will be lower. Since SNB we've been stuck
> > with the same size/latency/associativity; throughput is nice, but IMHO the really salient point is latency.
> > POWER8 has twice the L1D, the same associativity and 3-cycle latency, and I think it's
> > worth paying the price in watts for that (for desktop and servers, of course).
>
> Doesn't POWER8 use way prediction, and have to replay the load if the way prediction misses (and then repeatedly
> replay if L1 misses)?

POWER7 uses way prediction, and most likely speculative issue of dependent instructions, to achieve a 2-cycle load-to-use latency. I haven't found much information about how a way miss is handled or even how much it costs.
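To make the mechanism concrete, here is a toy Python model of a way-predicted set-associative L1: each set remembers the most recently hit way, that way is probed first, and a wrong guess costs extra cycles before the correct way is read. Everything in it (the class name, the MRU predictor, the 2/4/12-cycle latencies, 64 sets x 8 ways of 128-byte lines) is an illustrative assumption, not POWER7's documented design.

```python
class WayPredictedCache:
    """Toy set-associative L1 with a per-set MRU way predictor.

    All parameters are illustrative assumptions, not POWER7/POWER8 figures.
    """

    def __init__(self, num_sets=64, ways=8, line_bytes=128,
                 fast_hit=2, way_miss_penalty=2, miss_latency=12):
        self.num_sets = num_sets
        self.ways = ways
        self.line_bytes = line_bytes
        self.fast_hit = fast_hit                  # predicted way was right
        self.way_miss_penalty = way_miss_penalty  # extra cycles to read the right way
        self.miss_latency = miss_latency          # L1 miss, go to L2
        self.tags = [[None] * ways for _ in range(num_sets)]
        self.predicted = [0] * num_sets  # way to probe first, per set
        self.victim = [0] * num_sets     # round-robin replacement pointer

    def _set_and_tag(self, addr):
        line = addr // self.line_bytes
        return line % self.num_sets, line // self.num_sets

    def load(self, addr):
        """Return (outcome, load-to-use latency in cycles) for one load."""
        s, tag = self._set_and_tag(addr)
        if self.tags[s][self.predicted[s]] == tag:
            # Only the predicted way's data needs to be read: fast and cheap.
            return "fast_hit", self.fast_hit
        for w, t in enumerate(self.tags[s]):
            if t == tag:
                # Way miss: the line is in L1, but in another way. In a core
                # that speculatively woke dependents for the fast case, those
                # dependents would have to be reissued here.
                self.predicted[s] = w
                return "way_miss", self.fast_hit + self.way_miss_penalty
        # Real L1 miss: fill a victim way and predict it next time.
        w = self.victim[s]
        self.victim[s] = (w + 1) % self.ways
        self.tags[s][w] = tag
        self.predicted[s] = w
        return "l1_miss", self.miss_latency
```

Under these assumptions, loading address 0, then 8192 (same set, different tag), then 0 again gives two L1 misses followed by a way miss costing 4 cycles instead of 2.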

In POWER4, when a load misses L1, the dependent instruction stays in the issue queue and is reissued after the data is available. I would assume POWER7 does the same thing, and does the same for a way miss. I don't know whether any chain of dependent instructions has to be thrown out and then replayed (other than the one that gets reissued), and I strongly doubt there is a repeated replay (the POWER4 documentation indicates the instruction is held, and POWER7 has only become more efficiency-critical).
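A crude way to see why holding the instruction is cheaper than repeated replay is to count issue attempts for the dependent instruction. The function below is a hypothetical model with stated assumptions (one speculative issue at the assumed hit latency, and a fixed retry interval for the "repeat" policy); it does not describe how POWER4 or POWER7 actually implement either scheme.

```python
import math


def issue_attempts(assumed_latency, actual_latency, policy, replay_interval=None):
    """Count issues of a load's dependent instruction until it executes
    with valid data.

    Toy model (assumption): the dependent is first issued speculatively,
    assuming the load hits with `assumed_latency`.
      - "hold":   on a miss the entry stays in the issue queue and is
                  reissued once, when the data actually arrives.
      - "repeat": the dependent blindly retries every `replay_interval`
                  cycles until the data is there.
    """
    if actual_latency <= assumed_latency:
        return 1  # speculation was right; one issue suffices
    if policy == "hold":
        return 2  # one wasted speculative issue plus one real reissue
    if policy == "repeat":
        if replay_interval is None or replay_interval <= 0:
            raise ValueError("repeat policy needs a positive replay_interval")
        retries = math.ceil((actual_latency - assumed_latency) / replay_interval)
        return 1 + retries
    raise ValueError(f"unknown policy: {policy!r}")
```

For example, with an assumed 2-cycle hit and a 14-cycle miss, holding costs two issue slots, while retrying every 4 cycles costs four, and the gap grows with the miss latency.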

I have not found a mention of way prediction in POWER8. But considering the cache is 2x larger, I would be surprised if they removed it and only added one cycle of latency.

> That's a lot of power-expensive machinery, and Intel is (thanks to the way they've
> chosen to segment the market) not really in a position to put this in Xeon (where it might make sense)
> without having to also accept it in Core-m where it's a rather more problematic tradeoff.

POWER is efficiency-constrained like everybody else, but it has the luxury of designing CPUs with highly parallel workloads in mind. If a feature does not improve perf/watt, then spending that power on more cores is an option. Of course, this must take into account that single-threaded performance improvements also effectively increase parallel perf/watt, via Amdahl's law etc.
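A back-of-the-envelope version of the Amdahl's law point, with made-up numbers: under a fixed 16 W core budget, compare 16 simple cores (1 W each, speed 1.0) against 10 beefier cores (1.6 W each, speed 1.3). The power and speed figures are assumptions chosen only to show that the answer depends on the parallel fraction.

```python
def parallel_time(p, cores, core_speed):
    """Execution time under Amdahl's law, normalised so that one core at
    speed 1.0 takes 1.0. `p` is the parallelisable fraction of the work."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    return (1.0 - p) / core_speed + p / (cores * core_speed)


def perf_per_watt(p, cores, core_speed, core_power):
    """Throughput (1 / time) divided by total core power."""
    return (1.0 / parallel_time(p, cores, core_speed)) / (cores * core_power)


for p in (0.9, 0.99):
    many_small = perf_per_watt(p, cores=16, core_speed=1.0, core_power=1.0)
    fewer_fast = perf_per_watt(p, cores=10, core_speed=1.3, core_power=1.6)
    print(f"p={p}: 16 small cores {many_small:.3f}, 10 fast cores {fewer_fast:.3f}")
```

With 90% parallel code the faster cores win on perf/watt (about 0.43 vs 0.40); at 99% parallel the extra cores win (about 0.87 vs 0.75).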

IBM found a different tradeoff than Intel, but it doesn't always follow that a high-performance or power-hungry feature comes from a higher core power budget. Other elements of Intel's cores are more powerful than POWER's.

>
> I have to wonder if you could get much the same level of performance boost easier
> and at lower power through value prediction; but I honestly don't know.

I would be surprised if Intel did not use some form of L1 way prediction to save power; it just seems that they do not use it to reduce latency.