You can do two 4-cycle loads per cycle

By: Wilco (Wilco.dijkstra.delete@this.ntlworld.com), September 20, 2018 2:32 am
Room: Moderated Discussions
anon (spam.delete.delete@this.this.spam.com) on September 20, 2018 1:34 am wrote:
> Travis Downs (travis.downs.delete@this.gmail.com) on September 19, 2018 5:30 pm wrote:
> > anon (spam.delete.delete@this.this.spam.com) on September 19, 2018 2:45 am wrote:
> > > > >
> > > > > Then how do you explain the restriction? What prevents the use of the
> > > > > fast path with registers that weren't the result of an earlier load?
> > > >
> > > > Hardware doesn't move between pipelines. If we assume 4-cycle loads skip the initial complex address
> > > > calculation stage (and not a later stage), a 4-cycle load after a 5-cycle load must wait for a cycle
> > > > simply because the pipeline stages it needs are still being used by the earlier load.
> > > >
> > > > Wilco
> > > >
> > >
> > > Am I missing something obvious here?
> > > 4 cycle loads exist.
> > > What is the restriction that prevents them when the address is the result of an ALU op instead of a load?
> >
> > The way I understood it is that if you mix 4 and 5 cycle loads, for example, in a "throughput" scenario,
> > your 4 cycle loads will often end up taking 5 cycles because
> > they are out of alignment with the 5 cycle loads
> > and use the same pipeline stages. In the example, the 4 cycle load can't start in the cycle after a 5 cycle
> > load because it wants the second part of the load pipeline which is what the 5 cycle load is using.
> >
> > So it turns into a 5 cycle load. It may get even messier
> > if the skipped pipeline stages are somewhere in the middle.
> >
> > We do know that 4 cycle loads do play nice in a throughput scenario if there are only
> > 4 cycle loads around since 8 concurrent 4-cycle pointer chases do execute at 2 loads
> > per cycle. Maybe I could add some 5 cycle loads in there and see what happens.
>
> Yeah, but even in the nice throughput scenario, 4 cycle loads didn't happen with the address
> coming from an ALU, right? So the different latency doesn't seem to be the problem.

The 4-cycle path might only work within the load/store unit because of timing. Forwarding within a unit is faster than from a different unit.

However, it's not clear that this is the case. You need sequences like alu->load5->load->alu to check whether the load->alu latency can ever be 4 cycles.
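
To make the expected numbers for such a test concrete, here is a rough sketch of the arithmetic, not a measurement. It assumes a 1-cycle ALU op, a 5-cycle load when the address comes from an ALU op ("load5"), and a loop-carried chain in which the trailing alu of one iteration is the leading alu of the next. The function and constant names are made up for illustration.

```python
# Toy latency model, not a benchmark. All latencies are assumptions
# taken from the discussion, not measured values.

ALU_LATENCY = 1     # assumed: simple ALU op
LOAD_FROM_ALU = 5   # assumed: "load5", address produced by an ALU op


def cycles_per_iteration(load_to_alu_latency):
    """Latency of one alu -> load5 -> load -> alu link in a dependent chain.

    The trailing alu of one iteration doubles as the leading alu of the
    next, so it is counted once per iteration.
    """
    return ALU_LATENCY + LOAD_FROM_ALU + load_to_alu_latency


def infer_load_to_alu_latency(measured_cycles_per_iteration):
    """Back out the second load's latency from a measured chain latency."""
    return measured_cycles_per_iteration - ALU_LATENCY - LOAD_FROM_ALU


if __name__ == "__main__":
    hypotheses = (
        ("fast path result can feed an ALU op", 4),
        ("fast path only works load-to-load", 5),
    )
    for name, latency in hypotheses:
        print(f"{name}: {cycles_per_iteration(latency)} cycles/iteration")
```

Under these assumptions, a measured chain latency one cycle shorter than the all-5-cycle case would indicate that the second load took the 4-cycle path even though its consumer is an ALU op.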

Wilco