Caching dependence info in µop cache

By: anon (anon.delete@this.ymous.org), October 16, 2020 12:36 pm
Room: Moderated Discussions
Paul A. Clayton (paaronclayton.delete@this.gmail.com) on October 16, 2020 6:20 am wrote:
> anon (anon.delete@this.ymous.org) on October 15, 2020 11:56 am wrote:
> [snip]
>
> > Thank you for this detailed analysis. I haven't read the paper thoroughly yet, but I wanted
> > to discuss one of your comments about reordering uops and rename optimizations.
> >
> > Regarding reordering, the problem is that you cannot generally rename out-of-order because,
> > although this might not seem to matter at first glance (what's the difference between "add rax, rcx;
> > ld rbx, [rdx]; add r12, rax" and "ld rbx, [rdx]; add rax, rcx; add r12, rax"?), I think it gets messy
> > if you want precise exceptions/interrupts. So you probably can rename out-of-order, but you need
> > to map those out-of-order mappings back to a ROB-like structure that is allocated earlier than rename
> > in the pipeline, and that might be awkward. However, it is an interesting thought, because I know of some
> > designs where the RAT is port-limited, so rename groups with at most x reads/writes have to be formed,
> > which may not match what is coming out of Decode (the compiler could help, though).
>
> Since exceptions are exceptional, one could do replay from the Icache (assuming data-inclusive Icache,
> which seems typical). This is vaguely similar to one of the POWER implementations (POWER5?) using
> replay with single-operation bundles on exceptions so that ROB overhead could be reduced. If the
> Icache were not data-inclusive of the µop cache (as opposed to just tag-inclusive to support snooping),
> untangling the µop order would require extra information and be somewhat complex.
>
> > On the rename optimization ("rewriting"): I am not sure I followed the idea. Could you please elaborate?
>
> With respect to renaming, if a source operand is the destination of a previous µop, one can replace
> that register name with the number of the µop that provides the value (assuming single-result
> µops). When renaming, those sources would not read the RAT but the free list to get their new
> name. This is just caching the dependence information. Detecting dependencies before RAT access
> would increase latency (yet allow reading from the free list rather than the RAT); detecting them in
> parallel with RAT access would increase RAT port demand. Caching the information provides the
> benefit of the former without the latency cost (and could save some energy as well).
>
> (One could also imagine other optimizations, some of which would depend on the design of
> the scheduler. For example, a scheduler might use indexed wake-up rather than broadcast
> comparison when one operation is known to provide the last-to-be-available source of another
> operation; if replay is cheap, predicted as last-to-be-available might suffice.)
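
To make the quoted description concrete, here is a minimal Python sketch of the fill-time step. It assumes single-result µops, one rename group per µop-cache line, and a made-up tuple format; `annotate_group` and `rat_reads_needed` are hypothetical names, not taken from the paper or any real design. Each source written by an earlier µop in the same group is replaced by that producer's index, so at rename it no longer needs a RAT read port.

```python
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical decoded-µop format: (architectural destination or None, architectural sources).
Uop = Tuple[Optional[str], List[str]]
# Hypothetical cached format: a source is either a register name (read the RAT at rename)
# or an int, the index of the earlier µop in the same group that produces the value.
CachedUop = Tuple[Optional[str], List[Union[str, int]]]


def annotate_group(group: List[Uop]) -> List[CachedUop]:
    """Fill-time pass: cache intra-group dependences as producer indices."""
    last_writer: Dict[str, int] = {}  # register -> index of its latest writer so far
    cached: List[CachedUop] = []
    for i, (dest, srcs) in enumerate(group):
        # Resolve sources before recording this µop's own write ("add rax, rax" reads the old rax).
        cached.append((dest, [last_writer.get(s, s) for s in srcs]))
        if dest is not None:
            last_writer[dest] = i
    return cached


def rat_reads_needed(cached: List[CachedUop]) -> int:
    """Count the sources that still need a RAT read port."""
    return sum(isinstance(s, str) for _, srcs in cached for s in srcs)


# "add rax, rcx; ld rbx, [rdx]; add r12, rax" (the x86 two-operand add also reads its destination)
group = [("rax", ["rax", "rcx"]), ("rbx", ["rdx"]), ("r12", ["r12", "rax"])]
print(annotate_group(group))  # [('rax', ['rax', 'rcx']), ('rbx', ['rdx']), ('r12', ['r12', 0])]
```

For this sequence only the `rax` source of the third µop is rewritten, so four of the five sources still read the RAT.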

Thank you, I think I got it. I will point out that it might be possible to implement this directly in the I-cache by rewriting instructions into a custom format on line fill, if the instruction format permits it. If not, then the uop cache does seem like the right place, even though in practice it becomes hard to tell whether it will impact cycle time. Essentially, another level of selection is added at Rename: which sources read the free list and which read the RAT. It might not be a big deal, but it means that each "lane" from Decode has to steer each source through one of two circuits, whereas in the baseline each lane's sources simply go straight to the RAT.
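
The extra choice at Rename can be sketched as a behavioral model (not a circuit). This assumes a hypothetical cached format in which a source is either an architectural name, meaning "read the RAT", or the index of an earlier µop in the group, meaning "take the physical register that µop just got from the free list". `rename_group` and the physical register names are illustrative only.

```python
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical cached-µop format (as produced at µop-cache fill):
# a source is a register name (read the RAT) or an int index of an earlier
# µop in the same group (take the physical register that µop was just given).
CachedUop = Tuple[Optional[str], List[Union[str, int]]]


def rename_group(group: List[CachedUop], rat: Dict[str, str],
                 free_list: List[str]) -> Tuple[List[Tuple[Optional[str], List[str]]], int]:
    """Rename one group; returns (pdest, physical sources) per µop and the RAT reads used."""
    # Every destination in the group takes the next free physical register.
    pdests = [free_list.pop(0) if dest is not None else None for dest, _ in group]
    renamed = []
    rat_reads = 0
    for (_, srcs), pdest in zip(group, pdests):
        psrcs = []
        for s in srcs:
            if isinstance(s, int):   # cached dependence: bypass the RAT, use the free-list result
                psrcs.append(pdests[s])
            else:                    # no producer in this group: read the RAT
                psrcs.append(rat[s])
                rat_reads += 1
        renamed.append((pdest, psrcs))
    # Update the RAT in program order, so the last writer of each register wins.
    for (dest, _), pdest in zip(group, pdests):
        if dest is not None:
            rat[dest] = pdest
    return renamed, rat_reads
```

With rax, rcx, rdx and r12 mapped to p1, p2, p3 and p5 and a free list of p10 to p12, the annotated group for "add rax, rcx; ld rbx, [rdx]; add r12, rax" needs four RAT reads instead of five, and the third µop's second source becomes p10 without a RAT access or an intra-group comparison.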

Indexed wake-up in the scheduler has been argued to be much more power-efficient (e.g. TRIPS), but in this case it would be added on top of the baseline broadcast wake-up, so it sounds like a hard sell to the circuit designer, especially for something that only works if the consumer and the producer of its last-to-be-available source are in the same dispatch bundle. Maybe an indexed write port used for dispatch could be stolen...
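
To illustrate why the hybrid is a hard sell, here is a toy Python model of a scheduler in which a source slot is woken either by tag broadcast or, when its producer was in the same dispatch bundle, by index. The `Entry` layout, the `wake` function, and counting tag comparisons as a stand-in for energy are all assumptions for illustration. Broadcast wake-up still has to exist for every other slot, so the indexed path only saves the comparisons of the slots it covers.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Entry:
    src_tags: List[int]  # physical register tag each source waits on
    ready: List[bool]    # per-source ready bits
    indexed: List[bool]  # True: slot is woken by index, so it skips tag comparison


def wake(entries: List[Entry], tag: int, targets: List[Tuple[int, int]]) -> int:
    """Wake consumers of `tag`; returns how many tag comparisons the broadcast made.

    targets: hypothetical (entry index, source slot) pairs recorded at dispatch for
    consumers that were in the producer's dispatch bundle.
    """
    # Indexed wake-up: write the ready bit directly, no comparison needed.
    for idx, slot in targets:
        entries[idx].ready[slot] = True
    # Broadcast wake-up: every non-indexed slot compares its tag with the result tag.
    comparisons = 0
    for e in entries:
        for k, t in enumerate(e.src_tags):
            if e.indexed[k]:
                continue
            comparisons += 1
            if t == tag:
                e.ready[k] = True
    return comparisons
```

With one consumer slot woken by index, the broadcast in a two-entry example makes two comparisons instead of three, while the broadcast network and comparators for all remaining slots are still needed.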