Caching dependence info in µop cache

By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), October 16, 2020 6:20 am
Room: Moderated Discussions
anon (anon.delete@this.ymous.org) on October 15, 2020 11:56 am wrote:
[snip]

> Thank you for this detailed analysis. I haven't read the paper thoroughly yet, but I wanted
> to discuss one of your comments about reordering uops and rename optimizations.
>
> Regarding reordering, the problem here is that you cannot generally rename out-of-order because
> although this might not have any impact at first glance (What's the difference between "add rax, rcx;
> ld rbx, [rdx]; add r12, rax" and "ld rbx, [rdx]; add rax, rcx; add r12, rax"?), I think it gets messy
> if you want precise exceptions/interruptions. So, you probably can rename out-of-order but you need
> to map those out-of-order mappings back to a ROB-like structure that is allocated earlier than rename
> in the pipeline and it might be weird. However, that is an interesting thought because I know of some
> designs where the RAT is port-limited and so rename groups with at most x reads/writes have to be formed,
> which may not match what is coming out of Decode (compiler could help though).

Since exceptions are exceptional, one could do replay from the Icache (assuming a data-inclusive Icache, which seems typical). This is vaguely similar to one of the POWER implementations (POWER5?) using replay with single-operation bundles on exceptions so that ROB overhead could be reduced. If the Icache were not data inclusive of the µop cache (as opposed to just tag inclusive to support snooping), untangling the µop order would require extra information and be a bit complex.

> On the rename optimization thing ("rewriting"). I am not
> sure I followed the idea. Could you please elaborate?

With respect to renaming, if a source operand is the destination of a previous µop, one can replace that register name with the number of the µop that provides the value (assuming single-result µops). When renaming, those sources would read the free list rather than the RAT to get their new name. This is just caching the dependence information. Detecting dependencies before RAT access would increase latency (yet allow reading from the free list and not the RAT), while detecting them in parallel with RAT access would increase RAT port demand; caching the information provides the benefit of the former without the latency cost (and could save some energy as well).
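
To make the idea concrete, here is a minimal software sketch in Python. It is not a description of any real design: the names (Uop, cache_dependences, rename), the ("reg", name) / ("uop", index) encoding, and the integer physical register numbers are all made up for illustration, and it assumes single-result µops and that one cached group is renamed together. It uses the three-µop sequence from the quoted example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

@dataclass(frozen=True)
class Uop:
    dest: str               # architectural destination (single result assumed)
    srcs: Tuple[str, ...]   # architectural source registers

# Hypothetical cached encoding of one source operand:
#   ("reg", name)  -> value comes from outside the group, read the RAT
#   ("uop", index) -> value comes from µop `index` of the same group
CachedSrc = Tuple[str, Union[str, int]]

def cache_dependences(group: List[Uop]) -> List[Tuple[CachedSrc, ...]]:
    """Done once, when the group is written into the µop cache."""
    last_writer: Dict[str, int] = {}
    cached = []
    for i, uop in enumerate(group):
        cached.append(tuple(
            ("uop", last_writer[s]) if s in last_writer else ("reg", s)
            for s in uop.srcs))
        last_writer[uop.dest] = i
    return cached

def rename(group, cached, rat, free_list):
    """Rename one group; returns (renamed µops, number of RAT reads)."""
    # One new physical register per µop, taken from the free list in order.
    new_names = [free_list.pop(0) for _ in group]
    rat_reads = 0
    renamed = []
    for i, srcs in enumerate(cached):
        phys = []
        for kind, ref in srcs:
            if kind == "uop":
                # Producer is in this group: use its free-list allocation.
                phys.append(new_names[ref])
            else:
                # Producer is older than this group: read the RAT.
                phys.append(rat[ref])
                rat_reads += 1
        renamed.append((new_names[i], tuple(phys)))
    # RAT is updated only after all reads, so reads see the pre-group state.
    for i, uop in enumerate(group):
        rat[uop.dest] = new_names[i]
    return renamed, rat_reads

# add rax, rcx; ld rbx, [rdx]; add r12, rax
group = [Uop("rax", ("rax", "rcx")), Uop("rbx", ("rdx",)), Uop("r12", ("r12", "rax"))]
cached = cache_dependences(group)
rat = {"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3, "r12": 4}
renamed, rat_reads = rename(group, cached, rat, free_list=[10, 11, 12])
```

In this example the last µop's rax source is cached as ("uop", 0), so it picks up the register allocated to the first µop without touching the RAT; the group needs four RAT reads instead of five, and no dependence comparison happens at rename time.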

(One could also imagine other optimizations, some of which would depend on the design of the scheduler. For example, a scheduler might use indexed wake-up rather than broadcast comparison when one operation is known to provide the last-to-be-available source of another operation; if replay is cheap, predicted as last-to-be-available might suffice.)
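
A toy model of that scheduler idea, again only a sketch under my own assumptions: the Scheduler class, its fields, and the replay policy (a mispredicted entry simply falls back to broadcast wake-up) are invented for illustration, each producer tag can wake at most one indexed consumer, and entries are assumed to be inserted before their producers complete. It counts tag comparisons to show what indexed wake-up would avoid.

```python
class Scheduler:
    def __init__(self):
        self.ready = set()      # tags whose values are available
        self.broadcast = {}     # entry -> set of source tags still awaited
        self.indexed = {}       # producer tag -> (entry, all source tags)
        self.comparisons = 0    # tag comparisons done by broadcast wake-up
        self.replays = 0        # indexed wake-ups that found a source missing

    def insert(self, entry, srcs, predicted_last=None):
        if predicted_last is None:
            # Conventional entry: watches every broadcast for its sources.
            self.broadcast[entry] = set(srcs) - self.ready
        else:
            # Indexed entry: woken only by the producer predicted to be last.
            self.indexed[predicted_last] = (entry, tuple(srcs))

    def complete(self, tag):
        """A producer finishes; returns the entries that can now issue."""
        self.ready.add(tag)
        issued = []
        for entry, waiting in list(self.broadcast.items()):
            # Each awaited source tag is compared against the broadcast tag.
            self.comparisons += len(waiting)
            waiting.discard(tag)
            if not waiting:
                issued.append(entry)
                del self.broadcast[entry]
        if tag in self.indexed:
            entry, srcs = self.indexed.pop(tag)
            missing = set(srcs) - self.ready
            if not missing:
                issued.append(entry)        # prediction held, no comparisons
            else:
                self.replays += 1           # mispredicted: fall back
                self.broadcast[entry] = missing
        return issued

# Consumer C needs t1 and t2; t2 really is the last source to arrive.
base = Scheduler()
base.insert("C", ["t1", "t2"])
base_issue = [base.complete("t1"), base.complete("t2")]

good = Scheduler()
good.insert("C", ["t1", "t2"], predicted_last="t2")
good_issue = [good.complete("t1"), good.complete("t2")]

bad = Scheduler()
bad.insert("C", ["t1", "t2"], predicted_last="t1")
bad_issue = [bad.complete("t1"), bad.complete("t2")]
```

With a correct prediction the consumer issues at the same point as in the broadcast case but with no tag comparisons; with a wrong prediction it costs one replay and then behaves like a broadcast entry, which is why cheap replay would make a mere prediction acceptable.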