Disappointing opening line in paper

By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), October 11, 2020 6:16 am
Room: Moderated Discussions
Jeff S. (fakity.delete@this.fake.com) on October 8, 2020 12:16 pm wrote:
[snip]
> A pre-print from MICRO '20 later this month:
> Improving the Utilization of Micro-operation Caches in x86 Processors
>
> CLASP and Compaction (a) improve uop cache utilization/fetch ratio, dispatch bandwidth, average branch misprediction
> penalty and overall performance, and (b) reduce decoder power consumption. These optimizations combined improve
> performance by 5.3%, uop cache fetch ratio by 28.8% and dispatch bandwidth by 6.28%, while, reducing the decoder
> power consumption by 19.4% and branch misprediction latency by 5.23% in our workloads.


I have not finished reading that paper (motivation and mental acuity have not been sufficiently present at the same time), but the opening line appears counterfactual: "Most modern processors employ variable length, Complex Instruction Set Computing (CISC) instructions to reduce instruction fetch energy cost and bandwidth requirements." It is not true that most modern processors employ variable-length CISC instructions (unless, perhaps, one counts Thumb2 as CISC, but later statements tend to exclude that).

On a lesser point, both x86 and zArchitecture use CISC for legacy reasons. Code density was an original motivation (and for x86, single-byte instructions would have been helpful for early 8-bit memory interfaces), but if software compatibility (and ISA walls: patents, institutional knowledge, etc.) were not important, neither x86 nor zArchitecture would be continued. (Thumb2 is not exactly an excellent encoding for its modern uses.)

(I think Renesas RX is the only modern commercial CISC. While CISC was chosen for code density, that was mainly for static code storage size, not fetch energy or bandwidth; for microcontrollers and some other embedded systems, static code size is very important.)

The claim that variable length encoding is incompatible with low(ish) overhead decode, such that µop caches are needed, seems to be contradicted by zArchitecture implementations lacking µop caches (as far as I recall). x86 is not just byte-granular with 15 different sizes but also has somewhat complex length determination; it is not a good example of variable length (or variable work) instruction encoding. (I think variable length µop formats are likely to be beneficial in terms of storage cost and access energy. I suspect a distinct immediate storage area might not be worthwhile unless some other use of the storage provided higher/more balanced utilization; immediates sharing a µop cache line with base µops probably tends to balance utilization fairly well. The design space for µop caches seems large (and interesting).)

The storing of immediates in a heads-and-tails-like manner (Heidi Pan, "High Performance, Variable-Length Instruction Encodings", 2002 Master's thesis) was interesting; perhaps the 56-bit µop format excludes larger immediates. My initial thought was whether they compared to a grow-toward-the-middle shared µop cache line, but having immediates and base µops grow toward the middle would make that impractical (I think).
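
As a rough illustration of what a heads-and-tails-like layout implies for capacity, here is a minimal sketch in Python. It is not the paper's actual format: only the 56-bit base µop size comes from the discussion above, while the 64-byte line, the 4-byte immediate slot, and the names MiddleGrowLine and capacity are assumptions made up for the example. Base µops fill the line from the head, immediates fill it from the tail, and the line is full when the two regions would collide.

```python
UOP_BYTES = 7     # 56-bit base µop, the size mentioned above
IMM_BYTES = 4     # assumed 32-bit immediate slot (hypothetical)
LINE_BYTES = 64   # assumed µop cache line size (hypothetical)


class MiddleGrowLine:
    """One line: base µops grow up from byte 0, immediates grow down from the end."""

    def __init__(self):
        self.uops = 0  # base µops stored at the head
        self.imms = 0  # immediates stored at the tail

    def free_bytes(self):
        # Bytes left in the gap between the head and tail regions.
        return LINE_BYTES - self.uops * UOP_BYTES - self.imms * IMM_BYTES

    def try_add(self, has_imm):
        """Add one µop (and its immediate, if any); False if the regions would collide."""
        need = UOP_BYTES + (IMM_BYTES if has_imm else 0)
        if need > self.free_bytes():
            return False
        self.uops += 1
        if has_imm:
            self.imms += 1
        return True


def capacity(has_imm_pattern):
    """Count how many µops from the pattern fit in a single line."""
    line = MiddleGrowLine()
    count = 0
    for has_imm in has_imm_pattern:
        if not line.try_add(has_imm):
            break
        count += 1
    return count
```

Under these assumed sizes, a line holds nine µops when none carries an immediate and five when every one does, so the shared gap lets the split between base µops and immediates vary with the code rather than being fixed in advance.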

Thank you for sharing the paper. I do hope to finish it soon, but I want to be able to give it proper attention.