Large ARFs increase renaming cost

Article: ARM Goes 64-bit
By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), November 17, 2012 9:23 pm
Room: Moderated Discussions
name99 (name99.delete@this.redheron.com) on November 17, 2012 6:37 pm wrote:
[snip]
> Let's answer slightly differently.
> If you have a modern type of CPU (OoO, superscalar, prediction, all that good stuff) then there is a distinction
> between the number of PHYSICAL registers and the number of ARCHITECTED (ie expressible in assembly language)
> registers. The number of physical registers is going to be determined by how aggressively OoO you want your CPU
> to be, and it is the number of physical registers that determines power, area, cycle times and so on.
>
> Given this, the number of architected registers is essentially free. Going from 16 to 32 costs you
> an extra bit in each register specification (so 3 bits in most instructions) and that's it. If there
> is any advantage to increasing the number of architected registers, you might as well do so.

In an OoO implementation, one will also have a renaming table in the front end (and often one for commit as well, but that table has lesser requirements). Increasing the size of this table will increase power use and may complicate pipelining.
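To make the table-size point concrete, here is a minimal sketch of a front-end rename table in C. The names (`RenameTable`, `rat_init`, `rat_rename_dest`, `rat_storage_bits`) and the sizes (32 architectural, 96 physical registers) are assumptions picked for illustration, not taken from any real design. Checkpointing, freeing of old mappings at commit, and multi-ported access are all left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes, for illustration only. */
enum { NUM_ARCH_REGS = 32, NUM_PHYS_REGS = 96 };

typedef struct {
    uint8_t map[NUM_ARCH_REGS];        /* architectural -> physical mapping */
    uint8_t free_list[NUM_PHYS_REGS];  /* stack of unallocated physical regs */
    int     free_count;
} RenameTable;

/* Start with an identity mapping; the remaining physical registers are free. */
void rat_init(RenameTable *rat) {
    for (int a = 0; a < NUM_ARCH_REGS; a++)
        rat->map[a] = (uint8_t)a;
    rat->free_count = 0;
    for (int p = NUM_PHYS_REGS - 1; p >= NUM_ARCH_REGS; p--)
        rat->free_list[rat->free_count++] = (uint8_t)p;
}

/* Source operand: one table read per architectural source register. */
uint8_t rat_lookup(const RenameTable *rat, int arch) {
    assert(arch >= 0 && arch < NUM_ARCH_REGS);
    return rat->map[arch];
}

/* Destination operand: allocate a fresh physical register and update the map.
 * Returns -1 when no physical register is free (the front end would stall).
 * The previous mapping is not freed here; that would happen at commit. */
int rat_rename_dest(RenameTable *rat, int arch) {
    assert(arch >= 0 && arch < NUM_ARCH_REGS);
    if (rat->free_count == 0)
        return -1;
    uint8_t p = rat->free_list[--rat->free_count];
    rat->map[arch] = p;
    return p;
}

/* Storage for the map alone: one entry per architectural register,
 * each wide enough to name any physical register. */
int rat_storage_bits(int arch_regs, int phys_regs) {
    int bits = 0;
    while ((1 << bits) < phys_regs)
        bits++;
    return arch_regs * bits;
}
```

Under these assumptions, `rat_storage_bits(32, 96)` gives 224 bits and `rat_storage_bits(64, 128)` gives 448 bits, so the map storage doubles. Wider instruction encodings and extra read ports would add further cost that this sketch does not model.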

In addition, with a large number of architectural registers (with an implicit flat organization) many registers may be relatively idle but consume the same static power and increase the area of the register file (and so the latency and power of all register accesses).

Scalability of an ISA is also important. A large (flat) architectural register file will be less friendly to smaller implementations. (It will also be less friendly to multithreading, since each hardware thread needs its own architectural state.) A 64 GPR ISA would require 33% more physical registers for GPRs than a 32 GPR ISA if both have 64 rename registers (128 versus 96); 64 rename registers could satisfy an 80 instruction window in many cases. (There are also techniques to reduce the number of physical rename registers required which do not apply as much or at all to architectural registers, e.g., virtual physical registers.)
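The 33% figure can be checked with a few lines of C. This sketch assumes the simple model implied above, in which the physical register count is the architectural count plus the rename count. The function names are hypothetical.

```c
#include <assert.h>

/* Simple model (an assumption): physical = architectural + rename registers. */
int physical_regs(int arch_regs, int rename_regs) {
    assert(arch_regs > 0 && rename_regs >= 0);
    return arch_regs + rename_regs;
}

/* Fractional increase in physical registers when going from base_arch to
 * big_arch architectural registers with the same number of rename registers. */
double extra_fraction(int base_arch, int big_arch, int rename_regs) {
    int base = physical_regs(base_arch, rename_regs);
    int big  = physical_regs(big_arch, rename_regs);
    return (double)(big - base) / (double)base;
}
```

With 64 rename registers, `physical_regs(32, 64)` is 96 and `physical_regs(64, 64)` is 128, so `extra_fraction(32, 64, 64)` is 32/96, or about one third.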

> A second way to think about this is in terms of this as a "power" feature, like vector instructions, that will
> be used by people who know what they are doing, and not otherwise. For example: people who know what they are
> doing, on entry to a function, IMMEDIATELY load into a local (ie register) variable all globals that will be
> accessed, along with everything that will be used through a pointer. They do all their calculation in the local
> variables, then store everything on exit from the function. This style of coding requires a lot more registers
> to work with, but is also faster. (It's faster because it gets load latency off the critical path, and because
> it doesn't waste cycles doing things the compiler thinks might be necessary --- writing back globals, writing
> back instance variables --- but which you know are not, every time they are changed.)

In general, high-level language code does not express register allocation. With the increased interest in interprocedural and whole-program optimization, compilers should be fairly well equipped to exploit more registers.

It should also be noted that loading all registers at the start of a function may well not be an optimization. Even with an OoO implementation, distributing memory accesses through code tends to be more execution friendly.

> Why isn't the code you profiled showing this sort of thing? The cruel answer would be that there are just
> not that many people in the world who know what they are doing. A second alternative, which might be partially
> true, is that the bulk of programmers, and the bulk of code written, grew up with IA-32, not even x64,
> where this type of programming is not much of a win because of the paucity of registers. Ideally this will
> change, but we all know that it takes a generation or more for certain habits to die.

Or the bulk of code is written in a C-like language (with aliasing issues) and is written for ease of programming (at minimum ease of writing, more ideally ease of maintenance).
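As a small illustration of the aliasing issue, consider the following C sketch. The global `total` and both function names are made up for this example. In the first version the compiler generally has to assume that `data` might point at `total`, so it cannot simply keep the running sum in a register for the whole loop. The second version makes the copy-to-local explicit, which is the style name99 describes.

```c
#include <assert.h>
#include <stddef.h>

int total;  /* hypothetical global accumulator */

/* Accumulates directly into the global. Because data[i] could alias total,
 * each iteration's store to total may have to be visible to the next load. */
void sum_global(const int *data, size_t n) {
    assert(data != NULL || n == 0);
    total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i];
}

/* Accumulates into a local, which can live in a register, and writes the
 * global once on exit. */
void sum_local(const int *data, size_t n) {
    assert(data != NULL || n == 0);
    int acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += data[i];
    total = acc;
}
```

The two functions agree whenever `data` does not overlap `total`, but they differ when it does: with `total` set to 7, `sum_global(&total, 1)` leaves 0 in `total`, while `sum_local(&total, 1)` leaves 7. That difference is why a compiler cannot always make this transformation on its own, and why the programmer who knows there is no aliasing can.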

> Which gets back to my point. Plough-the-fields code will tend not to use half those registers, just
> like it doesn't use multi-threading or NEON. But performance critical code WILL use these features.

Compilers are becoming more sophisticated in autovectorization, and register allocation may be an easier and more mature optimization. Autoparallelization is also becoming more common for certain types of programs.