Microarchitecting a counter register

Article: ARM Goes 64-bit
By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), November 17, 2012 7:37 pm
Room: Moderated Discussions
name99 (name99.delete@this.redheron.com) on November 17, 2012 3:31 pm wrote:
[snip]
> The point is, in ALL these cases, one knows ahead of the time what one is trying to do and why.
> For these use cases, large pages work, and don't pollute the TLB. But it is fighting their design
> principle to assume that they are just large 4kB pages and can be handled the same way, in particular
> that they can usefully be AUTOMATICALLY managed by the OS --- automatically allocated, automatically
> swapped, automatically aggregated and de-aggregated from smaller pages.

Huge pages in a hierarchical page table system can be relatively easy to de-aggregate. In addition, a TLB could take advantage of aligned page groups within a huge page that was partially de-aggregated, easily increasing peak TLB reach fourfold with the addition of only three validity bits. (The Speculative TLB proposal also exploits such a "usually matching a huge page" characteristic by speculating that the translation will be the huge page translation.)
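As a rough illustration of the validity-bit idea, here is a minimal Python sketch of one possible reading: a single TLB entry covers four aligned page groups and carries one valid bit per group (the usual valid bit plus three more), so a group that has been de-aggregated simply misses and falls back to a page walk. The group size, the field names and the `GroupedEntry` class are assumptions made for the example, not something taken from any real design.

```python
from dataclasses import dataclass, field

PAGE = 4096                 # base page size in bytes
GROUP_PAGES = 16            # assumed number of pages per aligned group
GROUP = PAGE * GROUP_PAGES  # bytes covered by one group
GROUPS_PER_ENTRY = 4        # one ordinary valid bit plus three extra ones


@dataclass
class GroupedEntry:
    """Hypothetical TLB entry spanning four aligned page groups."""
    va_base: int   # virtual base, aligned to GROUP * GROUPS_PER_ENTRY
    pa_base: int   # physical base of the original contiguous mapping
    valid: list = field(default_factory=lambda: [False] * GROUPS_PER_ENTRY)

    def translate(self, va):
        """Return the physical address, or None to signal a page walk."""
        offset = va - self.va_base
        if not 0 <= offset < GROUP * GROUPS_PER_ENTRY:
            return None                      # address outside this entry
        if not self.valid[offset // GROUP]:
            return None                      # this group was de-aggregated
        return self.pa_base + offset         # still the contiguous mapping
```

With all four bits set the entry reaches four times as far as a single-group entry; clearing one bit costs only the misses for that group.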

If memory-chip-internal copying were supported, the cost of aggregating a huge page could be significantly reduced.


> As for 64kB pages --- bring it on. High time, and well done, ARM. Really
> the only thing I'd like to see in the new ISA that I don't see is
> - support for multiple condition codes, like POWER
> - a dedicated count register for speeding up inner loops, again like POWER.

I can sympathize; those are features that I really like. However, it is quite possible to define a GPR as a microarchitectural counter. The front-end could track the least significant bits and the number of decrements, somewhat like the stack pointer optimizations introduced in certain x86 implementations. This would allow early branch resolution (like with a specialized count register), but would not add a special-purpose register to the ISA itself.
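To make the front-end tracking a bit more concrete, here is a small Python model of what such a shadow counter might look like. It is only a sketch under assumptions: the tracked width, the `sync`/`invalidate` interface, and the rule of giving up whenever the upper bits are nonzero are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class FrontEndCounter:
    """Hypothetical front-end shadow of one GPR used as a loop counter."""
    bits: int = 8             # assumed number of low bits tracked
    low: int = 0              # low bits of the last value reported by the back end
    upper_zero: bool = False  # True if the untracked upper bits were all zero
    valid: bool = False       # False until the back end has reported a value
    pending: int = 0          # decrements fetched since that report

    def sync(self, value: int) -> None:
        """The back end reports the register value; restart tracking."""
        self.low = value & ((1 << self.bits) - 1)
        self.upper_zero = (value >> self.bits) == 0
        self.valid = True
        self.pending = 0

    def invalidate(self) -> None:
        """A write the front end cannot follow (e.g. a load into the GPR)."""
        self.valid = False

    def fetch_decrement_and_branch(self):
        """Model fetching 'decrement, branch if nonzero'.

        Returns True (taken) or False (not taken) when the branch can be
        resolved at fetch, or None to fall back to the branch predictor.
        """
        if not self.valid:
            return None
        self.pending += 1
        if not self.upper_zero:
            return None               # value too large to resolve from low bits
        remaining = self.low - self.pending
        if remaining < 0:
            self.invalidate()         # counter wrapped; stop tracking
            return None
        return remaining > 0
```

For example, after `sync(3)` three successive fetches resolve as taken, taken, not taken without consulting a predictor.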

(For large loop bodies, multiple condition registers can serve a similar purpose. For short loop bodies with short iteration counts, it seems that one might actually want a count FIFO where reaching a count of zero automatically causes a move to the next entry in the buffer.)
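The count FIFO could behave roughly like the Python sketch below. The `CountFifo` class and its `loop_branch` method are hypothetical names, and the sketch assumes every queued count is at least one and that the branch sits at the bottom of the loop body.

```python
from collections import deque


class CountFifo:
    """Hypothetical FIFO of iteration counts for a run of short loops."""

    def __init__(self, counts):
        counts = list(counts)
        if any(n < 1 for n in counts):
            raise ValueError("each queued count must be at least 1")
        self.pending = deque(counts)   # counts for loops not yet started
        self.current = self.pending.popleft() if self.pending else 0

    def loop_branch(self) -> bool:
        """Executed at the bottom of a loop body; True means loop again."""
        self.current -= 1
        if self.current > 0:
            return True
        # Count reached zero: fall out of this loop and move to the next entry.
        self.current = self.pending.popleft() if self.pending else 0
        return False
```

Queuing the counts 2 and 3 gives the branch outcomes taken, not taken, taken, taken, not taken, so the two loops run back to back without any explicit reload of the counter.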

Along similar lines, with branch on bit set/clear instructions, it would be possible to suggest a specific GPR as a microarchitectural flag register. This would be less effective than microarchitecting a count register since partial register writes would be more involved; but it might extract some of the benefit of multiple condition registers.

(Other registers may also be attractive for implementation with a future file, like global and thread-local-storage pointers.)

> I'm not a compiler guy, but for low level assembly hacking, multiple condition registers
> could be used with value. It always struck me that the compiler guys were not even
> trying to use them, with at least two obvious possibilities not explored:
> - storing bools in condition registers rather than standard "integer" registers
> - testing conditions (when it makes sense) across basic block boundaries, rather than the
> paradigm of only looking within a single basic block for scheduling opportunities.

I agree. Another trick would be fully unrolling a short loop with a small count and unrolling its outer loop once. With the four-bit condition registers in Power, the inner loop could iterate either zero to 15 times or 1 to 16 times using a different condition register for even and odd outer loop iterations. The new value for each condition register could be set at the end of its loop, allowing substantial time for the value to reach the front end. A third condition register could be set to control the outer loop.

I also wish that condition registers could be used for returning error status. While error-based branches are extremely predictable, early resolution would tend to free up resources early.

For classes with limited polymorphism (or dominated by a few cases), early setting of condition registers might be exploited to convert variable jumps into jumps using immediates. (In theory, an ISA could inline a small jump table and use a small value that would fit in a condition register to select the entry.) Multiple condition registers might be used to hold limited class information for multiple objects.

One neat aspect of multi-bit condition registers is that vector comparisons could set a result for all, one, some but not most, or none.

Interestingly, Itanium has both predicate registers and a count register.

Unfortunately, front-end evaluation of condition registers seems to have fallen out of favor, perhaps in large part because of deep and wide pipelines (and OoO execution) and little (or no?) software that exploits very early setting of such values.

> As for the dedicated count register, yes, you can get the same effect mostly with branch prediction,
> but this allows you to free up some of those limited branch prediction slots for more useful purposes,
> and gives you a register that lives close to the instruction side of the chip, not the data side, and
> thus a little faster when used for instruction purposes --- witness POWER's use of that register to
> call proc ptrs. I don't know --- maybe that sort of lack of orthogonality is just too much hassle?
> But count register never struck me as a bad idea, unlike some of what's in the POWER ISA.

Itanium has branch registers specifically for earlier resolving of variable jumps.

Unfortunately, special-purpose registers tend to be underutilized and constrain implementations in ways that assigning special uses to GPRs does not. A clever use of hint semantics (and possibly some special-purpose instructions using specific GPRs) might be a "better" way to allow implementations to exploit early availability and other specializations for certain cases.

Power (and Itanium) also have the disadvantage that the special purpose registers could not be loaded directly from memory or directly with an immediate value (in the case of a count) or directly from a computation.