What are your ideas for a radically different CPU ISA + physical Arch?

By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), March 26, 2021 7:21 am
Room: Moderated Discussions
Moritz (better.delete@this.not.tell) on March 20, 2021 5:21 am wrote:
> What if you could completely rethink the general processor concept?

As already noted, this is a huge topic; it deserves a response on the scale of Donald Knuth's The Art of Computer Programming.

The microthread theme of exploiting (at least partially explicit) coarse-grained parallelism (including speculative parallelism) is one obvious point of attraction. (I do not recall anyone mentioning the decoupling of aspects of processing. Where a simple pipeline uses a single-entry buffer, modern pipelines provide larger buffers at various stages. Fetch and schedule are fairly highly decoupled, and data prefetch provides some decoupling of data load. While threading provides such decoupling, there are probably opportunities for decoupling where communication and synchronization/control flow are still somewhat tightly integrated.)

Coordinating communication between general-purpose processing agents, accelerators, I/O agents, and storage/memory (and the interconnect) seems a significant design consideration. While memory-mapped I/O provides a useful abstraction (particularly for programming in a C-like language), better interfaces can likely be devised. Architecting interrupts as procedure calls from remote agents with arguments seems attractive; such would also provide an interface for inter-thread interrupts and communication. (Along similar lines, most uses of MWAIT would seem to benefit from loading the value in the newly changed memory location; this would not be a substantial benefit, but providing a separate explicit load operation seems wasteful. Note also that this mechanism ties in with a thread stalled on a cache miss, which is effectively an MWAIT waiting for the value to return from memory rather than from another thread that has yet to store it.)
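To make the "interrupt as a procedure call with arguments" idea a little more concrete, here is a minimal software sketch. It is only a toy model of the interface, not a hardware proposal: the names `Agent`, `post_interrupt`, `service_interrupts`, and the two handlers are all invented for illustration.

```python
from collections import deque

class Agent:
    """Hypothetical processing agent that receives interrupts as calls with arguments."""

    def __init__(self, name):
        self.name = name
        self.pending = deque()  # queued (handler, args) pairs posted by remote agents
        self.log = []           # record of what the handlers did

    def post_interrupt(self, handler, *args):
        # A remote agent (device, accelerator, or another thread) "calls" into
        # this agent by queueing a handler together with its arguments.
        self.pending.append((handler, args))

    def service_interrupts(self):
        # At a point where interrupts are accepted, each one runs as an
        # ordinary call; no separate status-register reads are needed.
        while self.pending:
            handler, args = self.pending.popleft()
            handler(self, *args)

def on_dma_complete(agent, buffer_id, byte_count):
    # Hypothetical I/O completion handler; the arguments arrive with the call.
    agent.log.append(("dma_complete", buffer_id, byte_count))

def on_thread_message(agent, sender, payload):
    # Hypothetical inter-thread interrupt using the same interface.
    agent.log.append(("message", sender, payload))

cpu = Agent("cpu0")
cpu.post_interrupt(on_dma_complete, 3, 4096)
cpu.post_interrupt(on_thread_message, "cpu1", "wake")
cpu.service_interrupts()
print(cpu.log)
```

The point of the sketch is that I/O completion and inter-thread signalling share one mechanism, with the data delivered alongside the notification.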

Spatial (functional units, clusters, core groups, et al.) and temporal (lifetime) value locality can also be exploited in storage location and access mechanism. Some degree of random access for at least some operands might be avoided, saving area and power (latches vs. register file entries). As with cache coherence, intra-core communication has a broadcast bias which is overkill for the common case. Many results have one temporally proximate consumer. Something vaguely reminiscent of Transport Triggered Architecture might reduce communication (integrating such into operand-capture-style dynamic scheduling seems plausible). Partitioning can reduce latency and energy (if local accesses are sufficiently common); while cluster-private caches have low utilization and/or replication/storage-waste issues, I am optimistic that software optimization could mitigate these issues.
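The claim that many results have a single consumer is something one would measure on real traces. As a sketch of how such a measurement might look, the following counts readers per produced value. The trace is invented purely to exercise the code and says nothing about real workloads; `TRACE` and `reader_counts` are hypothetical names.

```python
from collections import Counter

# Invented toy trace, for illustration only: (destination, sources) per instruction.
TRACE = [
    ("r1", ()),            # r1 = load ...
    ("r2", ("r1",)),       # r2 = r1 + 1
    ("r3", ("r2",)),       # r3 = r2 << 2
    ("r4", ("r2", "r3")),  # r4 = r2 + r3
    ("r1", ("r4",)),       # r1 redefined from r4
    ("r5", ("r1",)),       # r5 = r1 * 3
]

def reader_counts(trace):
    """Return, for each instruction, how many later instructions read its result."""
    live = {}                    # register name -> index of its current producer
    readers = [0] * len(trace)
    for index, (dst, srcs) in enumerate(trace):
        for src in set(srcs):    # an instruction reading a value twice counts once
            if src in live:
                readers[live[src]] += 1
        live[dst] = index        # later reads of dst refer to this definition
    return readers

counts = reader_counts(TRACE)
histogram = Counter(counts)
print(counts)           # readers per produced value
print(dict(histogram))  # how many values had 0, 1, 2, ... readers
```

Values with exactly one reader are the candidates for point-to-point forwarding rather than a broadcast or a register file write.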

(Diverse criticality of data also seems to be underexploited. Academic papers have proposed cache replacement policies that take into account prefetchability and criticality for branch misprediction correction and pointer chasing, but even at L1 not all loads are equally urgent. A laxly scheduled load could use phased tag-data access [way prediction reduces this advantage] or yield its place on bank conflicts.)
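A rough cost model may help show why phased access suits non-urgent loads and why way prediction narrows the gap. This is a sketch under stated assumptions: the energy units are arbitrary, the numbers (8 ways, tag read 1 unit, data read 4 units) are made up, the way-predicted case assumes a cache hit, and the function names are mine.

```python
def parallel_access(ways, tag_energy, data_energy):
    # Conventional fast path: every tag way and every data way is read at once.
    return {"cycles": 1, "energy": ways * (tag_energy + data_energy)}

def phased_access(ways, tag_energy, data_energy, hit=True):
    # Tags are read first; only the matching data way is read afterwards,
    # and no data way is read on a miss.
    return {
        "cycles": 2 if hit else 1,
        "energy": ways * tag_energy + (data_energy if hit else 0),
    }

def way_predicted_access(ways, tag_energy, data_energy, predicted_correctly=True):
    # Assumes a hit: all tags plus the predicted data way are read together;
    # a wrong prediction costs a second data read in a following cycle.
    data_reads = 1 if predicted_correctly else 2
    return {"cycles": data_reads, "energy": ways * tag_energy + data_reads * data_energy}

WAYS, TAG_E, DATA_E = 8, 1, 4  # assumed values, arbitrary units

fast = parallel_access(WAYS, TAG_E, DATA_E)
lax = phased_access(WAYS, TAG_E, DATA_E)
predicted = way_predicted_access(WAYS, TAG_E, DATA_E)
mispredicted = way_predicted_access(WAYS, TAG_E, DATA_E, predicted_correctly=False)
print(fast, lax, predicted, mispredicted)
```

In this model a lax load trades one extra cycle for a large energy saving relative to fully parallel access, while a correctly way-predicted load already gets the same energy at full speed, which is the sense in which way prediction reduces the advantage.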

There may be opportunities for scheduling and communication optimization from hoisting a set of loads in front of the computation. The cost of staging potentially unused data may not be high in some cases (and there is also a reduced decoupling).

(Cache access width and block size might be exploited for lower energy when nearby members of a structure are often accessed with temporal locality [cf. signature cache]. This could also facilitate ECC with less read-modify-write overhead. If a part of an ECC word is read, the whole word could be cached if a modification is likely in the near future. Compile-time metadata might be worth providing for such cases. There may also be cases where L1 cache might act somewhat more like vector registers in caching gather operations for reuse or even just decoupling load [into-L1]/organization from operation.)

Since many resources are shared (even without multithreaded cores, cache capacity and bandwidth are typically shared at some level, memory bandwidth is shared, thermal headroom and power are shared), managing this sharing seems significant. There may be some use for market-oriented bidding for resources, but monopoly grants may also have a place. The value of a resource to a consumer is not fixed by simple supply and demand but depends on the relative estimated utility for that use. (The overhead of managing budgets, bids, and other accounting would constrain how extensively market economics could be applied. There are also significant differences between a computer system and human systems.)
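One simple form such bidding could take is a proportional-share split with some capacity reserved up front as monopoly grants. The sketch below is just one of many possible mechanisms and is not meant as a recommendation; `allocate`, the consumer names, and the numbers (16 cache ways, bids of 3 and 1, a grant of 4) are invented for illustration.

```python
def allocate(capacity, bids, grants=None):
    """Split `capacity` among consumers.

    `grants` are fixed reservations (the "monopoly grant" case) taken off the
    top; whatever remains is divided in proportion to each consumer's bid.
    """
    grants = grants or {}
    reserved = sum(grants.values())
    if reserved > capacity:
        raise ValueError("grants exceed capacity")
    shares = {name: float(amount) for name, amount in grants.items()}
    remaining = capacity - reserved
    total_bid = sum(bids.values())
    for name, bid in bids.items():
        # With no bids at all, nobody receives a market share.
        portion = remaining * bid / total_bid if total_bid else 0.0
        shares[name] = shares.get(name, 0.0) + portion
    return shares

# Hypothetical example: 16 cache ways, two bidders, optionally one reserved consumer.
market_only = allocate(16, {"a": 3, "b": 1})
with_grant = allocate(16, {"a": 3, "b": 1}, grants={"rt": 4})
print(market_only)
print(with_grant)
```

Even this trivial scheme needs per-consumer bid state and a division per reallocation, which hints at the accounting overhead mentioned above.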

Coordinating architecture and microarchitecture applies the principle of work caching (do not put off until decode what can be done at compile-time). Caching all possible work is obviously foolish (e.g., having every instruction encoded in memory as the actual control signals without compression), but I believe a lot of work is unnecessarily redundant. E.g., the loading and storing of return addresses seems unnecessarily redundant with a return address stack predictor, yet a RAS predictor can overflow and may not perfectly handle misspeculation. (Similarly, the redundancy between PC-relative branch BTB entries and code might be reduced with architectural help. This can be done microarchitecturally, but cooperation seems likely to be helpful.)
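The overflow limitation is easy to see in a toy model. The sketch below assumes a fixed-depth circular return address stack that silently overwrites its oldest entry, compared against the architectural stack held in memory; the class and function names and the address values are hypothetical, and misspeculation repair is not modeled.

```python
class ReturnAddressStack:
    """Fixed-depth circular predictor stack; overflow overwrites the oldest entry."""

    def __init__(self, depth):
        self.depth = depth
        self.entries = [None] * depth
        self.top = 0  # pushes minus pops; the storage slot is top modulo depth

    def push(self, return_address):
        self.entries[self.top % self.depth] = return_address
        self.top += 1

    def pop(self):
        if self.top == 0:
            return None  # nothing to predict from
        self.top -= 1
        return self.entries[self.top % self.depth]

def mispredicted_returns(nesting, ras_depth):
    """Make `nesting` nested calls, then return from all of them; count wrong predictions."""
    ras = ReturnAddressStack(ras_depth)
    architectural_stack = []  # the return addresses actually stored in memory
    for call in range(nesting):
        return_address = 0x1000 + 4 * call  # arbitrary distinct addresses
        architectural_stack.append(return_address)
        ras.push(return_address)
    misses = 0
    while architectural_stack:
        actual = architectural_stack.pop()
        if ras.pop() != actual:
            misses += 1
    return misses

print(mispredicted_returns(4, 4))   # nesting fits within the predictor
print(mispredicted_returns(6, 4))   # two outermost returns are mispredicted
print(mispredicted_returns(10, 4))  # six outermost returns are mispredicted
```

While nesting stays within the predictor depth, the predictor and the in-memory return addresses carry exactly the same information, which is the redundancy in question; once nesting exceeds the depth, only the in-memory copy is reliable.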

This is a huge topic; this post has not even explored all of the first-level tangents from a few basic concepts.