By: Heikki Kultala (heikki.kultal.a.delete@this.gmail.com), March 22, 2021 5:47 pm
Room: Moderated Discussions
Etienne Lorrain (etienne_lorrain.delete@this.yahoo.fr) on March 22, 2021 3:22 am wrote:
> Like others have pointed out, microthreads could be used; I would
> propose a very simple form of it, without much modification.
> I mean, usual software only has an IPC between 0.5 and 2, far lower than the theoretical
> maximum. Even hyper-threading cores seem to quite often have the two threads waiting.
> IMHO we could have an explicit, hardware-supported, background microthread. The OoO CPU would
> reserve around 20 in-flight instructions for it (out of its 100 in-flight instructions).
Modern CPUs can have more like 170-400 in-flight instructions, not 100.
> So if the main program flow is stopped (waiting for memory reads), the background microthread would be run.
Waiting for a memory read does not immediately stall a whole out-of-order core. It can keep fetching instructions, and among those instructions it may find some that have no dependencies on the stalling instruction, which it can execute while the load is still outstanding.
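A toy model of that point, under loudly simplified assumptions: the oldest instruction is a load that misses in cache and blocks retirement, the core can hold `window` instructions in flight, and every instruction not (transitively) fed by the load finishes well before the miss returns. Under those assumptions the useful work done during the miss is just the independent instructions inside the window. `Insn` and `work_done_during_miss` are hypothetical names; this is a sketch, not a model of any particular core.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    name: str
    srcs: tuple = ()  # names of the instructions whose results this one reads


def work_done_during_miss(program, window):
    """Toy model (hypothetical, not any real core).

    Assumes program[0] is a load that misses in cache and stays at the head of
    the window, so only the first `window` instructions can be in flight.
    Any instruction that (transitively) reads the load's result must wait;
    everything else is assumed to finish before the miss returns.
    Returns the names of the instructions that can execute during the miss.
    """
    waiting = {program[0].name}  # results not yet available
    done = []
    for insn in program[1:window]:
        if any(src in waiting for src in insn.srcs):
            waiting.add(insn.name)  # depends on the miss, must wait
        else:
            done.append(insn.name)  # independent, runs under the miss
    return done


program = [
    Insn("ld"),            # the missing load
    Insn("use", ("ld",)),  # consumes the load, stalls
    Insn("a"),
    Insn("b", ("a",)),
    Insn("c", ("use",)),   # transitively depends on the load
    Insn("d", ("b",)),
]
print(work_done_during_miss(program, window=6))  # ['a', 'b', 'd']
print(work_done_during_miss(program, window=4))  # ['a', 'b']
```

Raising `window` from 100 toward the 170-400 entries of current cores simply lets more independent instructions overlap the miss.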
> I would imagine that "background microthread" as very limited, not able to do system
> calls or change rings - but it would be sufficient to do things like pre-zeroing the next
> memory allocation (always running in the memory context of the main application).
This would be insanely complicated for the programmer to use, and it would still need synchronization, so it is not really practical.
> It would be implemented by a single "background program counter" and a set
> of registers (maybe the smallest set), so that a "rep stosb" can be done.
> I think it would also be "optionally run", so the main application would not require anything, just the memory
> allocator would provide pre-zeroed blocks if there were some available, else it would zero those blocks itself.
This is the most insanely complex proposal for zeroing memory I have ever read.
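To make the synchronization objection concrete, here is a purely software version of the "use a pre-zeroed block if one is available, else zero it yourself" policy, with an ordinary thread standing in for the proposed background microthread. `PreZeroedPool` and its methods are hypothetical names for illustration only. Even this simple hand-off needs a lock (or an equivalent atomic protocol) so the zeroer and the allocator never own the same block at the same time.

```python
import threading


class PreZeroedPool:
    """Hypothetical allocator front-end: hands out a block zeroed in the
    background if one is ready, otherwise zeroes a dirty block itself."""

    def __init__(self, block_size):
        self.block_size = block_size
        self._dirty = []   # freed blocks that still hold old data
        self._zeroed = []  # blocks already cleared by the background zeroer
        self._lock = threading.Lock()

    def free(self, block):
        with self._lock:
            self._dirty.append(block)

    def zero_one_in_background(self):
        """One step of the background zeroer; returns False if nothing to do."""
        with self._lock:
            if not self._dirty:
                return False
            block = self._dirty.pop()  # take ownership before touching it
        block[:] = bytes(len(block))   # zero outside the lock
        with self._lock:
            self._zeroed.append(block)  # publish only once fully zeroed
        return True

    def alloc(self):
        with self._lock:
            if self._zeroed:
                return self._zeroed.pop()  # fast path: already zeroed
            block = self._dirty.pop() if self._dirty else None
        if block is None:
            return bytearray(self.block_size)  # fresh bytearray is all zeros
        block[:] = bytes(len(block))           # slow path: zero it ourselves
        return block


def zeroer_loop(pool, steps):
    for _ in range(steps):
        pool.zero_one_in_background()


pool = PreZeroedPool(16)
for _ in range(100):
    pool.free(bytearray(b"\xff" * 16))  # dirty blocks
zeroer = threading.Thread(target=zeroer_loop, args=(pool, 100))
zeroer.start()
blocks = [pool.alloc() for _ in range(100)]
zeroer.join()
print(all(b == bytearray(16) for b in blocks))  # True
```

The allocator never depends on the zeroer having run; the lock is only there so a block is never handed out while the other side is still writing to it.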
> More complex uses of such background microthread could be found over time, like balancing binary trees.
Big problem with synchronization. None of the main thread's in-flight instructions could touch the tree, or things would break.
> On a more complete redesign of a processor, I would easily "type" the registers of the processor (as I have
> said in old threads), i.e. have assembly instructions to change the type of each register at run time.
> I imagine processor registers could have types like:
> - simple byte/word/long or any fixed number of bits like all processors
> - pointer to memory (probably 64 bits), maybe with read-only/write-only/read-write
> distinction (would help cache prefetch for read or write at value initialisation)
> - Integer/unsigned which would saturate on overflows, or bitmask types
> - floating point of fixed number of bits
> - vector of byte/word/long/floats...
> At least it would reduce the number of instructions (addb, addw, addl, ...), clearly define when a register
> is live or dead,
No, it could not tell when an architectural register is live or dead. A register becomes live when it is first written, and it becomes dead only after its last read; what you are proposing would not help at all in knowing when it can no longer be used.
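Deadness is a property of future reads, not of the value's type. A textbook backward liveness pass over straight-line code shows this: the registers live at each point are computed from the instructions that come after it, plus whatever the code beyond the block (`live_out`) still needs, which the hardware has not fetched yet. `live_after` and the `(dest, srcs)` encoding are hypothetical, chosen for illustration.

```python
def live_after(code, live_out=frozenset()):
    """Textbook backward liveness for straight-line code.

    `code` is a list of (dest, srcs) tuples, one per instruction; `live_out`
    is the set of registers still needed after the last instruction, which
    depends on code the hardware has not fetched yet. Returns, for each
    instruction, the set of registers live immediately after it.
    """
    live = set(live_out)
    result = [frozenset()] * len(code)
    for i in range(len(code) - 1, -1, -1):
        dest, srcs = code[i]
        result[i] = frozenset(live)
        live.discard(dest)  # the write kills the previous value of dest
        live.update(srcs)   # the reads need their values before this point
    return result


code = [
    ("r1", ()),            # r1 = load ...
    ("r2", ("r1",)),       # r2 = f(r1)
    ("r3", ("r2", "r1")),  # r3 = g(r2, r1): last read of r1 and r2
    ("r1", ("r3",)),       # r1 = h(r3): r1 overwritten
]
for i, live in enumerate(live_after(code, live_out={"r1"})):
    print(i, sorted(live))
# 0 ['r1']
# 1 ['r1', 'r2']
# 2 ['r3']
# 3 ['r1']
```

r2 dies after instruction 2 only because nothing later reads it; tagging r2 as "int64" or "pointer" would not change that, and the answer for instruction 3 depends entirely on `live_out`.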
> and give more context to the core executor. At first one could fix all registers at
> 64 bits so that standard current compilers can be used, and implement other types slowly, one at a time.
This makes no sense. The type would be known only after reading the metadata bits from the register.
We want to be able to fully decode an instruction as early as possible. Requiring a register read before decoding can finish would be a big hindrance.
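A hypothetical sketch of why typed registers hurt decode. With the width in the opcode (`addb`, `addq`, ...), each instruction decodes on its own, so a wide decoder can work on many at once. With a single typed `add`, the decoder must know the current tag of the destination register, which is state produced by earlier instructions, so decoding a block becomes a serial walk; if a tag could be set from a register value rather than an immediate, decode would additionally have to wait for execution, which is the register read objected to above. The `settype` instruction, the tag names, and both functions are invented for illustration.

```python
# Hypothetical encodings, invented for illustration.
WIDTH_FROM_OPCODE = {"addb": 8, "addw": 16, "addl": 32, "addq": 64}


def decode_classic(block):
    """Width is in the opcode: every instruction decodes independently,
    so a wide decoder can handle all of them in parallel."""
    return [("add", WIDTH_FROM_OPCODE[op], dst, src) for op, dst, src in block]


def decode_typed(block, reg_tags):
    """A single 'add' opcode whose meaning comes from the destination's type tag.

    Each 'settype' rewrites the tag table, so an 'add' cannot be decoded until
    every earlier instruction that might change its register's tag is known:
    decode of the block becomes a serial walk over register state.
    """
    tags = dict(reg_tags)
    out = []
    for insn in block:
        if insn[0] == "settype":
            _, reg, kind = insn
            tags[reg] = kind
            out.append(insn)
        else:
            _, dst, src = insn
            out.append(("add", tags[dst], dst, src))
    return out


print(decode_classic([("addb", "r1", "r2"), ("addq", "r1", "r2")]))
# [('add', 8, 'r1', 'r2'), ('add', 64, 'r1', 'r2')]
print(decode_typed([("add", "r1", "r2"),
                    ("settype", "r1", "sat16"),
                    ("add", "r1", "r2")],
                   {"r1": "int64"}))
# [('add', 'int64', 'r1', 'r2'), ('settype', 'r1', 'sat16'), ('add', 'sat16', 'r1', 'r2')]
```

The same `add` bytes decode to two different operations depending on what came before, which is exactly the dependency a parallel decoder wants to avoid.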