By: Brett (ggtgp.delete@this.yahoo.com), May 6, 2017 12:36 am
Room: Moderated Discussions
Heikki kultala (heikki.kultala.delete@this.tut.fi) on April 26, 2017 1:42 am wrote:
> wumpus (lost.delete@this.in-a.cave.net) on April 25, 2017 7:48 am wrote:
> > Brett (ggtgp.delete@this.yahoo.com) on April 23, 2017 7:52 pm wrote:
> > > Brett (ggtgp.delete@this.yahoo.com) on April 15, 2017 12:48 pm wrote:
> > > > Megol (golem960.delete@this.gmail.com) on April 15, 2017 6:22 am wrote:
> > > > > RichardC (tich.delete@this.pobox.com) on April 13, 2017 6:20 am wrote:
> > > > > > Megol (golem960.delete@this.gmail.com) on April 13, 2017 5:21 am wrote:
> >
> > >
> > > I have been talking of micro threads of three opcodes, as generated by LLVM.
> > > A three-opcode chain generates only one result and needs only four
> > > sources, compared to six sources and three results for RISC/x86.
> > [...]
> > > We are well into the 21st century and CPU design is still stuck in 1980 RISC thinking.
> >
> > I've suggested a similar thing and have been told that it already exists in fused instructions (you
> > might suggest arbitrary waits in your "microthreads", but it really looks like they would work better
> > as a single macro-instruction). Oddly enough, plenty of machines have this capability but can't
> > manage to simplify their scheduling (and other OoO costs) as it is used relatively rarely.
> >
> > Other thoughts about microthreads:
> > - the name is goofy and sounds like it involves massive overhead for the CPU. I'd rather have
> > a macro-instruction or better yet stay with the industry and call it "fused instructions".
>
> I have been thinking about name "ESIC : Explicitly Serial Instruction Computer".
>
> > - temp registers [inside of your microthreads] can be easily solved by assigning each
> > thread to an ALU (or group of ALUs) and each having a [fixed] local set of registers (so
> > that two separate microthreads accessing "register 3" won't interfere). All your RAT
> > issues go away [for internal use, not so much for external use] with such a system.
> > - start with the exception model. If your microthreads can cause exceptions willy-nilly and require
> > careful restart at specific points, you are back in pre-RISC CISC hell, and in no way a post-RISC
> > system. The temp register scheme above relies on avoiding exception hell (put your "exceptable
> > instructions" last, so that if they have to be restarted you can use that known point).
>
> Making all side effects (including memory writes and register writes) happen only after the last instruction of
> a bundle has executed solves this. If a bundle is not fully executed, it is restarted from the beginning.
>
> > - single output seems a bit forced and might limit the number of instructions in a microthread
> > unnecessarily. Are you following pointer chains (see exceptions for why not in the same microthread)?
>
> Single outputs let this solve the register renaming overhead, nicely limit the number
> of required register write ports, and give a nice way of bundling those few instructions
> into easily decodable packets where the header already contains the destination register.
>
> > Yes, the cost is high, but you need a certain number of instructions per microthread, and fusing
> > (which presumably has a single output) is already being used and isn't doing what you want.
>
> Bundles of size 2-4 can already give a huge decrease in register file writes (and so in the number of required
> register renames), and very little code can take advantage of longer ones without needing a load
> in the middle, which breaks the original idea of "atomic execution".
>
> I've been thinking about a 16-bit packet header containing the form/size of the packet, the first
> input register, and the destination register, followed by 16-bit instructions. Each instruction contains an opcode
> and another source register. One value is always bypassed between instructions, though there
> might be a few different "forms" of packet, like 3 consecutive instructions and 3 in a v-pattern,
> where both inputs of the last are bypassed and the operand field of the last is "borrowed" by
> one of the first instructions. Register-to-register moves are just 0-instruction packets.
I went down that path two years ago: the code density is poor compared to RISC, and the decode, while not as bad as x86, is not pretty.
You could try 64-bit packets with variable numbers of registers, opcodes, and constants.
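Heikki's 16-bit packet idea can be sketched in Python to make the counting concrete. This is a minimal model under assumptions that are not in the thread: 32 registers, a header laid out as [size:6][src:5][dst:5], instructions as [opcode:6][reg:5][unused:5], a made-up opcode table, and only the "consecutive" form (the v-pattern is not modeled). It shows three of the claims quoted above: a three-instruction chain reads four registers and writes one, a 0-instruction packet acts as a register move, and because the only register write happens after the last instruction, a fault part-way through leaves state untouched so the packet can simply be restarted.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical layout (assumption: the thread fixes only the 16-bit sizes).
#   header:      [size:6][src:5][dst:5]   size = number of instructions
#   instruction: [opcode:6][reg:5][unused:5]
MASK64 = (1 << 64) - 1

# Hypothetical opcode table; opcode 4 can fault (divide by zero).
OPS: Dict[int, Callable[[int, int], int]] = {
    0: lambda a, b: a + b,
    1: lambda a, b: a - b,
    2: lambda a, b: a & b,
    3: lambda a, b: a ^ b,
    4: lambda a, b: a // b,
}


def encode(dst: int, src: int, instrs: List[Tuple[int, int]]) -> List[int]:
    """Build a packet: one header word plus one word per (opcode, reg) pair."""
    assert 0 <= dst < 32 and 0 <= src < 32 and len(instrs) < 64
    words = [(len(instrs) << 10) | (src << 5) | dst]
    for op, reg in instrs:
        assert 0 <= op < 64 and 0 <= reg < 32
        words.append((op << 10) | (reg << 5))
    return words


def execute(words: List[int], regs: List[int]) -> Tuple[List[int], int, int]:
    """Run a consecutive-form packet; return (new_regs, reg_reads, reg_writes).

    The running value is bypassed from each instruction to the next, and
    `regs` is never modified: the single destination write is applied to a
    copy only after the last instruction completes. If an instruction
    raises, no state has changed and the packet can be restarted as a whole.
    """
    header = words[0]
    size, src, dst = header >> 10, (header >> 5) & 31, header & 31
    acc, reads = regs[src], 1  # first input comes from the header
    for w in words[1:1 + size]:
        op, reg = w >> 10, (w >> 5) & 31
        acc = OPS[op](acc, regs[reg]) & MASK64  # bypassed, never written back
        reads += 1
    new_regs = list(regs)
    new_regs[dst] = acc  # the packet's only register write
    return new_regs, reads, 1


if __name__ == "__main__":
    regs = list(range(32))
    # r1 = ((r10 + r5) - r3) ^ r6
    new_regs, reads, writes = execute(encode(1, 10, [(0, 5), (1, 3), (3, 6)]), regs)
    print(new_regs[1], reads, writes)  # 10 4 1
    # A 0-instruction packet is a register-to-register move: r7 = r20
    print(execute(encode(7, 20, []), regs)[0][7])  # 20
```

For comparison, the same chain issued as three separate RISC instructions would read six registers and write three. The sketch only defers register writes; memory stores would need the same treatment, which is not shown here.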