By: Marcus (m.delete@this.bitsnbites.eu), August 16, 2022 11:07 pm
Room: Moderated Discussions
Brett (ggtgp.delete@this.yahoo.com) on August 16, 2022 6:07 pm wrote:
> Mark Roulo (nothanks.delete@this.xxx.com) on August 16, 2022 4:05 pm wrote:
> > Doug S (foo.delete@this.bar.bar) on August 16, 2022 9:49 am wrote:
> > > rwessel (rwessel.delete@this.yahoo.com) on August 16, 2022 5:21 am wrote:
> > > > A bit of heresy for Tuesday morning:
> > > >
> > > > So we combine multiple instructions to encode a constant, or even a decently long branch, need
> > > > to do instruction fusion for performance, and need compressed ISAs to keep text size down...
> > > >
> > > > Maybe that's a sign that fixed length ISAs are actually a mistake?
> > > >
> > > > Keeping parallel decode costs down is definitely a goal, but I
> > > > don't think that actually requires fixed length instructions.
> > >
> > >
> > > While Intel/AMD demonstrate it is possible to attain high performance with a variable length ISA, if you
> > > were going to make that a deliberate choice in a new ISA you'd need a lot more reason than the above.
> >
> > Agreed, but I think the primary lesson from RISC instruction encoding should
> > not be: "All instructions should have the same lengths", but instead:
> >
> > "Finding the next instruction in a sequence should require minimum decoding"
> >
> > All instructions are 4-bytes long is one way to require minimum
> > decoding to find the next instruction in a sequence.
> >
> > All instructions are 2-, 4-, 6- (and maybe 8-) bytes long and the length of a given
> > instruction can be determined from the first few bits of the instruction seems to
> > get the primary benefit of fixed-width while allowing a more compact encoding.
> >
> > I'm sure that folks who do this for a living know how important fixed-length
> > encoding is vs. 'easy-to-find-next-instruction encoding.'
>
> You are fighting the last war.
> The future is 64bit or bigger instructions, and how many operations you can pack in 64 bits.
> You would think that branches every four instructions would cause poor
> instruction density with lots of no-ops packed in… An Itanic issue.
> But there is no reason you can’t pack in the next operation in with the branch.
> Only branch destinations cause issues, and you could jump to an odd address
> to signify you start on the second operation of the instruction.
>
> Insane or not?
You're essentially talking about instruction bundles (as opposed to VLIW), right? I see a few similar suggestions (e.g. by rwessel), and it's an interesting idea.
I'm sure that there are some merits to such a solution. One obvious advantage would be good instruction packing & code density. 16 bits per instruction is too small, even for destructive operands, and 32 bits is too large for destructive operands and just a couple of bits too small for non-destructive operands, etc. Having the option to have more specialized instruction sizes (e.g. 18 bits or 36 bits, etc) could potentially help with encoding efficiency (unless the advantage gets eaten by NOP padding if the bundle is too narrow). Another advantage is that you get very predictable instruction fetch boundaries (similar to AArch64, but even less granular), so it *should* be beneficial for wide decoders (if done right).
There are probably some complications w.r.t. branching and instruction addressing and such (e.g. you would effectively get two different address spaces: one for the byte-addressed instruction data in data memory, and another for the PC). One possibility would be to have the N lower bits of the PC / branch target indicate an instruction within the bundle, assuming that there are at most 2^N instructions in a bundle. Another obvious problem is NOP padding. It feels like the equivalent of delay slots: it may be possible to have the compiler fill the slots by reordering instructions across bundle boundaries, but not 100% of the time.