By: Michael S (already5chosen.delete@this.yahoo.com), June 7, 2022 2:23 am
Room: Moderated Discussions
Adrian (a.delete@this.acm.org) on June 7, 2022 1:21 am wrote:
> Anon (no.delete@this.spam.com) on June 6, 2022 12:38 pm wrote:
> > Adrian (a.delete@this.acm.org) on June 6, 2022 10:59 am wrote:
> > > Nevertheless, for any distribution of the instruction frequencies it is possible to make an
> > > optimal encoding resulting in a minimum program size, if the constraint of having an integer
> > > number of bits per instruction is no longer imposed, e.g. by using arithmetic coding.
> > >
> > >
> > > Of course, it is unlikely that the reduction of the program size achieved by accepting the length
> > > of the instruction to be any integer number of bits or any fractional number of bits, instead
> > > of being a multiple of 8 bits, is worth the increase in the complexity of the decoder.
> >
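For a rough feel of what that entropy bound means, here is a toy calculation (the instruction-class frequencies below are invented):

import math

# Invented frequency distribution over instruction classes; a real stream,
# with operands included, would have a different and much larger alphabet.
freq = {"alu": 0.35, "load": 0.25, "branch": 0.18, "store": 0.12, "other": 0.10}
entropy = -sum(p * math.log2(p) for p in freq.values())
# Arithmetic coding can approach this bound because it does not round each
# instruction up to a whole number of bits (or bytes).
print(f"{entropy:.2f} bits/class vs 32 bits for a fixed-width encoding")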
> > Would it really be so complex? If the program code is split
> > into, let's say, 128-bit chunks containing a variable
> > number of arithmetically encoded instructions, would the decoder
> > really be much more complex than an x86 decoder?
> > Multiple chunks could be decoded in parallel, and the
> > result stored in a uop cache. Also, I would expect that in
> > each chunk only a small subset of registers is used multiple times, making it very compressible.
> >
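A toy model of that chunk scheme, with a prefix code standing in for real arithmetic coding (all details invented):

# Each chunk is self-contained, so chunks can be decoded independently,
# i.e. in parallel in hardware; a uop cache would then hide the cost.
CODE = {"0": "alu", "10": "load", "110": "store", "111": "branch"}

def decode_chunk(bits):
    decoded, cur = [], ""
    for b in bits:
        cur += b
        if cur in CODE:
            decoded.append(CODE[cur])
            cur = ""
    return decoded  # bits left over past the last whole instruction are padding

chunks = ["0101100", "1110010"]           # stand-ins for 128-bit chunks
print([decode_chunk(c) for c in chunks])  # no cross-chunk state is needed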
> > But someone could also define codes for common instruction sequences, getting
> > a benefit similar to arithmetic coding while being much simpler to decode.
> >
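And the sequence-dictionary idea in the same toy style (the macro table is invented):

# Spend one opcode on a frequent two-instruction sequence.
program = ["cmp", "beq", "load", "add", "cmp", "beq", "store"]
macros = {("cmp", "beq"): "cbeq"}  # hypothetical fused opcode

compressed, i = [], 0
while i < len(program):
    pair = tuple(program[i:i + 2])
    if pair in macros:
        compressed.append(macros[pair])
        i += 2
    else:
        compressed.append(program[i])
        i += 1
print(len(program), "instructions ->", len(compressed))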
>
>
>
> "But someone could also have codes for common instruction sequences": yes, I agree.
>
> I believe that this is by far the most practical method of increasing the code
> density, i.e. adding complex instructions to the ISA, but only if they are well
> chosen, based on usage frequency, so that they actually influence the code density.
>
>
> While in general I have an extremely poor opinion of the RISC-V ISA, which
> I believe to be one of the worst of the more than 100 ISAs with which I am
> familiar, the RISC-V ISA nonetheless includes a few very good features.
>
> By far the best feature of RISC-V is its combined compare-and-branch instructions, even
> if RISC-V does not have all the comparison cases that would be needed in a complete ISA.
Nios2 does have all reg-to-reg signed/unsigned comparison cases combined with branch.
According to my code size measurements, the improvement over RV is not noticeable.
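For the reg-reg signed case the difference in question looks like this (sketch; register choices arbitrary):

# "if (a < b) goto L", signed reg-reg; every instruction is 4 bytes:
seqs = {
    "RISC-V":  ["blt a0, a1, L"],         # fused compare-and-branch
    "AArch64": ["cmp w0, w1", "b.lt L"],  # separate compare, then branch
    "Nios2":   ["blt r4, r5, L"],         # fused as well
}
for isa, seq in seqs.items():
    print(f"{isa}: {len(seq)} instruction(s), {4 * len(seq)} bytes")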
>
> Because conditional branches are very frequent, and because for almost
> every such branch RISC-V saves a 32-bit word in comparison with AArch64, the
> length of many RISC-V programs is competitive with that for AArch64, even though
> RISC-V needs a lot of extra 32-bit words for a large part of the load/store instructions.
>
>
> In my opinion, the easiest way to increase the code density of AArch64 would be to define a set
> of compare-and-branch/test-under-mask-and-branch/branch-on-count/branch-on-index instructions.
>
> I have verified that there is enough free encoding space in the AArch64 branch instruction
> block, to allow the encoding of all kinds of such instructions that would be needed.
>
> Conditional branches are usually 15% to 20% of all instructions and saving one 32-bit word for most
> of them would cause a larger improvement in code density than almost any other encoding change.
>
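Taking those numbers at face value, the best case is easy to bound (the "needs a separate compare" fraction is my invention):

# All instructions 4 bytes; a fused compare-and-branch deletes one word
# for each conditional branch that today needs a separate compare.
branch_share = 0.17  # midpoint of the 15-20% quoted above
needs_cmp    = 0.8   # invented; see my objections below
print(f"~{branch_share * needs_cmp:.1%} fewer instruction words")  # ~13.6%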
My guess is that a very significant part of the compare/test instructions ahead of branches are of the 'with immediate' variety, so I expect a much smaller gain than your analysis suggests.
In the specific case of AArch64, another, probably even bigger, factor that reduces the potential gain is that the ISA already has CBZ/CBNZ/TBZ/TBNZ.
Generally, I disagree with your conclusion.
Two instruction sizes (16b/32b) are a bigger density win than what is possible with even a very smart choice of combined sequences. Three instruction sizes (16b/32b/48b or 16b/32b/40b; I am not sure which one is better) are better yet, but by a smaller increment over two sizes. The biggest win of three sizes is not so much code density as potentially better performance of narrow (1- or 2-wide) implementations of the ISA.
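To put toy numbers behind that (the mix fractions are invented; real ones would come from measuring compiled code):

# Bits needed for 100 operations under each encoding; assume 55% of
# operations fit in 16 bits, and that in the 3-size encoding ten pairs
# of 32-bit operations merge into one 48-bit instruction each.
only32 = 100 * 32
two    = 55 * 16 + 45 * 32
three  = 55 * 16 + 25 * 32 + 10 * 48
for name, bits in [("32b only", only32), ("16b/32b", two), ("16b/32b/48b", three)]:
    print(f"{name}: {bits} bits ({1 - bits / only32:.0%} smaller)")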