By: Adrian (a.delete@this.acm.org), June 7, 2022 12:59 am
Room: Moderated Discussions
Brett (ggtgp.delete@this.yahoo.com) on June 6, 2022 1:50 pm wrote:
> Adrian (a.delete@this.acm.org) on June 6, 2022 10:59 am wrote:
> > anonymou5 (no.delete@this.spam.com) on June 6, 2022 8:07 am wrote:
> > > > > So yes, Huffman will help for the super popular ones, but overall it
> > > > > really won't: the numbers simply are against you.
> > > >
> > > > The sentence "Huffman will help for the super popular ones, but overall it really won't" is
> > > > a mathematical contradiction. Are you sure you understand how Huffman/Shannon coding works?
> > > >
> > > > Huffman/Shannon codings don't work only in case the premise "there are super popular ones" isn't fulfilled.
> > > > In other words, they do not work if the probability distribution of the characters/symbols in the data
> > > > stream is approximately the same (such as: 8 symbols, probability 1/8=0.125 for each symbol).
> > > >
> > > > -atom
> > >
> > > It's fairly easy to play with this one, no?
> > >
> > > Give yourself a list of ~1-2k entries (depending on your ISA of choice), dial in
> > > a distribution (as described: a few dozen popular ones + an endless long tail),
> > > and see what your encoding choice gives you.
> >
> >
> > For such a case, a Huffman encoding would not be very efficient because it has to
> > round the length of the code for an instructions to an integer number of bits.
> >
> > Nevertheless, for any distribution of the instruction frequencies it is possible to make an
> > optimal encoding resulting in a minimum program size, if the constraint of having an integer
> > number of bits per instruction is no longer imposed, e.g. by using arithmetic coding.
> >
> >
> > Of course, it is unlikely that the reduction of the program size achieved by accepting the length
> > of the instruction to be any integer number of bits or any fractional number of bits, instead
> > of being a multiple of 8 bits, is worth the increase in the complexity of the decoder.
> >
> > Even the improvement from a multiple of 16 bits to a multiple of 8 bits is unlikely to be worthwhile,
> > as shown by the fact that most people are happy enough with the code density of Armv?-M.
> >
> >
> > An ISA which had instruction lengths multiples of 8 bit (it
> > also used a variable-length encoding for immediates
> > and offsets, in 8-bit multiples), and which was claimed
> > to have a much better code density in comparison with
> > all the popular ISAs of eighties, including x86, was the ISA of NS16000, later renamed as NS32000.
> >
> > It might have been interesting to compare some program sizes
> > for NS16000/NS32000 and for Armv?-M or NanoMIPS,
> > to see the difference between 8-bit and 16-bit encodings, but I am not aware of any free compiler for the
> > obsolete NS ISA and translating a program of a reasonable size by hand would be time consuming.
>
> I believe the Renesas RX family is basically x86 recoded for better instruction density,
> this is a modern design and as such best to look at. Plus the other Renesas variants.
>
> https://en.everybodywiki.com/List_of_instruction_sets#Renesas
>
> Today I would add a partial two unit belt as this would save 5 bits off of almost all instructions,
> loads/stores and integer/float operations could all save one register specifier.
>
Thanks for pointing the Renesas RX family.
It has indeed an interesting encoding for its ISA, which is likely to have very good code density.
> Adrian (a.delete@this.acm.org) on June 6, 2022 10:59 am wrote:
> > anonymou5 (no.delete@this.spam.com) on June 6, 2022 8:07 am wrote:
> > > > > So yes, Huffman will help for the super popular ones, but overall it
> > > > > really won't: the numbers simply are against you.
> > > >
> > > > The sentence "Huffman will help for the super popular ones, but overall it really won't" is
> > > > a mathematical contradiction. Are you sure you understand how Huffman/Shannon coding works?
> > > >
> > > > Huffman/Shannon codings don't work only in case the premise "there are super popular ones" isn't fulfilled.
> > > > In other words, they do not work if the probability distribution of the characters/symbols in the data
> > > > stream is approximately the same (such as: 8 symbols, probability 1/8=0.125 for each symbol).
> > > >
> > > > -atom
> > >
> > > It's fairly easy to play with this one, no?
> > >
> > > Give yourself a list of ~1-2k entries (depending on your ISA of choice), dial in
> > > a distribution (as described: a few dozen popular ones + an endless long tail),
> > > and see what your encoding choice gives you.
> >
> >
> > For such a case, a Huffman encoding would not be very efficient because it has to
> > round the length of the code for an instructions to an integer number of bits.
> >
> > Nevertheless, for any distribution of the instruction frequencies it is possible to make an
> > optimal encoding resulting in a minimum program size, if the constraint of having an integer
> > number of bits per instruction is no longer imposed, e.g. by using arithmetic coding.
> >
> >
> > Of course, it is unlikely that the reduction of the program size achieved by accepting the length
> > of the instruction to be any integer number of bits or any fractional number of bits, instead
> > of being a multiple of 8 bits, is worth the increase in the complexity of the decoder.
> >
> > Even the improvement from a multiple of 16 bits to a multiple of 8 bits is unlikely to be worthwhile,
> > as shown by the fact that most people are happy enough with the code density of Armv?-M.
> >
> >
> > An ISA which had instruction lengths multiples of 8 bit (it
> > also used a variable-length encoding for immediates
> > and offsets, in 8-bit multiples), and which was claimed
> > to have a much better code density in comparison with
> > all the popular ISAs of eighties, including x86, was the ISA of NS16000, later renamed as NS32000.
> >
> > It might have been interesting to compare some program sizes
> > for NS16000/NS32000 and for Armv?-M or NanoMIPS,
> > to see the difference between 8-bit and 16-bit encodings, but I am not aware of any free compiler for the
> > obsolete NS ISA and translating a program of a reasonable size by hand would be time consuming.
>
> I believe the Renesas RX family is basically x86 recoded for better instruction density,
> this is a modern design and as such best to look at. Plus the other Renesas variants.
>
> https://en.everybodywiki.com/List_of_instruction_sets#Renesas
>
> Today I would add a partial two unit belt as this would save 5 bits off of almost all instructions,
> loads/stores and integer/float operations could all save one register specifier.
>
Thanks for pointing the Renesas RX family.
It has indeed an interesting encoding for its ISA, which is likely to have very good code density.