By: Adrian (a.delete@this.acm.org), June 6, 2022 10:59 am
Room: Moderated Discussions
anonymou5 (no.delete@this.spam.com) on June 6, 2022 8:07 am wrote:
> > > So yes, Huffman will help for the super popular ones, but overall it
> > > really won't: the numbers simply are against you.
> >
> > The sentence "Huffman will help for the super popular ones, but overall it really won't" is
> > a mathematical contradiction. Are you sure you understand how Huffman/Shannon coding works?
> >
> > Huffman/Shannon codings don't work only in case the premise "there are super popular ones" isn't fulfilled.
> > In other words, they do not work if the probability distribution of the characters/symbols in the data
> > stream is approximately the same (such as: 8 symbols, probability 1/8=0.125 for each symbol).
> >
> > -atom
>
> It's fairly easy to play with this one, no?
>
> Give yourself a list of ~1-2k entries (depending on your ISA of choice), dial in
> a distribution (as described: a few dozen popular ones + an endless long tail),
> and see what your encoding choice gives you.
For such a case, a Huffman encoding would not be very efficient, because it has to round the length of the code for each instruction up to an integer number of bits.
Nevertheless, for any distribution of instruction frequencies it is possible to construct an optimal encoding that minimizes program size, provided the constraint of an integer number of bits per instruction is dropped, e.g. by using arithmetic coding.
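A quick way to see the size of the rounding loss is to build a Huffman code over an assumed opcode distribution and compare its average code length against the Shannon entropy, which is the fractional-bit bound that arithmetic coding approaches. The Zipf-like distribution over 1000 "instructions" below is purely an illustrative assumption, not measured data:

```python
import heapq
import math

# Assumed opcode distribution: Zipf-like long tail over 1000 instructions
# (a few very popular opcodes, then an endless tail).
n = 1000
weights = [1.0 / (rank + 1) for rank in range(n)]
total = sum(weights)
probs = [w / total for w in weights]

# Standard Huffman construction; we only track code lengths, not the codes.
# Heap entries: (combined probability, tiebreaker, symbols under this node).
heap = [(p, i, [i]) for i, p in enumerate(probs)]
heapq.heapify(heap)
depth = [0] * n  # code length in bits for each symbol
tie = n
while len(heap) > 1:
    p1, _, s1 = heapq.heappop(heap)
    p2, _, s2 = heapq.heappop(heap)
    for s in s1 + s2:
        depth[s] += 1  # every symbol under the merged node gets one bit deeper
    heapq.heappush(heap, (p1 + p2, tie, s1 + s2))
    tie += 1

avg_huffman = sum(p * d for p, d in zip(probs, depth))
entropy = -sum(p * math.log2(p) for p in probs)  # arithmetic-coding bound

print(f"entropy (fractional-bit bound): {entropy:.3f} bits/instruction")
print(f"Huffman (integer-bit codes):    {avg_huffman:.3f} bits/instruction")
```

Huffman is guaranteed to land within one bit of the entropy per symbol, so the gap it shows here is exactly the cost of insisting on an integer number of bits per instruction.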
Of course, it is unlikely that the reduction in program size achieved by allowing instruction lengths to be arbitrary integer or even fractional numbers of bits, instead of multiples of 8 bits, would be worth the increase in decoder complexity.
Even the improvement from a multiple of 16 bits to a multiple of 8 bits is unlikely to be worthwhile, as shown by the fact that most people are happy enough with the code density of Armv?-M.
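The cost of the alignment granularity itself can be sketched with a toy calculation: take some assumed "ideal" fractional-bit instruction lengths for a skewed mix (the figures below are illustrative assumptions, not measurements) and round each one up to the next multiple of the chosen granularity:

```python
import math

# Assumed mix of (frequency, ideal fractional-bit length) pairs for a
# hypothetical ISA; illustrative numbers only.
mix = [
    (0.40, 5.0),   # very common short instructions
    (0.30, 9.0),
    (0.20, 14.0),
    (0.10, 22.0),  # rare long instructions with big immediates
]

def avg_bits(granularity):
    # Round each ideal length up to the next multiple of `granularity`
    # and take the frequency-weighted average.
    return sum(f * granularity * math.ceil(l / granularity) for f, l in mix)

for g in (1, 8, 16):
    print(f"{g:2d}-bit granularity: {avg_bits(g):5.1f} bits/instruction avg")
```

With these particular numbers, going from 16-bit to 8-bit granularity recovers a noticeable fraction of the rounding loss, while going all the way to single-bit granularity buys comparatively little more, which is the intuition behind the trade-off above.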
An ISA with instruction lengths that were multiples of 8 bits (it also used a variable-length encoding for immediates and offsets, in 8-bit multiples), and which was claimed to have much better code density than all the popular ISAs of the eighties, including x86, was that of the NS16000, later renamed NS32000.
It might have been interesting to compare program sizes for NS16000/NS32000 and for Armv?-M or NanoMIPS, to see the difference between 8-bit and 16-bit encodings, but I am not aware of any free compiler for the obsolete NS ISA, and translating a program of reasonable size by hand would be time-consuming.