TOP3 blunders

By: Potatoswatter, September 26, 2009 12:38 pm
? on 9/26/09 wrote:
>- Instructions with operands which have arbitrary data layout. For example, suppose
>you have determined that the optimal bit-length you need is 11 bits per input and
>12 bits per output. No problem: you can extract the 5 11-bit numbers from a 64-bit
>register in parallel, do some 5-way computation and then pack the 5 12-bit results into a 64-bit register.

Don't underestimate the work in routing data arbitrarily. You're talking about adding shifters between the register file and the arithmetic units. That adds latency to the operand path, would be useful only occasionally, and would not be much faster than a separate shift instruction.

PowerPC adds a mask operation to its rotates (rlwinm and friends, useful for unpack operations), and it turned out to be a bitch to implement without commensurate utility.

For small-operand addition and multiplication, you can use scalar registers as vectors by, for example, placing each 12-bit operand in a 24-bit field. That leaves enough headroom that sums, and products with a 12-bit scalar, never carry into the neighboring field. So a 64-bit packed word unpacks into roughly two 64-bit registers, with the upper half of each field left empty. I used this to write a vectorized transparency engine on the PowerPC 601. (The operands were 5-bit color channels.)

Of course, this only allows vector + vector addition and vector * scalar multiplication. You won't get vector * vector multiplication on an arbitrary number of arbitrarily sized bitfields without a ludicrous amount of circuitry.
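Why vector * vector breaks in the packed-register scheme can be seen by treating each register as a polynomial in the lane base $2^{24}$ (two lanes, as in the 12-in-24-bit layout above):

```latex
\begin{aligned}
(a_0 + a_1 2^{24})\, s &= a_0 s + (a_1 s)\, 2^{24} \\
(a_0 + a_1 2^{24})(b_0 + b_1 2^{24}) &= a_0 b_0 + (a_0 b_1 + a_1 b_0)\, 2^{24} + a_1 b_1\, 2^{48}
\end{aligned}
```

With a scalar each lane gets exactly its own product, provided it stays below $2^{24}$. With two packed vectors, lane 1 receives the cross terms $a_0 b_1 + a_1 b_0$ instead of $a_1 b_1$, which lands at $2^{48}$, so separating the lanes requires extra hardware or extra instructions.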

>- Let the inputs and outputs of CEs performing floating-point computations have
>3 separate parts per operand: sign, mantissa, exponent. Assuming we have such CEs,
>we can now try using them in video encoding/decoding applications, games and rendering. No freaking GP-GPU necessary.

What? The common SIMD ISAs (SSE, AltiVec) already put FP and integer values in the same registers, so you can pack and unpack sign, exponent and mantissa with ordinary integer shifts and masks to your heart's content. The sizes of the mantissa and exponent aren't going to change without making the FPU slower.

How do you propose to make this useful for graphics? And how does GPGPU relate to graphics? (GPGPU means running general-purpose, non-graphics code on the GPU, so that would seem a contradiction!)

GPGPU performance has more to do with multithreading (many threads hiding memory latency) than with arithmetic.


After that I'm not reading in depth, but it looks like an involved discussion of extended-precision arithmetic. Trust me, the carry bit that most ISAs provide (adc on x86, adde on PowerPC) is the best way to do that, and flagless ISAs such as MIPS get the same effect with an unsigned compare. Don't reinvent the bitslice ;v)
