By: Jörn Engel (joern.delete@this.purestorage.com), January 13, 2021 12:06 pm
Room: Moderated Discussions
hobold (hobold.delete@this.vectorizer.org) on January 12, 2021 2:51 pm wrote:
> none (none.delete@this.none.com) on January 12, 2021 3:53 am wrote:
>
> [...]
> > As long as you don't need a 128-bit result, yes. I wonder what workload beyond bignum
> > requires 2 integer multiplications per 8 instructions.
>
> If you are not limited by power / heat, integer multiply can be a fast
> way to do all kinds of bit permutation trickery. Some examples here:
>
> Bit Twiddling Hacks
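To make the multiply trick concrete: the bit-counting entry on that page uses one multiply to add up four byte-sized partial counts at once, replacing a chain of shifts and adds. Below is a minimal Java sketch of that idea. The class and method names are my own, and it only handles 32-bit ints.

```java
public class PopcountSketch {
    // Counts set bits in a 32-bit int with the "count bits set, in parallel" trick.
    static int popcount(int v) {
        v = v - ((v >>> 1) & 0x55555555);                // each 2-bit field now holds its own bit count
        v = (v & 0x33333333) + ((v >>> 2) & 0x33333333); // each 4-bit field holds its bit count
        v = (v + (v >>> 4)) & 0x0F0F0F0F;                // each byte holds its bit count
        // Multiplying by 0x01010101 sums all four bytes into the top byte.
        return (v * 0x01010101) >>> 24;
    }

    public static void main(String[] args) {
        int x = 0xF0F0_1234;
        System.out.println(popcount(x) + " vs Integer.bitCount: " + Integer.bitCount(x));
    }
}
```

In real Java code you would just call `Integer.bitCount`, which the JIT typically maps to a popcount instruction. The point here is only that the multiply does the horizontal add.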
Most of those would be one-offs scattered through a codebase. Hashing is an example where multiplies can dominate performance. Many fast hash functions are murmur-based, so something like:
acc ^= input;
acc *= const;
acc >>>= 32;
acc *= const;
acc >>>= 32;
Add various degrees of loop unrolling and multiple accumulators running in parallel, and that gives you 2 integer multiplications per 5 instructions. I think these hash functions are common enough to justify a second multiplication unit.
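Here is a rough Java sketch of the loop described above, with two independent accumulators so the two multiply chains can overlap on a core with more than one multiplier. The step is transcribed literally from the snippet: xor, multiply, shift, multiply, shift. The multiplier constant is a stand-in of my own choosing (the 64-bit golden-ratio constant), as is the final combine step. For comparison, MurmurHash3's real 64-bit finalizer (`fmix64`) uses xor-shifts (`k ^= k >>> 33`) rather than plain shifts. This is an illustration of the instruction mix, not any particular production hash.

```java
public class MurmurStyleSketch {
    // Stand-in for "const" in the snippet; any odd 64-bit constant would do here.
    static final long K = 0x9E3779B97F4A7C15L;

    // One accumulator step, as in the snippet: 5 operations, 2 of them multiplies.
    static long step(long acc, long input) {
        acc ^= input;
        acc *= K;
        acc >>>= 32; // keep the well-mixed high half of the product
        acc *= K;
        acc >>>= 32;
        return acc;
    }

    // Two accumulators give two independent dependency chains per iteration.
    static long hash(long[] data) {
        long a = 0, b = 0;
        int i = 0;
        for (; i + 1 < data.length; i += 2) {
            a = step(a, data[i]);
            b = step(b, data[i + 1]);
        }
        if (i < data.length) {
            a = step(a, data[i]); // odd-length tail
        }
        return step(a, b); // combine the two chains (stand-in, not from the post)
    }

    public static void main(String[] args) {
        long[] data = {1L, 2L, 3L, 4L, 5L};
        System.out.printf("hash = 0x%08x%n", hash(data));
    }
}
```

Unrolling further (four or eight accumulators) just adds more independent chains. Whether that helps depends on how many multipliers the core has and what their latency is, which is the point of the argument above.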