"New ISA Prefix Fusion"

By: hobold (hobold.delete@this.vectorizer.org), August 18, 2020 2:29 pm
Michael S (already5chosen.delete@this.yahoo.com) on August 18, 2020 1:48 pm wrote:
> dmcq (dmcq.delete@this.fano.co.uk) on August 18, 2020 1:00 pm wrote:
> > I like that centrifuge! It'd be an interesting little design
> > challenge to make something that implements it efficiently.
> If I am not mistaken, it's the same as x86 PDEP. On Intel it is reasonably fast (lat=3, thr=1).
> People complained, including on this board, that on AMD it is slow.
> https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=525,3401,4155,2719,4152&othertechs=BMI2
PDEP is (half of) the inverse. PEXT is more like it.
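To make the PEXT connection concrete, here is a minimal bit-serial software model in Python. It assumes one particular convention for the centrifuge: bits whose mask bit is 1 end up at the low end, bits whose mask bit is 0 end up at the high end, and order is preserved within each group. The names `pext`, `pdep`, `centrifuge` and `uncentrifuge` are hypothetical stand-ins for illustration, not the BMI2 intrinsics. With this convention the forward operation is two PEXTs (mask and inverted mask) plus a POPCNT and a shift, and undoing it takes two PDEPs, which is why one PDEP is only half of the inverse.

```python
def pext(x: int, mask: int, width: int = 64) -> int:
    """Bit-serial model of PEXT: gather the bits of x selected by mask
    into the low end of the result, keeping their relative order."""
    result, out_bit = 0, 0
    for i in range(width):
        if (mask >> i) & 1:
            result |= ((x >> i) & 1) << out_bit
            out_bit += 1
    return result


def pdep(x: int, mask: int, width: int = 64) -> int:
    """Bit-serial model of PDEP: scatter the low bits of x, in order,
    to the positions selected by mask."""
    result, in_bit = 0, 0
    for i in range(width):
        if (mask >> i) & 1:
            result |= ((x >> in_bit) & 1) << i
            in_bit += 1
    return result


def centrifuge(x: int, mask: int, width: int = 64) -> int:
    """Stable partition of the bits of x (assumed convention):
    mask=1 bits go to the low end, mask=0 bits go to the high end."""
    full = (1 << width) - 1
    mask &= full
    ones = bin(mask).count("1")           # POPCNT of the mask
    low = pext(x, mask, width)            # first PEXT: the mask=1 bits
    high = pext(x, ~mask & full, width)   # second PEXT: the mask=0 bits
    return (high << ones) | low


def uncentrifuge(y: int, mask: int, width: int = 64) -> int:
    """Inverse of centrifuge: one PDEP per group puts the bits back."""
    full = (1 << width) - 1
    mask &= full
    ones = bin(mask).count("1")
    low = y & ((1 << ones) - 1)           # bits that came from mask=1 positions
    high = y >> ones                      # bits that came from mask=0 positions
    return pdep(low, mask, width) | pdep(high, ~mask & full, width)


print(hex(centrifuge(0b01101001, 0b01010101, 8)))  # 0x69
```

In the usage line, the even-position bits of `0b01101001` (1, 0, 0, 1 from bit 0 upward) land in the low nibble and the odd-position bits (0, 1, 1, 0) land in the high nibble.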

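The interesting thing about re-arranging a series of values with a stable sort based on a 1-bit key each is this: if the centrifuge is fast enough and wide enough, then it can enable SIMD conditional processing that does not have to follow both sides of the branch. The lanes that take one side get packed together, the lanes that take the other side get packed together, and each group then runs only its own code on full vectors. This is similar to a GPU programming technique called "stream compaction".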
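A rough sketch of that idea, with Python lists standing in for SIMD lanes. `stable_partition_indices` performs the stable sort on a 1-bit key by counting, which is equivalent to the exclusive prefix sums used in GPU stream compaction. The hypothetical helper `conditional_map` then runs each side of the branch only on its own contiguous run of lanes and scatters the results back to their original positions. On real hardware the permutation step is where a fast, wide centrifuge would come in; this model only illustrates the data movement and says nothing about performance.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def stable_partition_indices(keys: Sequence[int]) -> List[int]:
    """Stable sort of lane indices on a 1-bit key: key-0 lanes first, then
    key-1 lanes, original order kept within each group. The running counters
    play the role of the exclusive prefix sums in stream compaction."""
    n = len(keys)
    zeros_total = n - sum(keys)
    perm = [0] * n
    zeros_seen = ones_seen = 0
    for lane, k in enumerate(keys):
        if k:
            perm[zeros_total + ones_seen] = lane
            ones_seen += 1
        else:
            perm[zeros_seen] = lane
            zeros_seen += 1
    return perm


def conditional_map(values: Sequence[T], pred: Callable[[T], bool],
                    then_fn: Callable[[T], R],
                    else_fn: Callable[[T], R]) -> List[R]:
    """Hypothetical helper: compute `then_fn(v) if pred(v) else else_fn(v)`
    per lane without evaluating both sides on every lane."""
    keys = [1 if pred(v) else 0 for v in values]
    perm = stable_partition_indices(keys)
    split = len(values) - sum(keys)
    packed = [values[i] for i in perm]          # gather: the "centrifuge" step
    # Each side is now one contiguous run, so it could be processed as dense vectors.
    results = [else_fn(v) for v in packed[:split]] + [then_fn(v) for v in packed[split:]]
    out = list(results)                         # placeholder; every slot is overwritten
    for dst, src in enumerate(perm):            # scatter back to the original lanes
        out[src] = results[dst]
    return out


print(conditional_map([5, -2, 7, -1, 0], lambda v: v < 0, lambda v: -v, lambda v: v * 10))
# [50, 2, 70, 1, 0]
```

Because the partition is stable, the scatter at the end is just the inverse permutation, so results come back in the original lane order.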
