By: Stubabe (nospam.delete@this.nospam.com), November 28, 2012 4:41 pm
Room: Moderated Discussions
> No code uses only SSE/AVX; the old 8086 instructions are still used (loop control,
> address calculations, etc.) and the fourth ALU fits nicely for them.
>
True. But, for an extreme example: I have a nice, fast SSSE3 implementation of Bernstein's ChaCha cipher (less than 2 cycles per byte for 12 rounds) in which the ratio of SSE to GPR instructions is easily greater than 50:1 in the critical loop. Haswell's fourth ALU would be no help there.
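
To illustrate why the ratio is so lopsided, here is a rough, portable C sketch of the ChaCha quarter-round applied to four columns at once. It is not my actual SSSE3 code: the `vec4` type and the `vadd`/`vxor`/`vrotl`/`quarter_round4`/`column_rounds` names are made up for this post, and each 4-lane helper simply stands in for what would be one or a few vector instructions on an XMM register. The diagonal rounds, key setup and I/O are left out.

```c
#include <stdint.h>

/* Four 32-bit lanes, standing in for one XMM register (hypothetical type). */
typedef struct { uint32_t lane[4]; } vec4;

/* Lane-wise 32-bit add; one vector add in a SIMD build (assumption). */
static vec4 vadd(vec4 x, vec4 y) {
    for (int i = 0; i < 4; i++) x.lane[i] += y.lane[i];
    return x;
}

/* Lane-wise XOR; one vector XOR in a SIMD build (assumption). */
static vec4 vxor(vec4 x, vec4 y) {
    for (int i = 0; i < 4; i++) x.lane[i] ^= y.lane[i];
    return x;
}

/* Lane-wise rotate left by n bits, valid for 0 < n < 32. */
static vec4 vrotl(vec4 x, int n) {
    for (int i = 0; i < 4; i++)
        x.lane[i] = (x.lane[i] << n) | (x.lane[i] >> (32 - n));
    return x;
}

/* ChaCha quarter-round on four columns in parallel: vector work only. */
static void quarter_round4(vec4 *a, vec4 *b, vec4 *c, vec4 *d) {
    *a = vadd(*a, *b); *d = vxor(*d, *a); *d = vrotl(*d, 16);
    *c = vadd(*c, *d); *b = vxor(*b, *c); *b = vrotl(*b, 12);
    *a = vadd(*a, *b); *d = vxor(*d, *a); *d = vrotl(*d, 8);
    *c = vadd(*c, *d); *b = vxor(*b, *c); *b = vrotl(*b, 7);
}

/* Column rounds only (no diagonal step): the round counter r is the
   sole scalar bookkeeping in the loop. */
static void column_rounds(vec4 s[4], int rounds) {
    for (int r = 0; r < rounds; r++)
        quarter_round4(&s[0], &s[1], &s[2], &s[3]);
}
```

In this sketch the only scalar work in the loop is the round counter; everything else operates on whole 4-lane vectors, which is why an extra integer ALU has so little to do in a loop of this shape.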



