By: Michael S (already5chosen.delete@this.yahoo.com), November 5, 2006 4:19 pm
Room: Moderated Discussions
Even if an implementation makes the aligned case of a movdqu/lddqu load as fast as a movdqa load, and even if the unaligned case is only 1.5-3 times slower than the aligned case, it's still no good, because unaligned access remains a second-class citizen.
Effectively, even with the best possible implementation of the current ISA, unaligned SSE would only reach the level of a mini-RISC (Thumb, MIPS16, Hitachi SH etc.), i.e. a load-store architecture on top of a small register file and 2-address instructions. At least the mini-RISCs, for all their misery in the performance department, enjoy excellent code density. SSE can't claim even that.
There is only one solution to that problem that is in the true x86 spirit: make unaligned access a first-class citizen, i.e. allow unaligned load+op instructions in the same unrestricted way they were allowed under x87 and MMX.
Unfortunately, even if it were done right now, it would only affect the majority of shipping software 5-6 years from now.
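
To illustrate the load-store point above, here is a minimal sketch in C with SSE2 intrinsics. The function name add_i32_unaligned and its signature are my own, purely for illustration. It assumes an x86 target with SSE2 and n being a multiple of 4. Because no pointer is known to be 16-byte aligned, each operand has to come in through its own movdqu before paddd can use it; under the legacy SSE encoding the memory-operand form of paddd requires an aligned address, so a compiler typically cannot fold these loads into the op.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics; x86/x86-64 only */

/* Hypothetical helper: dst[i] = a[i] + b[i] for n 32-bit ints.
   No alignment is assumed for any of the three pointers.
   n must be a multiple of 4 (one 128-bit vector = 4 ints). */
void add_i32_unaligned(int32_t *dst, const int32_t *a,
                       const int32_t *b, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        /* Explicit unaligned loads (movdqu): load-store style,
           each one occupies a register before the add can run. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* The add itself (paddd) works register-to-register only. */
        __m128i sum = _mm_add_epi32(va, vb);
        /* Unaligned store (movdqu). */
        _mm_storeu_si128((__m128i *)(dst + i), sum);
    }
}
```

If b were known to be 16-byte aligned, the second load could be written with _mm_load_si128 and a compiler would be free to fold it into paddd as a memory operand; that folding is exactly what the unaligned case lacks.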