By: Rob Thorpe (rthorpe.delete@this.realworldtech.com), November 6, 2006 8:00 am
Room: Moderated Discussions
Michael S (already5chosen@yahoo.com) on 11/5/06 wrote:
---------------------------
>Even if implementation makes aligned case of movdqu/lddqu load as fast as movdqa
>load and even if unaligned case is only 1.5-3 times slower than aligned case it's
>still no good because unaligned access still remains 2nd class citizen.
>
>Effectively with the best possible implementation of current ISA unaligned SSE would
>reach to the level of mini-RISC (Thumb, MIPS16, Hitachi SH etc.) i.e. load-store
>architecture on top of small register file and 2-address instructions. At least
>mini-RISCs for their misery in performance department enjoy an excellent code density. SSE can't claim even that.
>
>There is only one solution to that problem that has a true x86 spirit - make unaligned
>access a first-class citizen, i.e. allow unaligned load+op instructions in the same
>unrestricted way they were allowed under x87 and MMX.
>Unfortunately, even if done right now it would affect the majority of shipping software only 5-6 years from now.
SSE is generally troublesome as it is. Making unaligned access a first-class citizen, in terms of both speed and semantics, would in turn make SSEx a real first-class citizen.
But justifying that would require the result to be fast in absolute terms. In particular, an x86 with no penalty for unaligned access would have to be faster than competitors without that feature, and not take longer to design.
And any machine that did this would not see the benefit until shipping software was able to make use of the change.
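To make the "2nd class citizen" point concrete, here is a minimal sketch in C using the SSE2 intrinsics. With the current encoding, a load+op form such as `paddd xmm0, [mem]` requires a 16-byte-aligned memory operand, so unaligned data has to go through a separate movdqu load before the register-to-register op. The function name `add4_unaligned` is made up for illustration, and the scalar fallback is only there so the sketch builds on non-SSE2 targets.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: acc[0..3] += four 32-bit ints read from src.
 * acc must be 16-byte aligned; src may have any alignment. */
void add4_unaligned(int32_t *acc, const void *src)
{
#if defined(__SSE2__)
    /* movdqa: aligned load of the accumulator */
    __m128i a = _mm_load_si128((const __m128i *)acc);
    /* movdqu: explicit unaligned load; it cannot be folded into paddd */
    __m128i b = _mm_loadu_si128((const __m128i *)src);
    /* paddd reg, reg, then movdqa store back to the aligned accumulator */
    _mm_store_si128((__m128i *)acc, _mm_add_epi32(a, b));
#else
    /* Portable fallback (assumption: target without SSE2) */
    int32_t tmp[4];
    memcpy(tmp, src, sizeof tmp);
    for (int i = 0; i < 4; i++)
        acc[i] += tmp[i];
#endif
}
```

If src were known to be aligned, the compiler could fold the second load into the add as a memory operand. That folding is exactly what is not available for unaligned data today.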