By: Michael S (already5chosen.delete@this.yahoo.com), July 17, 2015 4:51 am
Room: Moderated Discussions
Jouni Osmala (josmala.delete@this.cc.hut.fi) on July 17, 2015 1:18 am wrote:
>
>
> I believe haswell made the best possible addition to speed up bounds checking. Increasing
> the branch unit from one to two. Having already compare and branch as fused operation
> means that at execution stage it doesn't take more than one additional operation.
>
Generally, in that regard, unlike many others I am from your school, i.e. the argument that hardware spent on speeding up bound checks would be better spent on speeding up more than just bound checks looks very valid to me.
BUT
Without thinking deeply, I can see at least two good reasons for specialized hardware/ISA support for bound checking.
1. Bound checking implemented by compare+branch pollutes dynamic branch prediction structures. The pollution is not insignificant, because these checks are a big percentage of total branches. And applying dynamic branch prediction to them is almost perfectly useless, because bound-check branches can be predicted statically as not-taken with extremely good accuracy.
2. Bound checking can efficiently utilize TLB-like structures for on-the-fly expansion of short "bound selectors/tags" into complete "bound descriptors" consisting of base:limit and, possibly, an access-rights mask. It is known, and by now accepted even by the majority of RISC purists, that hardware is better at managing this sort of thing.
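To make point 2 concrete, here is a rough software model of the idea, nothing more than a sketch. All names (bound_desc, install_desc, expand_tag, access_ok), the 8-bit tag width, the rights encoding and the direct-mapped 8-entry cache are my assumptions for illustration, not a description of any real ISA. The final compare in access_ok is the kind of almost-never-failing check from point 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical full descriptor: base:limit plus an access-rights mask. */
typedef struct {
    uintptr_t base;
    size_t    limit;   /* object size in bytes */
    uint8_t   rights;  /* assumed encoding, see RIGHT_* below */
} bound_desc;

enum { RIGHT_READ = 1, RIGHT_WRITE = 2 };
enum { TABLE_SIZE = 256, CACHE_SIZE = 8 };

/* In-memory descriptor table indexed by the short tag. */
static bound_desc desc_table[TABLE_SIZE];

/* Small direct-mapped cache standing in for the TLB-like structure. */
typedef struct { bool valid; uint8_t tag; bound_desc desc; } cache_entry;
static cache_entry desc_cache[CACHE_SIZE];
static unsigned cache_misses;

/* Install a descriptor and invalidate the cache slot it maps to. */
static void install_desc(uint8_t tag, bound_desc d) {
    assert(d.limit <= UINTPTR_MAX - d.base);  /* base+limit must not wrap */
    desc_table[tag] = d;
    desc_cache[tag % CACHE_SIZE].valid = false;
}

/* Expand a short tag into the full descriptor, refilling on a miss. */
static const bound_desc *expand_tag(uint8_t tag) {
    cache_entry *e = &desc_cache[tag % CACHE_SIZE];
    if (!e->valid || e->tag != tag) {
        e->valid = true;
        e->tag = tag;
        e->desc = desc_table[tag];
        cache_misses++;
    }
    return &e->desc;
}

/* True when [addr, addr+len) lies inside the object and rights suffice. */
static bool access_ok(uint8_t tag, uintptr_t addr, size_t len, uint8_t need) {
    const bound_desc *d = expand_tag(tag);
    if ((d->rights & need) != need) return false;
    uintptr_t off = addr - d->base;  /* wraps to a huge value if addr < base */
    return off <= d->limit && len <= d->limit - off;
}
```

In this model a pointer only has to carry the 8-bit tag; the base, limit and rights are fetched once per tag and then served from the small cache, which is the part that hardware would do on the fly.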
Other reasons are more ISA-specific.
For example, IA32/iAMD64 does not have an instruction for either conditional register-indirect branch&link or conditional register-indirect call. Without one of those instructions, bound checking implemented by compare+branch becomes an even bigger source of code size expansion. Even if one of the instructions mentioned above were part of the ISA, it would still cost two general-purpose registers for the former or one general-purpose register for the latter, which is not insignificant when you have so few registers to start with. So a special instruction like "conditional special-register-indirect call" would still be tempting on any ISA that has fewer than 32 GPRs.
Another instruction that can save space, but not necessarily time, is a three-operand (temp=Ra-Rb; cond-codes=temp-Rc), or (temp=Ra-Rb; Rd=temp-Rc) on MIPS-like machines. Yes, it has 3 inputs, so it has to be cracked on many OoO implementations, but the space saving is significant. Besides, such an instruction would have many applications apart from bound checking and could be utilized by compilers relatively easily.
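For illustration, the three-operand form can be modeled in C roughly as below. This is only a sketch: the function names are hypothetical, 32-bit unsigned registers are assumed, and I take the "cond-codes" result to be the borrow of the second subtraction. The point is that one unsigned compare of Ra-Rb against Rc covers both the lower and the upper bound, because an index below the base wraps around to a huge value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the fused instruction:
   temp = Ra - Rb; cond-codes = temp - Rc.
   Returns the borrow of the second subtraction, i.e. temp < Rc (unsigned). */
static bool sub_cmp_below(uint32_t ra, uint32_t rb, uint32_t rc) {
    uint32_t temp = ra - rb;  /* wraps modulo 2^32 when ra < rb */
    return temp < rc;
}

/* Bound check of idx against the half-open range [lo, lo+len). */
static bool index_in_range(uint32_t idx, uint32_t lo, uint32_t len) {
    return sub_cmp_below(idx, lo, len);
}

/* Checked element load; returning false stands in for the call to the
   out-of-bounds handler. */
static bool load_checked(const int *a, uint32_t lo, uint32_t len,
                         uint32_t idx, int *out) {
    assert(a != NULL && out != NULL);
    if (!index_in_range(idx, lo, len)) return false;
    *out = a[idx - lo];
    return true;
}
```

With an array covering indices 100..103, load_checked accepts index 102 and rejects both 99 and 104 through the same single comparison. Without the fused form, the same check is a subtract, a compare and a branch.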