By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), February 26, 2013 6:15 am
Room: Moderated Discussions
rwessel (robertwessel.delete@this.yahoo.com) on February 26, 2013 1:00 am wrote:
[snip]
> You'd need to be very careful with that. Notions of overflow vary considerably between languages, and
> even within languages. A mismatch with the hardware just makes a hash of things. Consider C: unsigned
> arithmetic is defined as modulo 2**n, you definitely don't want overflow checks trapping on any of that.
> OTOH, overflow check would be legal for signed arithmetic (although you'll break a ton of code if you
> implement such). Even with signed arithmetic, if you're doing it multi-precision (say implementing long-longs
> on a 32 bit processor), you certainly don't want the trapping behavior on the low word.
Presumably multi-precision would use a with-carry/borrow opcode. (Or in the case of MIPS or Alpha, use unsigned arithmetic [with Set-on-Less-Than to generate a carry bit].)
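As a rough illustration of the no-carry-flag case, here is a minimal C sketch of a double-word add built only from wrapping unsigned word adds, with the carry recovered by an unsigned compare (the role Set-on-Less-Than plays on MIPS). The `u64_pair` type and `add_pair` name are made up for this example, and it assumes 32-bit words.

```c
#include <stdint.h>

/* Hypothetical 64-bit value held as two 32-bit words, as on a 32-bit machine. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_pair;

/* Add two double-word values using only wrapping (unsigned) word adds.
   The carry out of the low word is recovered with an unsigned compare,
   which is what a set-on-less-than-unsigned instruction would compute. */
static u64_pair add_pair(u64_pair a, u64_pair b)
{
    u64_pair r;
    r.lo = a.lo + b.lo;                 /* wraps modulo 2^32, never traps */
    uint32_t carry = (r.lo < a.lo);     /* 1 if the low-word add wrapped */
    r.hi = a.hi + b.hi + carry;         /* high word absorbs the carry */
    return r;
}
```

Note that the low-word add wraps by design; a trapping add there would fire on perfectly valid inputs, which is the concern raised in the quoted text.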
> The problem with having a mode bit to select that behavior is that setting the mode is pretty much
> always slow*, as changing them requires mucking with stuff in the pipeline, and the required behavior
> tends to change from line-to-line of code, so you'd have to bang on that a lot. Much better to
> have two forms of the instruction, or an easy way to check in the comparatively rare cases where
> you actually want the check. An always predicted "not taken" JO on x86, for example.
I was thinking along the lines of MIPS add (normal/signed) and addu (unsigned).
(Note, I also proposed extending such to store instructions to check for loss of information.)
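To make the two-forms idea concrete, here is a small C sketch of what the distinction looks like from software. It is only a model under stated assumptions: the function names are hypothetical, the checking forms return `false` where hardware would trap, and the 16-bit store is just one example of a width where information could be lost.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrapping form, analogous to addu: the result is taken modulo 2^32. */
static uint32_t add_wrapping(uint32_t a, uint32_t b)
{
    return a + b;
}

/* Checking form, analogous to add: reports overflow (returning false)
   at the point where a trapping instruction would raise an exception.
   The range test is done before the add, so no signed overflow occurs. */
static bool add_checked(int32_t a, int32_t b, int32_t *out)
{
    if ((b > 0 && a > INT32_MAX - b) || (b < 0 && a < INT32_MIN - b))
        return false;
    *out = a + b;
    return true;
}

/* Hypothetical checked narrowing store: refuses to write a 32-bit value
   into a 16-bit location when the value would not survive the truncation. */
static bool store_checked_i16(int16_t *dst, int32_t value)
{
    if (value < INT16_MIN || value > INT16_MAX)
        return false;                   /* information would be lost */
    *dst = (int16_t)value;
    return true;
}
```

The point of separate forms is that the choice is made per operation, in the instruction itself, so no mode has to be switched between adjacent lines of code.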
> FWIW, S/360 has such a mode bit (and several others), in practice they're almost
> never changed to enable the trap (although it's not a user mode trap).
>
>
> *The mode bits are a considerable source of grumbling in IEEE FP for the same reason
With respect to pipeline issues with mode bits, these could be less problematic when the new mode value is given as a constant in the instruction, because the processor could then track the mode in a future file in the front-end. (Branch mispredictions and exceptions might be a bit slower and/or more complex [a small number of rename registers in the front-end might be sufficient for infrequently changed values].)
IIRC, Alpha provided at least some rounding mode settings (constant values) on a per-instruction basis. I suspect that just providing an alternate mode would meet a lot of use cases (and "only" cost one bit of opcode space).
This effect is also present in system calls, which tends to annoy me. For changes to privileged mode which only touch memory addresses, this seems like an unnecessary overhead. (Privileged operations that change state controlling internal operation are more difficult to handle--perhaps in some cases even with just pipelining. Aggressive renaming/buffering and rollback is probably overkill in most situations, though being able to perform a fast process change could also be useful.) Yes, I liked Itanium's Enter Privileged Code instruction.
I also think that microkernel-like privilege isolation is nice and could be made substantially less expensive. Unfortunately, such privilege isolation is "never" used, so making it less expensive would just be wasting design and test resources and chip resources.