By: Exophase (exophase.delete@this.gmail.com), May 18, 2013 10:33 am
Room: Moderated Discussions
Paul A. Clayton (paaronclayton.delete@this.gmail.com) on May 18, 2013 5:41 am wrote:
> As long as the delay slot instruction did not use the register being written by the load, dropping
> this feature would still provide binary compatibility. I do not know if such non-use was urged
> by the original MIPS documents, but I received the impression that such use was at least primarily
> found in code testing that behavior (i.e., not generated by compilers). I think it was even
> the case that such delay behavior was unreliable, that a cache miss would generate the interlock
> (I think such was mentioned in a comp.arch post at some point).
>
> While this "software should not" method could also be applied to using the undefined most significant bits
> of addresses for additional storage (I seem to recall that the M68k did this--told developers not to use
> but did not generate exceptions if the bits were used--and reaped the consequences), I suspect that for delayed
> load in a 31-register ISA the incentive to use the load's target register as an operand source for the delay
> slot instruction would be very small, especially if (as might have been the case) one had to assume a cache
> hit and no interrupts (that would make using the load target register in the delay slot an extremely funky
> optimization--worse than using ordinary load/store for an [interrupt] atomic RMW).
>
>
I recall a few Playstation (R3000-based CPU) games were found to use the load's target register in the delay slot and rely on the load-delay semantics, i.e. the delay slot instruction seeing the register's old value. This is par for the course for older commercial games; you're bound to see all sorts of reliance on undefined ISA behavior if you look far enough. You're probably right that interrupts wouldn't preserve the pipeline state. On TI's C6x DSPs it's hard to write high performance code that doesn't use registers under the shadows of their execution, due to the heavy pipelining and wide execution units (with full single cycle throughput and no interlocks). So some performance critical code will disable interrupts while in its innermost loops. But I doubt MIPS code would do this. The cases where it happened were probably programmer mistakes that happened to work, possibly already in areas of the code where interrupts were disabled.
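To make the difference concrete, here's a minimal sketch in Python of a toy register machine, not a model of any real pipeline. The `run` function, the `lw`/`move` tuple encoding and the register names are all made up for illustration. It assumes a cache hit and no interrupts, and it doesn't try to model the corner case where the delay slot instruction itself writes the load's target register.

```python
def run(program, memory, regs, load_delay):
    """Execute a toy program of ("lw", rd, addr) and ("move", rd, rs) tuples.

    load_delay=True: a loaded value only becomes visible after the
    instruction following the load (the delay slot) has executed.
    load_delay=False: the load result is visible to the very next
    instruction, as on a core that interlocks.
    """
    regs = dict(regs)          # work on a copy of the initial register state
    pending = None             # (register, value) of a load still in flight
    for op, rd, src in program:
        commit, pending = pending, None
        if op == "lw":
            if load_delay:
                pending = (rd, memory[src])   # value lands one instruction later
            else:
                regs[rd] = memory[src]        # value visible immediately
        elif op == "move":
            regs[rd] = regs[src]              # reads whatever is visible right now
        else:
            raise ValueError(f"unknown op: {op}")
        if commit is not None:
            # The previous load commits only after this instruction has read its operands.
            regs[commit[0]] = commit[1]
    if pending is not None:
        regs[pending[0]] = pending[1]
    return regs


memory = {0x10: 42}
initial = {"t0": 1, "t1": 0, "t2": 0}
program = [
    ("lw", "t0", 0x10),    # load t0 from memory
    ("move", "t1", "t0"),  # delay slot: reads t0
    ("move", "t2", "t0"),  # one instruction later: reads t0 again
]

delayed = run(program, memory, initial, load_delay=True)
interlocked = run(program, memory, initial, load_delay=False)
print(delayed["t1"], interlocked["t1"])
```

Code that never reads `t0` in the delay slot gets the same result either way, which is why dropping the delay stays binary compatible for it. Code that does read it sees the old value on one machine and the loaded value on the other.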
As far as backwards compatibility goes, self-modifying code could have been a bigger problem, namely when going from processors with no dcache or no way to flush the dcache (like the Playstation's) to processors with a proper dcache.