By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), May 18, 2013 3:48 pm
Room: Moderated Discussions
Exophase (exophase.delete@this.gmail.com) on May 18, 2013 10:33 am wrote:
[snip]
> I recall a few Playstation (MIPS3k based CPU) games were found to use the load register in the delay
> slot and rely on the correct semantics. This is par for the course for older commercial games; you're
> bound to see all sorts of reliance on undefined ISA behavior if you look far enough.
I admit that I do not find that especially surprising. I find it sad that such things need to be checked at run time when they could be checked at compile (or assembly) time, but realistically the user would blame the hardware vendor ("it worked on the previous implementation"), and the hardware vendor might not have any means of requiring even minimal software validation.
(This is a little surprising in that MIPS is relatively register-rich, so the benefit of such temporary storage would seem to be very small, though perhaps some such cases allowed avoiding a register-register move?)
(I admit in some of my [very limited] playing with CSS, I have used browser behavior to try to produce a given appearance rather than trying to fully understand the standard. [The fact that CSS was not implemented consistently was not helpful. To some degree it did not matter if a bug was in my use of CSS or in a browser's implementation of CSS; I would usually just give up on trying to generate a particular appearance. {I did report one minor CSS bug in Opera, but that was partially because it was obviously a bug and partially because the barrier to reporting was very low--and there was some feeling that reporting a bug might lead to the bug being fixed.}])
> You're probably
> right that interrupts wouldn't preserve the pipeline state.
It might have been worse; I think even cache misses could break the load delay behavior.
In either case, I would assume MIPS would have stated that such behavior was undefined and should not be relied upon (though for interrupt-only "delay quashing", non-interruptible code sequences could be considered reliable).
Does anyone here know if cache misses filled the load destination before the value would be read by a delay slot instruction? (I could not find the reference I vaguely recall reading on comp.arch, but I did not search intently and my search skills are not great.)
> On TI's C6x DSPs it's hard to write high
> performance code that doesn't use registers under the shadows of their execution due to having such
> heavily pipelining and wide execution units (and fully single cycle throughput with no interlocks).
> So some performance critical code will disable interrupts while in their innermost loops.
For a VLIW DSP such behavior might be more understandable. Classic VLIWs took the idea of software pipeline scheduling to the extreme, and for DSPs raw hardware efficiency was probably substantially more important than ease of software development and other concerns.
> But I doubt
> MIPS code would do this.. the cases where it happened were probably programmer mistakes that happened
> to work, possibly already in areas of the code where interrupts were disabled.
And (if loads did squash the delay) where cache misses did not happen.
> As far as backwards compatibility goes, self-modifying code could have been
> a bigger problem. That is, when going from processors with no dcache or no
> way to flush dcache (like Playstation's) to processors with proper dcache.
I vaguely recall reading that ARM had such issues moving from no cache to a unified cache and then to separate instruction and data caches. Having multiple processors can make this issue even more interesting.