By: rwessel (robertwessel.delete@this.yahoo.com), November 20, 2010 1:27 pm
Room: Moderated Discussions
dc (a@b.c) on 11/20/10 wrote:
---------------------------
>And that brings up another question: are the RAS features of Nonstop unique? Does zOS match or exceed them?
Oh, my... such a loaded question...
The two systems take very different approaches. And both can, if used with great care, provide astonishingly high levels of reliability.
That being said, on a single system, the Nonstop approach probably provides somewhat better RAS, both at the hardware and (OS) software level.
Hardware-wise, the level of redundancy is higher on the Nonstop systems. A failing zSeries core needs to participate at least a bit (state needs to be read out) in order to allow recovery, whereas in NS the duplication allows a failing core to be detected and another in the node to continue running. OTOH, the fault detection and recovery *within* a core are undoubtedly stronger on the zSeries cores.
On the OS side, NS is (vaguely) transaction oriented, and that philosophy permeates the system - a failing OS process tends to be well contained in what it was trying to do, and can be backed out. zOS does that to some extent too, but on a much more ad-hoc basis - all the important stuff is protected by "Functional Recovery Routines" (think exception handlers), which are supposed to back out/recover any failing function. zOS tends to be a lot more scattered in terms of function, with many less-than-clean interfaces between systems.
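To make the contrast in recovery styles a bit more concrete, here is a toy sketch in Python. It is only an illustration of the two ideas (back out a whole transaction vs. a hand-written recovery routine that undoes each step); the names `Transaction`, `transactional_update` and `recovery_routine_update` are made up for this example and have nothing to do with the actual NS or zOS interfaces.

```python
class Transaction:
    """Toy transaction: buffers writes and applies them only on commit."""

    def __init__(self, store):
        self.store = store
        self.pending = {}

    def write(self, key, value):
        # Nothing touches the real store until commit().
        self.pending[key] = value

    def commit(self):
        self.store.update(self.pending)
        self.pending = {}

    def abort(self):
        # Backing out is trivial: just drop the buffered writes.
        self.pending = {}


def transactional_update(store, updates, fail_at=None):
    """Transaction-oriented style: a failure backs out the whole unit of work."""
    txn = Transaction(store)
    try:
        for i, (key, value) in enumerate(updates):
            if i == fail_at:
                raise RuntimeError("simulated failure")
            txn.write(key, value)
        txn.commit()
        return True
    except RuntimeError:
        txn.abort()
        return False


def recovery_routine_update(store, updates, fail_at=None):
    """Ad-hoc style: update in place, and rely on a hand-written recovery
    routine (the except block) to undo whatever was already done."""
    undo_log = []
    try:
        for i, (key, value) in enumerate(updates):
            if i == fail_at:
                raise RuntimeError("simulated failure")
            # Remember the old state before overwriting it.
            undo_log.append((key, key in store, store.get(key)))
            store[key] = value
        return True
    except RuntimeError:
        # The recovery routine must know how to reverse each step itself.
        for key, existed, old_value in reversed(undo_log):
            if existed:
                store[key] = old_value
            else:
                del store[key]
        return False
```

Both functions leave the store unchanged after a failure, but in the second one every function has to carry its own undo logic, which is roughly why the ad-hoc approach is harder to get uniformly right.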
Certainly zOS has far more historical baggage than NS.
OTOH, if used well, zOS's clustering is extremely reliable.
In both systems, applications need to be designed to support that reliability, and NS probably makes that a bit easier than zOS.
Of course it's somewhat moot. Both systems tend to see far more downtime from user/application errors than from hardware or OS failures.