By: dc (a.delete@this.b.c), November 21, 2010 4:00 pm
Room: Moderated Discussions
rwessel (robertwessel@yahoo.com) on 11/20/10 wrote:
---------------------------
>dc (a@b.c) on 11/20/10 wrote:
>---------------------------
>>And that brings up another question: are the RAS features of Nonstop unique? Does zOS match or exceed them?
>
>
>Oh, my... such a loaded question...
>
>The two systems take very different approaches. And both can, if used with great
>care, provide astonishingly high levels of reliability.
>
>That being said, on a single system, the Nonstop approach probably provides somewhat
>better RAS, both at the hardware and (OS) software level.
>
>Hardware-wise, the level of redundancy is higher on the Nonstop systems. A failing
>zSeries core needs to participate at least a bit (state needs to be read out) in
>order to allow recovery, whereas in NS the duplication allows a failing core to
>be detected and another in the node to continue running. OTOH, the fault detection
>and recovery *within* a core are undoubtedly stronger on the zSeries cores. On
>the OS side, NS is (vaguely) transaction oriented, and the philosophy permeates
>the system - a failing OS process tends to be well contained in what it was trying
>to do, and can be backed out. zOS does that to some extent too, but on a much more
>ad-hoc basis - all the important stuff is protected by "Functional Recovery Routines"
>(think exception handlers), which are supposed to back out/recover any failing function.
>zOS tends to be a lot more scattered in terms of function, with many less-than-clean interfaces between systems.
>
>Certainly zOS has far more historical baggage than NS.
>
>OTOH, if used well, zOS's clustering is extremely reliable.
>
>In both systems applications need to be designed to support that reliability, and
>probably NS makes that a bit easier than zOS.
Thanks for the information. That was more or less what I thought, but I'm only vaguely familiar with the RAS features of either zOS or NS.
>Of course it's somewhat moot. Both systems tend to see far more downtime from
>user/application errors than from hardware or OS failures.
Agreed. I don't work on them directly, but my employer has several clustered zOS mainframes running numerous applications. Once or twice a year one of the applications will have a problem, but in the last decade I can recall only two or three instances of a hardware or OS problem causing slowdowns, and we've yet to experience a mainframe going down.