> It's not just about restoring previous state and continuing from there. Most workloads involve
> repetitive activities, like processing many more-or-less similar user requests or processing
> similar data bundles or whatever. If a piece of software crashed on one iteration then it
> is very likely going to crash after restart on the next similar activity.

Only if the next activity matches the failing one in the specific way that matters: the same unexpected content, the same unexpected timing, or both.

But even if it may crash next time, you want to at least try again, right? Or would you rather have everything stop because one user request tripped it up? And surely you agree that restarting the software should be possible to begin with?
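
To make that concrete, here is a minimal sketch of the "try again, then move on" idea. It is an illustration only: `process_all`, `handler` and `max_attempts` are names I made up, and a caught exception stands in for a real crash followed by a supervisor restart.

```python
def process_all(requests, handler, max_attempts=2):
    """Run handler on every request; retry a failing one, then skip it.

    Hypothetical helper: catching the exception here is a stand-in for
    "the process crashed and was restarted".
    """
    results = []
    failed = []
    for req in requests:
        for attempt in range(1, max_attempts + 1):
            try:
                results.append(handler(req))
                break  # this request succeeded, go on to the next one
            except Exception:
                if attempt == max_attempts:
                    # Same request failed every time: set it aside
                    # instead of stopping the whole system.
                    failed.append(req)
    return results, failed
```

A timing-dependent failure usually goes through on the second attempt, while a content-dependent one ends up in `failed` and the remaining requests are still served.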

> There are, of course, cases when the software crashes every once in a while, when Moon
> is in the right position. Restarts do help to keep the system mostly running when these
> happen. However, such cases are more rare than the other, more "stable" kind of bugs.

Not in my experience. I guess it depends on how well your software is tested before deployment.

Those easy bugs are usually quickly found and quickly solved before real operation, either in internal testing before delivery, or when the customer does their own internal tests (embarrassing, but still better than after deployment).

Overall they don't take much time to solve, have a clear explanation and are not a big problem. The blue moon bugs are the hard ones and the real problem, because you need to solve them with extremely limited information and no easy way to reproduce them.

So let's rephrase to: Yes, those easy bugs are more common before deployment, but the rare bugs are more common after deployment. The opposite is only true if your customers test your software better than you do.

At my work we sell our own hardware + software, not software that anyone can buy online and then use in totally unexpected ways, so my experience may be atypical.
