Sunday, August 4, 2013

Lesson learned: Rebooting a server

Following yesterday's post about encountering an issue with adding space via ASM, we learned a very valuable lesson about rebooting a server: don't leave your servers up for hundreds of days without a reboot.  Why do I say that?  What happens when your server has a problem and needs to be rebooted, but it has been up for 300, 400, 500, 800, or 1000 days?  When you reboot, Linux runs fsck against the filesystems listed in /etc/fstab, and when you've left your server up that long the check has more to do.  Worse, by skipping regular reboots you have also skipped the preventative filesystem checks that come with them, so possible corruption has had time to creep into your system unnoticed.

So now you're rebooting a server that has a problem, the reboot is taking longer than normal (which your change plans need to account for), and you have no confidence that the filesystem will come back up uncorrupted, which can exacerbate an already critical situation.
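One way to keep uptime from quietly reaching those numbers is to check it on a schedule and flag servers that are due for a planned reboot. The following is only a minimal sketch, not something from our environment: the function names and the 90-day limit are assumptions chosen for illustration, and it relies only on the first field of /proc/uptime being the number of seconds since boot on Linux.

```python
from pathlib import Path

SECONDS_PER_DAY = 86_400


def uptime_days(proc_uptime_text: str) -> float:
    """Convert the contents of /proc/uptime into days of uptime."""
    # The first field of /proc/uptime is the number of seconds since boot.
    return float(proc_uptime_text.split()[0]) / SECONDS_PER_DAY


def reboot_overdue(days_up: float, max_days: float = 90.0) -> bool:
    """Return True when uptime exceeds the policy limit.

    The 90-day default is a hypothetical policy, not a recommendation.
    """
    return days_up > max_days


if __name__ == "__main__":
    days = uptime_days(Path("/proc/uptime").read_text())
    status = "overdue for a planned reboot" if reboot_overdue(days) else "within policy"
    print(f"up {days:.1f} days: {status}")
```

Run from cron or a monitoring agent, a check like this turns the reboot into a scheduled change with a known window instead of something that happens for the first time in the middle of an incident.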
