Quote:
Originally Posted by Scrutinizer
Also maybe:
- Try to track down documentation and, if possible, reports of past changes and logs. If those are not available, see if you can interview past admins. It is really nice to be confident that systems will come back up without problems when rebooted.
- Check whether application startup and shutdown are implemented well, and whether they are automatic or manual.
- Check for possible dependencies on other systems. Track incoming and outgoing traffic.
- Also check external hardware, for example NAS/SAN disk arrays, network and SAN switches, UPS, air conditioning, etc.
- Acquire a test system so you can try things out.
These are some excellent tips. I'll be adding all of them to my to-do list. It's starting to seem that I may not get through them as quickly as I hoped. Oh well, job security, I suppose. I honestly don't see how I could skip any of the steps. They're all critical things that could take down the system.
Regarding the test system, there's a large VMware cluster. At a minimum, I'll use that to provide a test environment. Because of the size of the environment, I tend to think there are no unused servers, but I'll look for one.
I will have the chance to interview a former admin. I'll try to find out whether there's a log, a change log, etc. If not, I'll press for details.
Any specific questions or terms I might want to use?
---------- Post updated at 09:38 PM ---------- Previous update was at 09:30 PM ----------
Quote:
Originally Posted by bryanNJ
Brainstorming a bit here...
Check the cron tables on each system as well, just to see what the prior admins have tried to automate (system-administration or application related).
I'd also verify that you have account access to the SC/SP/ALOM/ILOM over the serial console; having this information handy will go a long way if a critical server goes down. If you don't have access, look into resetting the password.
You hit on user access, but to expand on that: reset the root passwords, and check who else may have root access via sudo, PowerBroker (if used), or accounts with UID 0.
Check the messages file on each system as well, to catch any other issues that may have been written via syslog.
Oh yeah, check /var/crash/<hostname> to see whether, and when, the server last panicked.
Great brainstorm, man. There are some great security tips in here. I'll have to add all of these too. At some point, I may need some help prioritizing them. It's a good problem to have, I suppose. I want my client to get their money's worth.
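Two of the checks above (accounts with UID 0, and what the cron tables actually run) are easy to script once you have copies of the files. Below is a minimal Python sketch, assuming the standard colon-separated passwd layout (name:password:UID:...) and that crontab comment lines start with "#". The function names and the sample data are hypothetical, for illustration only; on a real box you would feed in the contents of /etc/passwd and each user's crontab.

```python
from typing import List


def uid_zero_accounts(passwd_text: str) -> List[str]:
    """Return login names whose UID field (third colon-separated field) is 0."""
    names = []
    for line in passwd_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments (assumption: comments start with '#').
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 3:
            continue  # malformed line: ignore it rather than guess
        if fields[2].strip() == "0":
            names.append(fields[0])
    return names


def active_cron_entries(crontab_text: str) -> List[str]:
    """Return the non-blank, non-comment lines of a crontab."""
    return [
        line.strip()
        for line in crontab_text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]


if __name__ == "__main__":
    # Hypothetical sample data, not taken from any real system.
    sample_passwd = (
        "root:x:0:0:Super-User:/root:/sbin/sh\n"
        "daemon:x:1:1::/:\n"
        "toor:x:0:1::/:/bin/sh\n"
    )
    sample_crontab = "# nightly cleanup\n0 3 * * * /usr/local/bin/cleanup.sh\n\n"
    print(uid_zero_accounts(sample_passwd))     # ['root', 'toor']
    print(active_cron_entries(sample_crontab))  # ['0 3 * * * /usr/local/bin/cleanup.sh']
```

Anything besides root that shows up in the UID 0 list deserves a question for the former admin; sudo and PowerBroker rules still have to be reviewed separately.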
---------- Post updated 01-07-13 at 08:47 AM ---------- Previous update was 01-06-13 at 09:38 PM ----------
I cross-posted this on Oracle's forums and got a nice tip about taking a snapshot. Another tip was to review the logs associated with a reboot, looking for anything unusual, so the server can be returned to its expected state.
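A first pass over the reboot-related logs (and the messages file mentioned earlier) can be automated with a simple keyword filter. Here is a minimal Python sketch, assuming a plain-text log with one event per line, as a syslog messages file usually is. The keyword list and sample lines are hypothetical; a match only flags a line for a closer look and does not diagnose anything.

```python
from typing import Iterable, List, Sequence

# Hypothetical keyword list; adjust it to what your platform actually logs.
DEFAULT_KEYWORDS = ("panic", "reboot", "error", "fail")


def flag_log_lines(lines: Iterable[str],
                   keywords: Sequence[str] = DEFAULT_KEYWORDS) -> List[str]:
    """Return the log lines that contain any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    flagged = []
    for line in lines:
        text = line.lower()
        if any(k in text for k in lowered):
            flagged.append(line.rstrip("\n"))
    return flagged


if __name__ == "__main__":
    # Made-up sample lines in a syslog-like layout.
    sample = [
        "Jan  7 08:00:01 host1 sshd[123]: Accepted publickey for admin\n",
        "Jan  7 08:05:10 host1 kernel: system reboot requested\n",
        "Jan  7 08:06:00 host1 app: startup FAILED\n",
    ]
    for hit in flag_log_lines(sample):
        print(hit)
```

In practice you would pass an open file object, e.g. `flag_log_lines(open(path))`, and compare the hits against what the former admin considers normal.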