

War Stories Tell your work related tech stories and share experiences here. Share your successes and failures and other "war stories" in this forum.

Prize of being an Admin - Part 2

#1 - 04-14-2013
ahamed101 (Forum Advisor)
root is god!!!
 
Join Date: Sep 2008
Last Activity: 19 October 2016, 5:02 AM EDT
Location: San Jose, CA
Posts: 1,910
Thanks: 54
Thanked 488 Times in 481 Posts
Prize of being an Admin - Part 2

I was reading admin_xor's thread, Prize of being an Admin, and thought I'd share an experience of mine that is kind of the opposite of what he did - I didn't tell anybody what happened.

We were porting one of the subsystems from Solaris to Linux, and as part of that we developed many wrapper scripts. One of them is an rsh wrapper deployed on the system: internally it uses ssh if security is enabled, or the native rsh otherwise (the native rsh binary is placed in a different path so that it does not show up in $PATH, and the wrapper script is installed as /usr/bin/rsh). For some testing, I modified the ssh command inside the rsh wrapper to a plain "rsh" command and forgot to change it back. So you know what happened next: any rsh call goes into a continuous loop, the wrapper invoking itself over and over, and this clogged the CPU in no time. I made this change on the testing team's setup - and the worst part was that I did it on two of their setups.
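The failure mode above can be sketched like this; the off-$PATH location and the SECURITY_ENABLED flag are hypothetical stand-ins, not the real deployment:

```shell
#!/bin/sh
# Sketch of the rsh wrapper described above. The path and the
# SECURITY_ENABLED flag are hypothetical stand-ins for the real setup.

NATIVE_RSH=/opt/offpath/bin/rsh   # native rsh kept outside $PATH on purpose

pick_transport() {
    if [ "${SECURITY_ENABLED:-0}" = "1" ]; then
        echo ssh                  # security enabled: tunnel over ssh
    else
        echo "$NATIVE_RSH"        # the absolute path is essential: plain
                                  # "rsh" resolves to this wrapper again
                                  # (/usr/bin/rsh) and loops forever
    fi
}

# The real wrapper would then run: exec "$(pick_transport)" "$@"
SECURITY_ENABLED=1
pick_transport                    # prints: ssh
SECURITY_ENABLED=0
pick_transport                    # prints: /opt/offpath/bin/rsh
```

Swapping the ssh branch to a bare `rsh` is exactly the kind of one-word edit that turns the wrapper into a fork bomb, since the wrapper itself is what `rsh` resolves to.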

The next day I came to the office and there was a big fuss all over the place. I didn't bother with it because it wasn't assigned to me, and I had totally forgotten that what I did was causing this. After a couple of days the issue was assigned to me, and then - "oops" - I realized it. Now what? Of course I didn't tell them. Having heard what happened to admin_xor for what he did, imagine what would have happened to me.

Later I told them that it was a "human error" and that there was no issue with the system. But then they asked how it could happen on two systems. I was like "it happened, man, forget it" - no, I didn't actually say that; I told them we would monitor it. I assured them that it was human error, that we would monitor the systems and investigate again if it re-occurred, and that it wasn't worth spending more time on now. Obviously I knew that - because I was the culprit.

--ahamed

Last edited by ahamed101; 04-14-2013 at 12:25 PM..
#2 - 04-14-2013
wisecracker (Registered User)
 
Join Date: Jan 2013
Last Activity: 23 September 2017, 6:25 AM EDT
Location: Loughborough
Posts: 1,221
Thanks: 366
Thanked 323 Times in 253 Posts
'Tis a brave person who admits his/her failings to the public...

Bazza...
#3 - 04-14-2013
jlliagre (Forum Advisor)
ɹǝsn sıɹɐlos
 
Join Date: Dec 2007
Last Activity: 23 September 2017, 7:00 PM EDT
Location: Outside Paris
Posts: 4,870
Thanks: 18
Thanked 635 Times in 554 Posts
It would be even braver to admit it internally.

I guess your employer and colleagues will sooner or later become aware of this posting, as you gave plenty of information to identify the case.

I'm not sure they'll appreciate it, especially the ones who hadn't found the root cause after a couple of days ...
#4 - 05-01-2013
Jsmarterer (Registered User)
 
Join Date: May 2013
Last Activity: 2 May 2013, 8:44 AM EDT
Posts: 4
Thanks: 0
Thanked 0 Times in 0 Posts
Oof, that must have been painful to own up to. Nicely done though!
#5 - 05-12-2013
bakunin (Forum Staff)
Bughunter Extraordinaire
 
Join Date: May 2005
Last Activity: 23 September 2017, 4:37 AM EDT
Location: In the leftmost byte of /dev/kmem
Posts: 5,562
Thanks: 106
Thanked 1,570 Times in 1,162 Posts
First observation: sh!t happens! That is a proven, reliable fact, and an environment which can't cope with it is designed wrongly from the start. If you need a service not to be disrupted, you shouldn't allow people to develop on it, because development will cause the one or other hiccup over time. Further, you need to take precautions against failure of every single part of the system if it is to survive. Suppose that instead of your error, some hardware had crashed, or the network had been disrupted, or whatever. This is what HA solutions are for, for instance.

No SysAdmin in his right mind will let a manager determined to "save" on hardware off the hook: do you want to bet the project's future on me never making an accidental typo? (As it happens, I have actually said exactly this in a design conference - and got my testing system.) And, by the way: when they decide about new office furniture for their own offices, any intention to save is usually abandoned immediately, so wtf?

Second aspect: whenever you do something, it is your utmost responsibility to test what you have done. Immediately! So how could you create such a loop and not notice it? How could you implement this change even twice? This is not a question of introducing an error - that happens to all of us. It is a matter of noticing that you have done something wrong, and that has to do with style of work: if I delete a file, I do an immediate "ls" to verify that it (and it alone) is gone; if I do a "cd", I do a "pwd" to verify I am in the right directory; and so on. This slows me down by perhaps 5%, but when I think I have something done, I usually do have it done - without any error. The 5% is easily recovered by not having to do the error correction and/or recovery that others eventually have to do.
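The habit described above, in shell form (the file is a throwaway created just for the demonstration):

```shell
#!/bin/sh
# Act, then immediately verify - the check-after-every-action habit.

f=$(mktemp)                       # throwaway file for the demo
rm "$f"
ls "$f" 2>/dev/null || echo "verified: $f is gone"

cd /tmp
pwd                               # confirm the cd landed where intended
```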

So, I hope for the best for you, but you should really change your working habits and learn from this accident. My 2 cents.

bakunin
#6 - 05-29-2013
hicksd8 (Forum Advisor)
 
Join Date: Feb 2012
Last Activity: 23 September 2017, 4:57 PM EDT
Location: Devon, UK
Posts: 1,461
Thanks: 190
Thanked 376 Times in 315 Posts
I agree with everything bakunin has said but, as a rule, and as an IT pro, I never delete anything. If I'm going to edit a file, I copy it first (usually with a different suffix), and if I'm going to delete a file, I rename (mv) it instead.

Then, when something stops working (and in IT, anything that can go wrong usually does) and I need to know what the hell was in that file I edited or deleted, I can find out.

The rule is: think hard before you edit or delete anything!
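A minimal sketch of that rule; the helper names and the file are examples only:

```shell
#!/bin/sh
# Copy before editing, rename instead of deleting - both leave a way back.

backup_then_edit() {
    cp "$1" "$1.orig"             # untouched original kept alongside
}

soft_delete() {
    mv "$1" "$1.deleted"          # "deleted", but recoverable if needed
}

f=$(mktemp)
echo "important config" > "$f"
backup_then_edit "$f"             # $f.orig now holds the original
soft_delete "$f"                  # $f is gone, $f.deleted remains
```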
#7 - 05-29-2013
MadeInGermany (Moderator)
 
Join Date: May 2012
Last Activity: 23 September 2017, 5:20 PM EDT
Location: Simplicity
Posts: 3,748
Thanks: 306
Thanked 1,257 Times in 1,136 Posts
A list of fatal commands that really happened:

Code:
rm * .tmp      # a space too many: "rm *.tmp" was intended; this deletes everything
last | reboot  # "grep" missing: "last | grep reboot" was intended
hostname -f    # on Solaris this sets the hostname to "-f"
ifconfig -a 1  # on Solaris this sets all interfaces to 0.0.0.1
