Datacenter Crash (Server Unreachable for About 17 minutes)

02-26-2020

Administrator

Thanks for sharing your miseries....

This has been a real PITA.....

I have written some very stern lectures to S4Y, telling them what I think about not notifying customers of data center upgrades, and about doing this during the week rather than on the weekend.

Their sales teams have expressed similar frustration, as I was not the only customer in the data center to be outraged at these two unscheduled, unannounced outages within 24 hours.

What the "heck" were they thinking?

What they said was "we did not think there would be a problem, sorry"; but when I used to run data centers back in the old days, we approached upgrades as follows:

"Anything that can go wrong will."
Schedule for the weekends and plan well in advance.
Notify all customers who might be affected many days in advance.

I thought this was standard practice in all data centers!

Neo

SAVECORE(8)               BSD System Manager's Manual              SAVECORE(8)

NAME
     savecore -- save a core dump of the operating system

SYNOPSIS
     savecore [-fvz] [-N system] [-Z level] [directory]
     savecore -c [-v] [-N system]
     savecore -n [-v] [-N system]

DESCRIPTION
     When the NetBSD kernel encounters a fatal error, the panic(9) routine arranges for a snapshot of the contents of physical memory to be written into a dump area, typically in the swap partition. Upon a subsequent reboot, savecore is typically run out of rc(8), before swapping is enabled, to copy the kernel and the saved memory image into directory, and enters a reboot message and information about the core dump into the system log. If a directory is not specified, then /var/crash is used.

     The kernel and core file can then be analyzed using various tools, including crash(8), dmesg(8), fstat(1), gdb(1), iostat(8), netstat(1), ps(1), and pstat(8), to attempt to deduce the cause of the crash.

     Crashes are usually the result of hardware faults or kernel bugs. If a kernel bug is suspected, a full bug report should be filed at http://www.netbsd.org/, or using send-pr(1), containing as much information as possible about the circumstances of the crash. Since crash dumps are typically very large and may contain whatever (potentially confidential) information was in memory at the time of the crash, do NOT include a copy of the crash dump file in the bug report; instead, save it somewhere in the event that a NetBSD developer wants to examine it.

     The options are as follows:

     -c      Only clears the dump without saving it, so that future invocations of savecore will ignore it.

     -f      Forces a dump to be taken even if the dump doesn't appear correct or there is insufficient disk space.

     -n      Check whether a dump is present without taking further action. The command exits with zero status if a dump is present, or with non-zero status otherwise.

     -N      Use system as the kernel instead of the default (returned by getbootfile(3)). Note that getbootfile(3) uses secure_path(3) to check that kernel file is ``secure'' and will default to /netbsd if the check fails.

     -v      Prints out some additional debugging information.

     -z      Compresses the core dump and kernel (see gzip(1)).

     -Z level
             Set the compression level for -z to level. Defaults to 1 (the fastest compression mode). Refer to gzip(1) for more information regarding the compression level.

     savecore checks the core dump in various ways to make sure that it is current and that it corresponds to the currently running system. If it passes these checks, it saves the core image in directory/netbsd.#.core and the system in directory/netbsd.# (or in directory/netbsd.#.core.gz and directory/netbsd.#.gz, respectively, if the -z option is used). The ``#'' is the number from the first line of the file directory/bounds, and it is incremented and stored back into the file each time savecore successfully runs.

     savecore also checks the available disk space before attempting to make the copies. If there is insufficient disk space in the file system containing directory, or if the file directory/minfree exists and the number of free kilobytes (for non-superusers) in the file system after the copies were made would be less than the number in the first line of this file, the copies are not attempted.

     If savecore successfully copies the kernel and the core dump, the core dump is cleared so that future invocations of savecore will ignore it.

SEE ALSO
     fstat(1), gdb(1), gzip(1), netstat(1), ps(1), send-pr(1), crash(8), dmesg(8), iostat(8), pstat(8), rc(8), syslogd(8), panic(9)

HISTORY
     The savecore command appeared in 4.1BSD.

BUGS
     The minfree code does not consider the effect of compression.

BSD                           September 13, 2011                           BSD
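
The file naming and disk-space rules described in the manual page above can be easier to follow as a small model. The Python sketch below is only an illustration of the documented behaviour, not savecore's actual implementation: the function names (`read_bounds`, `dump_names`, `enough_space`, `record_success`) are hypothetical, and starting the counter at 0 when `bounds` is missing or unreadable is an assumption the manual page does not confirm.

```python
import os
import tempfile


def read_bounds(directory):
    """Return the number on the first line of directory/bounds.

    Assumption: a missing or unparsable file is treated as 0.
    """
    try:
        with open(os.path.join(directory, "bounds")) as fh:
            return int(fh.readline().strip())
    except (FileNotFoundError, ValueError):
        return 0


def dump_names(directory, number, compressed=False):
    """Return (kernel_path, core_path) following the documented naming scheme."""
    suffix = ".gz" if compressed else ""  # the -z option appends .gz to both files
    kernel = os.path.join(directory, f"netbsd.{number}{suffix}")
    core = os.path.join(directory, f"netbsd.{number}.core{suffix}")
    return kernel, core


def enough_space(free_kb, needed_kb, minfree_kb=None):
    """Model the documented space check.

    needed_kb is the uncompressed size: per the BUGS section, the
    minfree check does not account for compression.
    """
    if needed_kb > free_kb:
        return False  # the copies would not fit at all
    if minfree_kb is not None and free_kb - needed_kb < minfree_kb:
        return False  # the copies would leave less than minfree kilobytes
    return True


def record_success(directory, number):
    """Store the incremented number back into directory/bounds."""
    with open(os.path.join(directory, "bounds"), "w") as fh:
        fh.write(f"{number + 1}\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as crash_dir:
        n = read_bounds(crash_dir)
        print(dump_names(crash_dir, n, compressed=True))
        if enough_space(free_kb=1000, needed_kb=400, minfree_kb=500):
            record_success(crash_dir, n)
        print(read_bounds(crash_dir))
```

On a real system the free-space figure would come from the file system holding the crash directory, and the `minfree` value from the first line of `directory/minfree`; both are passed in as plain numbers here to keep the sketch self-contained.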
