The results are in...
I did a comparative strace between a fully functioning system and a system with cron failure. The basics are the same...
A polling loop for config changes (repeated every minute):
Code:
stat("cron", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
stat("/etc/cron.d", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat("/etc/crontab", {st_mode=S_IFREG|0644, st_size=255, ...}) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {0x555555557772, [], SA_RESTORER|SA_RESTART, 0x2aaaaae0a460}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({60, 0}, {60, 0}) = 0
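For anyone not used to reading strace, here is a rough Python sketch of what that loop amounts to. The three paths are the ones in the trace; the function names are made up for illustration, and the idea that crond compares modification times is my assumption about what it does with the stat() results, not something the trace proves.

```python
import os
import time

# Paths taken from the strace above; "cron" is relative to crond's working directory.
WATCHED = ["cron", "/etc/cron.d", "/etc/crontab"]

def snapshot(paths):
    """Return {path: mtime}; a path that cannot be stat()ed maps to None."""
    result = {}
    for p in paths:
        try:
            result[p] = os.stat(p).st_mtime
        except OSError:
            result[p] = None
    return result

def changed(old, new):
    """List the paths whose recorded mtime differs between two snapshots."""
    return [p for p in new if old.get(p) != new[p]]

def poll_forever(paths, interval=60):
    """Hypothetical equivalent of the stat/nanosleep loop (never returns)."""
    last = snapshot(paths)
    while True:
        time.sleep(interval)  # corresponds to nanosleep({60, 0}, ...) in the trace
        now = snapshot(paths)
        for p in changed(last, now):
            print("reload needed:", p)
        last = now
```

Calling poll_forever(WATCHED) would run the loop; snapshot() and changed() can be tried on their own without blocking.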
After some reads of /etc/localtime (for logging) and a message send, the first /var interaction is an attempt to connect to a socket:
Code:
[pid 24735] socket(PF_FILE, SOCK_STREAM, 0) = 3
[pid 24735] fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
[pid 24735] fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 24735] connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
[pid 24735] close(3) = 0
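That ENOENT is easy to reproduce outside of cron. Below is a minimal sketch of the same connect attempt (PF_FILE/AF_FILE is the same address family as AF_UNIX). The function name try_nscd is hypothetical; the socket path is the one from the trace. As far as I can tell, a missing socket here just means nscd is not running, and the trace carries on to resolv.conf afterwards.

```python
import errno
import socket

def try_nscd(path="/var/run/nscd/socket"):
    """Attempt the connect() seen in the trace.

    Returns None if the connect succeeds, otherwise the errno name
    (for example "ENOENT" when the socket file does not exist).
    """
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.setblocking(False)  # matches the F_SETFL O_NONBLOCK call in the trace
        s.connect(path)
        return None
    except OSError as e:
        return errno.errorcode.get(e.errno, str(e.errno))
    finally:
        s.close()

print(try_nscd())
```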
Blah, blah, blah, some different libraries, a look at resolv.conf, and finally, the moment we have all been waiting for:
Code:
[pid 24735] socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3
[pid 24735] connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.20.30.40")}, 28) = 0
[pid 24735] fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
[pid 24735] fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 24735] poll([{fd=3, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
[pid 24735] sendto(3, "\377\276\1\0\0\1\0\0\0\0\0\0\7babylon5\7skynet\5loc"..., 39, MSG_NOSIGNAL, NULL, 0) = 39
[pid 24735] poll([{fd=3, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
[pid 24735] ioctl(3, FIONREAD, [111]) = 0
[pid 24735] recvfrom(3, "\377\276\205\203\0\1\0\0\0\1\0\0\7babylon5\7skynet\5lo"..., 1024, 0, {sa_family=AF_INET, \
sin_port=htons(53), sin_addr=inet_addr("10.20.30.40")}, [16]) = 111
[pid 24735] writev(2, [{"crond", 5}, {": ", 2}, {"relocation error", 16}, {": ", 2}, {"/lib64/libresolv.so.2", 21}, {": ", 2}, \
{"symbol __res_iclose, version GLI"..., 97}, {"", 0}, {"", 0}, {"\n", 1}], 10 <unfinished ...>
[pid 24734] <... read resumed> "crond: relocation error: /lib64/"..., 4096) = 146
[pid 24735] <... writev resumed> ) = 146
[pid 24734] uname( <unfinished ...>
[pid 24735] exit_group(127) = ?
Process 24735 detached
It does this once more, and then exits (on an Interrupted System Call) to resume the nanosleep loop...
Now, almost everything (except the error) is identical between the good and bad straces. The return value of the last recvfrom() is 91 on the good server and 111 on the bad one; likewise, sendto() returns 43 on the good server and 39 on the bad. Both values are byte counts, so the two servers are sending different queries and getting different replies...
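Since the sizes differ, it is worth looking at what the bad server's reply actually says. A quick sketch, assuming the standard 12-byte DNS header layout: the bytes below are copied from the start of the recvfrom() buffer in the bad trace (strace prints them as octal escapes), and parse_dns_header is just an illustrative helper, not anything from cron or glibc.

```python
import struct

# Response codes from the low 4 bits of the DNS flags word.
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}

def parse_dns_header(data):
    """Decode the fixed 12-byte DNS header into a dict."""
    ident, flags, qd, an, ns, ar = struct.unpack("!6H", data[:12])
    rcode = flags & 0x000F
    return {
        "id": ident,
        "is_response": bool(flags & 0x8000),
        "authoritative": bool(flags & 0x0400),
        "recursion_desired": bool(flags & 0x0100),
        "recursion_available": bool(flags & 0x0080),
        "rcode": RCODES.get(rcode, str(rcode)),
        "questions": qd,
        "answers": an,
        "authority": ns,
        "additional": ar,
    }

# First 12 bytes of the recvfrom() buffer on the bad server.
bad_reply = b"\377\276\205\203\0\1\0\0\0\1\0\0"
header = parse_dns_header(bad_reply)
print(header)
```

If I have read the bytes correctly, this reports rcode NXDOMAIN with zero answer records, i.e. the name looked up on the bad server did not resolve. The good server's reply bytes are not shown above, so the same check cannot be done for it here. As for the sendto() sizes: a plain query is the 12-byte header plus the encoded name plus 4 bytes of type and class, so 43 versus 39 suggests the two servers are asking about names that differ in length by 4 bytes.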