The UNIX and Linux Forums: UNIX for Advanced & Expert Users

#8, 09-16-2008
Registered User (Join Date: Jul 2007; Location: Cloud 9; Posts: 70)

The results are in...
I ran strace against crond on both a fully functioning system and a system where cron fails, and compared the two. The basics are the same...

A polling loop that checks for configuration changes (repeated every minute):
Code:
stat("cron", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
stat("/etc/cron.d", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat("/etc/crontab", {st_mode=S_IFREG|0644, st_size=255, ...}) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {0x555555557772, [], SA_RESTORER|SA_RESTART, 0x2aaaaae0a460}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({60, 0}, {60, 0})             = 0
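The trace above shows a simple pattern: stat() the spool directory, /etc/cron.d and /etc/crontab, then sleep 60 seconds. A minimal Python sketch of that pattern, assuming a change is detected by comparing st_mtime between passes; it illustrates the idea and is not crond's actual source, and the function names are hypothetical.

```python
import os
import time

def snapshot(paths):
    """Return {path: st_mtime}, with None for paths that cannot be stat()ed."""
    result = {}
    for path in paths:
        try:
            result[path] = os.stat(path).st_mtime
        except OSError:
            result[path] = None  # missing or unreadable is tracked as its own state
    return result

def changed(old, new):
    """Paths whose mtime (or existence) differs between two snapshots."""
    return [path for path in new if old.get(path) != new[path]]

def poll(paths, interval=60, rounds=None):
    """stat() every path, sleep, compare, repeat (forever if rounds is None)."""
    previous = snapshot(paths)
    done = 0
    while rounds is None or done < rounds:
        time.sleep(interval)  # the nanosleep({60, 0}, ...) step in the trace
        current = snapshot(paths)
        for path in changed(previous, current):
            print(f"changed: {path}")  # a real daemon would re-read the file here
        previous = current
        done += 1
```

Usage would be something like `poll(["/var/spool/cron", "/etc/cron.d", "/etc/crontab"])`; the /var/spool/cron path is an assumption, since the trace only shows the relative name "cron".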
After some reads of /etc/localtime (for logging) and a message send comes the first interaction with /var: a connection attempt to nscd's Unix socket, which fails with ENOENT:
Code:
[pid 24735] socket(PF_FILE, SOCK_STREAM, 0) = 3
[pid 24735] fcntl(3, F_GETFL)           = 0x2 (flags O_RDWR)
[pid 24735] fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 24735] connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
[pid 24735] close(3)                    = 0
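The connect() failing with ENOENT just means no nscd socket exists at that path, so the lookup carries on without the cache. A minimal Python sketch of the same probe, assuming a blocking connect (the trace sets O_NONBLOCK first); the helper name is hypothetical, and glibc does this internally in C.

```python
import errno
import socket

NSCD_SOCKET = "/var/run/nscd/socket"  # path as seen in the trace

def nscd_reachable(path=NSCD_SOCKET):
    """Try to connect to nscd's Unix socket, as the traced process does.

    Returns True if the connect succeeds; False for ENOENT (no socket file)
    or ECONNREFUSED (a stale socket file with nothing listening).
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
        return True
    except OSError as exc:
        if exc.errno in (errno.ENOENT, errno.ECONNREFUSED):
            return False
        raise
    finally:
        sock.close()
```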
Blah, blah, blah: some other libraries get loaded, resolv.conf is read, and finally, the moment we have all been waiting for:
Code:
[pid 24735] socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3
[pid 24735] connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.20.30.40")}, 28) = 0
[pid 24735] fcntl(3, F_GETFL)           = 0x2 (flags O_RDWR)
[pid 24735] fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 24735] poll([{fd=3, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
[pid 24735] sendto(3, "\377\276\1\0\0\1\0\0\0\0\0\0\7babylon5\7skynet\5loc"..., 39, MSG_NOSIGNAL, NULL, 0) = 39
[pid 24735] poll([{fd=3, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
[pid 24735] ioctl(3, FIONREAD, [111])   = 0
[pid 24735] recvfrom(3, "\377\276\205\203\0\1\0\0\0\1\0\0\7babylon5\7skynet\5lo"..., 1024, 0, {sa_family=AF_INET, \
sin_port=htons(53), sin_addr=inet_addr("10.20.30.40")}, [16]) = 111
[pid 24735] writev(2, [{"crond", 5}, {": ", 2}, {"relocation error", 16}, {": ", 2}, {"/lib64/libresolv.so.2", 21}, {": ", 2}, \
{"symbol __res_iclose, version GLI"..., 97}, {"", 0}, {"", 0}, {"\n", 1}], 10 <unfinished ...>
[pid 24734] <... read resumed> "crond: relocation error: /lib64/"..., 4096) = 146
[pid 24735] <... writev resumed> )      = 146
[pid 24734] uname( <unfinished ...>
[pid 24735] exit_group(127)             = ?
Process 24735 detached
It does this once more, and then exits (on an interrupted system call, EINTR) to resume the nanosleep loop...
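The first 12 bytes of the sendto() and recvfrom() buffers are a DNS header (RFC 1035 layout), and strace prints non-printable bytes as octal escapes. A small Python sketch that decodes them; the byte strings are transcribed from the trace, and only the header is decoded because the question name is truncated in the output.

```python
import struct

RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}

def decode_header(packet):
    """Decode the fixed 12-byte DNS header (RFC 1035, section 4.1.1)."""
    ident, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", packet[:12])
    return {
        "id": ident,
        "qr": flags >> 15,              # 0 = query, 1 = response
        "opcode": (flags >> 11) & 0xF,
        "aa": (flags >> 10) & 1,        # authoritative answer
        "tc": (flags >> 9) & 1,         # truncated
        "rd": (flags >> 8) & 1,         # recursion desired
        "ra": (flags >> 7) & 1,         # recursion available
        "rcode": RCODES.get(flags & 0xF, flags & 0xF),
        "qdcount": qd, "ancount": an, "nscount": ns, "arcount": ar,
    }

# First 12 bytes of the sendto() and recvfrom() buffers, copied from the trace.
# Python bytes literals accept the same octal escapes that strace prints.
query = b"\377\276\1\0\0\1\0\0\0\0\0\0"
reply = b"\377\276\205\203\0\1\0\0\0\1\0\0"

print(decode_header(query))
print(decode_header(reply))
```

On these bytes both packets carry ID 0xFFBE; the query asks for recursion with one question, and the reply is an authoritative response with RCODE 3 (NXDOMAIN), no answers and one authority record.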

Apart from the error, the good and bad traces are almost identical. The last recvfrom() returns 91 on the good server and 111 on the bad one, and sendto() returns 43 on the good server and 39 on the bad one...
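For a plain single-question query (assuming no EDNS or other additional records), the packet is the 12-byte header, the encoded name (one length byte per label plus the label itself, then a terminating zero byte) and 4 bytes of QTYPE/QCLASS, so the sendto() byte count depends only on the name being looked up. A minimal sketch; the hostnames are hypothetical, since the name in the trace is truncated.

```python
def dns_query_length(name):
    """Bytes on the wire for a single-question DNS query with no extra records."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    encoded_name = sum(1 + len(label) for label in labels) + 1  # length bytes + root byte
    return 12 + encoded_name + 4  # header + QNAME + QTYPE/QCLASS

# Hypothetical names: four extra characters in the name give four extra bytes.
print(dns_query_length("host.example.com"))      # 34
print(dns_query_length("hostabcd.example.com"))  # 38
```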

#9, 09-16-2008
Join Date: May 2008; Location: Sydney, Australia; Posts: 920

I presume PID 24735 is one of the jobs being run out of cron and 24734 is cron itself? Is 10.20.30.40 an IP address you recognise? If you search back through the strace output you should see a clone() or fork() call returning 24735 and, if that process runs a command, a [pid 24735] execve(...) line which will contain the command line used to execute it.
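In strace -f output the parent's clone() (or fork()/vfork()) returns the child's PID, and any later execve() by that child carries the child's PID as a prefix. A minimal Python sketch that pulls those lines out of a saved trace; the helper name is hypothetical, it accepts both the `[pid N]` prefix seen above and a bare leading PID, and it ignores calls split across `<unfinished ...>` lines.

```python
import re

def origin_of(pid, lines):
    """Return the lines that created `pid` and any execve() it made."""
    # e.g. "[pid 24734] clone(child_stack=0, ...) = 24735"
    created = re.compile(r"\b(?:clone3?|v?fork)\b.*\)\s*=\s*%d\s*$" % pid)
    # e.g. '[pid 24735] execve("/bin/sh", [...], [...]) = 0'
    execs = re.compile(r"^(?:\[pid\s+%d\]|%d)\s+execve\(" % (pid, pid))
    return [line for line in lines if created.search(line) or execs.search(line)]
```

For example, `origin_of(24735, open("crond.trace"))` on a trace saved with `strace -f -o crond.trace` (file name hypothetical).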

Anyway, the values returned by recvfrom() and sendto() are just the numbers of bytes received and sent on that socket, so they are likely to vary.

More interesting is the writev(2, ...), which looks like an error message being sent to stderr (file descriptor 2). Relocation errors sound like incompatible or missing library issues; can you figure out where those messages are going? I refer back to my earlier suggestion to run cron manually and/or with whatever debugging options are available to obtain more information.
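One way to gather more information on the library angle, offered as an assumption rather than a diagnosis: on Linux, /proc/PID/maps lists every file a running process has mapped, so the libc and libresolv paths held by a long-running crond can be compared with what is on disk. A mapped file that has since been removed or replaced shows a " (deleted)" suffix. A minimal Python sketch; the helper names are hypothetical.

```python
def mapped_libraries(maps_text):
    """Return {path: deleted?} for shared objects listed in /proc/PID/maps text."""
    libs = {}
    for line in maps_text.splitlines():
        fields = line.split(None, 5)  # address perms offset dev inode [pathname]
        if len(fields) < 6:
            continue  # anonymous mapping: no pathname
        path = fields[5].strip()
        deleted = path.endswith(" (deleted)")
        if deleted:
            path = path[: -len(" (deleted)")]
        if ".so" in path:
            # A library is usually mapped several times; flag it if any mapping is stale.
            libs[path] = libs.get(path, False) or deleted
    return libs

def libraries_of(pid):
    """Read /proc/<pid>/maps (Linux only; other users' processes usually need root)."""
    with open(f"/proc/{pid}/maps") as maps:
        return mapped_libraries(maps.read())
```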
Reply With Quote
Google The UNIX and Linux Forums
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes




All times are GMT -7. The time now is 03:14 PM.


Powered by: vBulletin, Copyright ©2000 - 2006, Jelsoft Enterprises Limited.
The UNIX and Linux Forums Content Copyright ©1993-2008. All Rights Reserved.Ad Management by RedTyger Visit The Complex Event Processing Blog

Content Relevant URLs by vBSEO 3.2.0