![]() |
|
|
|
|
|||||||
| Forums | Portal | Register | Forum Rules | FAQ | Contribute | Members List | Arcade | Search | Today's Posts | Mark Forums Read |
| UNIX for Advanced & Expert Users Advanced UNIX and Linux questions go here. Expert-to-Expert. |
|
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| logrotate bug? | jjinno | UNIX for Advanced & Expert Users | 0 | 06-09-2008 01:24 PM |
| Logrotate problems - Can anyone spot the problem please?! | anderow | UNIX for Dummies Questions & Answers | 10 | 01-10-2008 05:56 PM |
| logrotate.conf | fredao | UNIX for Advanced & Expert Users | 10 | 12-19-2006 06:57 AM |
|
|
Submit Tools | LinkBack | Thread Tools | Search this Thread | Display Modes |
|
#1
|
|||
|
|||
|
cron/logrotate chicken and egg problem
I have run into a problem where about a dozen machines, all the same x86_64 2.6.12 GNU/Linux. For some reason these machines will fill up their /var partition (10G), because their logs never get rotated... Unfortunately, there is no error message from logrotate (would be in /var/log/messages) and the last time logrotate ran (according to /var/log/logrotate.log) was August 11, 2008.
This unfortunately is somewhat of a puzzling problem, making it feel (however unlikely) that this is actually a problem with cron, and not log-rotate. IE. cron failed first, which made logrotate never happen. I had originally thought that log-rotate just couldn't do its job because there was no space available for the rotation to occur. But even when I gave it enough space, cron just never ran the logrotate script. On top of that, I even added my own "append timestamp to file" script to the cron.hourly, and it never got run either. As a test, on one of the servers, I restarted cron... BINGO, it fixed the whole thing. I could care less about the fact that I can fix it. I want to prevent this from ever happening in the field. In my lab is one thing, but in the field, there goes all my logs for debugging. Here is a sanity check I ran (just proving that cron was functioning and stopped functioning - all within the same up-time): Code:
[root@babylon5 ~]# df -h Filesystem Size Used Avail Use% Mounted on /dev/sda2 9.5G 3.7G 5.3G 42% / /dev/sda1 190M 18M 163M 10% /boot /dev/shm 2.0G 4.0K 2.0G 1% /dev/shm /dev/sda6 44G 8.4G 33G 21% /stuff /dev/sda5 9.5G 9.5G 0 100% /var [root@babylon5 ~]# ls -lut /etc/init.d/crond -rwxr-xr-x 1 root root 1904 Aug 11 15:14 /etc/init.d/crond [root@babylon5 ~]# ls -lut /etc/crontab -rw-r--r-- 1 root root 255 Jul 15 10:19 /etc/crontab [root@babylon5 ~]# grep "Aug 11 15" /var/log/cron Aug 11 15:01:02 babylon5 crond[9429]: (root) CMD (run-parts /etc/cron.hourly) [root@babylon5 ~]# ps auxwww|grep cr[o]n root 3014 0.0 0.0 60404 1296 ? Ss Jul15 0:00 crond [root@babylon5 ~]# uptime 13:33:34 up 57 days, 3:15, 1 user, load average: 15.63, 15.37, 15.29 [root@babylon5 ~]# grep "STARTUP" /var/log/cron* /var/log/cron.4:Jul 14 14:16:44 babylon5 crond[3156]: (CRON) STARTUP (V5.0) /var/log/cron.4:Jul 15 08:53:45 babylon5 crond[3008]: (CRON) STARTUP (V5.0) /var/log/cron.4:Jul 15 10:19:35 babylon5 crond[3014]: (CRON) STARTUP (V5.0) [root@babylon5 ~]# tail -n 1 /var/log/cron.4 Jul 20 00:01:01 babylon5 crond[15555]: (root) CMD (run-parts /etc/cron.hourly) [root@babylon5 ~]# tail -n 1 /var/log/cron.3 Jul 27 01:01:02 babylon5 crond[27941]: (root) CMD (run-parts /etc/cron.hourly) [root@babylon5 ~]# tail -n 1 /var/log/cron.2 Aug 3 01:01:01 babylon5 crond[17755]: (root) CMD (run-parts /etc/cron.hourly) [root@babylon5 ~]# tail -n 1 /var/log/cron.1 Aug 10 01:01:02 babylon5 crond[29697]: (root) CMD (run-parts /etc/cron.hourly) [root@babylon5 ~]# tail -n 1 /var/log/cron Sep 10 13:01:01 babylon5 crond[22186]: (root) MAIL (mailed 146 bytes of output but got status 0x007f ) [root@babylon5 ~]# ll /var/log/cron* -rw-r--r-- 1 root root 138502 Sep 10 13:01 /var/log/cron -rw------- 1 root root 14407 Aug 10 01:01 /var/log/cron.1 -rw------- 1 root root 14581 Aug 3 01:01 /var/log/cron.2 -rw------- 1 root root 14483 Jul 27 01:01 /var/log/cron.3 -rw------- 1 root root 14513 Jul 20 00:01 /var/log/cron.4 |
| Forum Sponsor | ||
|
|
|
#2
|
|||
|
|||
|
Can you be sure that the jobs aren't being run if you are relying on logs stored in /var to check? If /var is 100% full, it's quite possible that the jobs *are* running, but simply can't add a log entry to say so. Did your "append timestamp to file" test script use a file on another filesystem?
Having said that, I think it's quite likely that cron would stop working once /var becomes 100% full... there must be some pretty big logs being generated somewhere to fill a 9.5GB /var? |
|
#3
|
|||
|
|||
|
The "append timestamp to file" test used /tmp... which is on /dev/sda2.
If you look at the output from "ll /var/log/cron*" you will see that messages continued to be sent to /var/log/cron from after August 10. The last line of the cron.1 file confirms that August 10 was when the last logrotate happened. Given that cron logs should be relatively consistent, one would expect the next cron log rotation to occur on August 17... but it did not. Also, because of the size (~14k) of the cron log, there is a guarantee that the logrotate did not fail (on the 17th) due to the filesystem being full... instead logrotate just never ran. According to /var/log/logrotate.log, the last time that logrotate ran was: 1218492062 which translates to : August 11th, 2008 10:01:02 PM As for the logs being generated, yeah, they are big, but its normal for my app with debug on max... |
|
#4
|
|||
|
|||
|
Have you established whether cron stops running all jobs when it's in this state? Just the cron.hourly? Or just jobs for a specific user (i.e. root in this case)? Also, are there any other cron jobs stuck at the time?
Have you tried strace-ing a crond that's stuck like this? Also, does a pkill -HUP crond "wake" it up again? It might also be worth running cron with some debugging options, although they don't seem to be very well documented. The best reference (for Vixie cron) I've found so far is: Mac OS X Manual Page For cron(8) |
|
#5
|
|||
|
|||
|
I tried to use GDB but it would appear that the symbols are stripped from the crond binary, as the backtrace only provided the following:
Code:
(gdb) bt #0 0x00002aaaaae6d5a0 in ?? () #1 0x00002aaaaae6d3f4 in ?? () #2 0x0000555555557772 in ?? () from /usr/sbin/crond #3 0x0000000000000000 in ?? () As for strace, I was unaware that I could attach strace to a process that is already running... if that is so, can you reference an example? I will gladly kick it off as a test. |
|
#6
|
|||
|
|||
|
Just a silly question:
Are you sure you have inodes available on /var? ('df -i', depending on your O/S) If your inode table is full, you may have space on the disk but you can't make a new file to rotate the data into. |
|
#7
|
|||
|
|||
|
strace -aefp <pid> should do the trick.
|
|||
| Google The UNIX and Linux Forums |
| Thread Tools | Search this Thread |
| Display Modes | |
|
|