I have recently been assigned to help maintain some testing software on two UNIX stations. The stations are connected to instruments that supply power and signals to a box and take measurements from it. Both test stations are basically the same. A couple of weeks ago I was given a crash course (read: a little training) introducing me to the proprietary software environment that tests the boxes, along with a little UNIX information. The UNIX stations, called homer and lisa, are stand-alone machines that are networked together. I can type “rusers” and see both computers, and I think I have been able to ftp files from one to the other. The crash course came about because there was a problem with the testing software. The training was supplied by another software engineer, who wrote and maintained the testing software until about 5 years ago and has since moved on to other projects. While he was training me, he was fixing the big problems and giving me some little things to do. He was working on lisa; I was working on homer. We made copies of the testing software to make the changes, and which copies have been made is the main difference between the stations. I asked my trainer what he did before he left, and this is what he told me:
The last thing I attempted on lisa was to make a backup of the system using the tape drive. I tried a command in the "tar" utility that tells it to stop after so many blocks; it is supposed to instruct you to insert another tape into the drive. When I came back the next day, the "tar" had aborted and displayed an error message. I would have thought that rebooting the tester would clear any problems that the "tar" had created. If nothing else seems to work, I would look up the "tar" commands and see if there is any command that "resets" the tar utility.
The next time we logged on and then off lisa, we started getting a message
I was given the suggestion of trying “df -k”, which gave these results. (Ignore the exact numbers: I took a screenshot with my phone camera and am just trying to read the blurry results. The capacity percentages are correct. The other numbers are not exact, except for 0, which I can make out.)
I also ran “df -k” on homer, and the output was very similar. The numbers differed slightly, and the capacity of s6 and s7 may have been off by a percentage point or so, but nothing major. So I am confused about what the “file system full” message is telling me. Why isn’t it on both? What am I missing?
Being corrupted by Windows, I thought that rebooting the computer might reset the problem. I asked the technician who runs the tests how to reboot or restart UNIX, as I didn’t know. He didn’t either. We powered down the main tester, which is done from the main part of the environment, and logged out of UNIX. Then I pushed the power button. I guess there is a shutdown command I should have used, correct? I turned it back on, and saw some more messages during the boot-up process.
And
Then I tried to login, and received this message:
From a little research, I see that there are different files for holding login and logout information: utmp, utmpx, wtmp, wtmpx, and a few others. According to my research, I can and should clear out the contents of these when they get too big. I was finally given a way to log in, and these are the steps I went through, and what I did after I logged in.
Stop-a
Typed “boot -s”
Logged in as single user root, with the root password
Typed “df -k”
Typed “fsck -y /dev/dsk/c0t0d0s0”, which fixed a problem.
<cntl> d
Then I tried to log in again, and still received the “no utmpx entry” message. So I went through the above process again, up to but not including the <cntl> d. I stayed logged in and tried to look at the utmp, utmpx, wtmp, and wtmpx files. In my research, I saw that there are several directories to look in:
I don’t understand why there are at least three places to find these files. I did an “ls -l” in each of those directories, and everything looked the same. I wonder if they are the same directory, just being pointed to, and not actually different places? Any comments? Maybe I will try to check this in a little while. Anyway, utmp was around 72, utmpx was about 10x that, at around 740, wtmp was around 7100 and wtmpx was around 74000. I can’t remember the exact sizes, but I do remember that they differed from each other by about this order of magnitude. I think that I compared this to homer, and saw very similar numbers. So I am still confused as to why lisa gives me “file system full” messages and homer doesn’t.
So if you got through reading this, I will say, “URGENT!!” and “help me!!!!!” but of course, that is why I am here.
I will summarize my questions, in case you didn’t see them.
Why do I get a message saying the file system is full on one station, when according to “df -k” the file system sizes are about the same?
Why are there at least three places to find utmp, utmpx, wtmp, and wtmpx? Are they all different? Or do they all point to the same place?
Is the best way to get past the message to clear the contents of all of the above files, and start fresh?
Is there a better way to find out the sizes of the filesystems? I see in the /dev/dsk/ directory that there are several c0t0d0s0-type files. Are these all the partitions? I can’t access them, and “df -k” only shows s0, s6 and s7, but there are others. Are those from a different setup? I really don’t understand the UNIX filesystem. Is there a good way to understand it?
I think these are all my questions for now. I appreciate any help you can give me, as I am just sitting around banging my head against a wall trying to figure this out.
Thanks,
Brian
Moderator's Comments:
Use code tags please. You will get a PM with instructions to use them.
Last edited by zaxxon; 10-19-2011 at 02:26 PM..
Reason: font, code tags
I'm sorry the font you use is too small for my poor eyes...
I will summarize a bit for I cannot read all your post (really too small for me...)
You had an issue you suspect coming from tar usage, yes?
/ is 100% full
Right there, if it's a workstation, I would suspect a big file having been created in /dev where a default tape device should be..
If / is 100% full you can only boot in single-user mode, so you have to remove unwanted files to free enough space for the system.
If you have a /var that is full, the same applies (cleanup), because the system needs room to write its logs; otherwise the log files risk corruption.
That will be the second part, I suppose, but until you have solved your full /, there is nothing else to do.
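A minimal sketch of one way to hunt for the files eating the space, assuming a find that supports -xdev (to stay on the starting file system) and -size counted in 512-byte blocks. The helper name big_files and the 10 MB threshold in the usage note are only illustrations, not something from your system:

```shell
# Hypothetical helper: list regular files bigger than min_kb kilobytes
# below a directory, without crossing into other mounted file systems.
big_files() {
    dir=$1
    min_kb=$2
    # find -size counts 512-byte blocks, so 1 KB = 2 blocks.
    # expr is used instead of $(( )) to stay Bourne-shell friendly.
    blocks=`expr "$min_kb" \* 2`
    # Errors (unreadable directories) are discarded to keep the list clean.
    find "$dir" -xdev -type f -size +"$blocks" -exec ls -l {} \; 2>/dev/null
}
```

From the single-user root shell, something like `big_files / 10240` would list every regular file over roughly 10 MB on the root file system only. Look at what turns up before removing anything.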
I am sorry that it was so small. It didn't look that small when I first was writing it. I have increased the font size. I hope that helps.
I do think that some of the issue might be a tar problem. The root looks to be 100% full. My question comes because of confusion. My other station is almost the same, being 100% full on the root, but it doesn't have the same problems. But is my best bet just to delete the contents of the wtmp and the wtmpx, which I found to be 707464 and 7249920 respectively. Those are the big files. Is that the best way to start?
Well, / can be full and still leave enough room for the root user to clean up before disaster...
e.g. with HFS you normally had 10% reserved (full for all users except root...).
With vxfs, at 95% full you would be unable to extend the file system for lack of space for the reorg...
Now, I doubt it is because of wtmp or wtmpx, for you would most certainly be unable to connect if you were not root, and you would have system logs warning you about the approaching issue. All I can think of is to look for a specific file, especially if someone used tar to back up or archive. The most common issue is someone misspelling the tape device path. Long ago I had a very important server crash because of that... I was lucky to find a /dev/tape file in /dev when the system had two devices, a spooltape device and a DAT device, and the path should have been something starting with /dev/rmt/XXXX
/dev is not very big; check its size with du -sk /dev
If it's more than 60MB, then the culprit is hidden somewhere in there... If /var is full, then some cleaning is necessary to be able to use vi for a start (on HP anyway...): all the system logs need to be able to write!
Your system looks like Solaris, is that so?
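As a rough sketch of that size check, assuming du -sk prints the total in kilobytes in its first column. The helper name dir_over_limit is made up for the example, and 61440 KB is just 60 MB expressed in KB:

```shell
# Hypothetical helper: report how much space a directory tree uses and
# return success (exit status 0) when it is above limit_kb kilobytes.
dir_over_limit() {
    dir=$1
    limit_kb=$2
    # du -sk prints "<kilobytes> <path>"; keep only the number.
    used_kb=`du -sk "$dir" | awk '{ print $1 }'`
    echo "$dir uses $used_kb KB (limit $limit_kb KB)"
    [ "$used_kb" -gt "$limit_kb" ]
}
```

For example, `dir_over_limit /dev 61440 && echo "something big is hiding in /dev"` prints the warning only when /dev is over the 60 MB mark.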
Last edited by vbe; 10-20-2011 at 10:31 AM..
Reason: typos... (was late last night...)
From a little research, I see that there are different files for holding login and logout information: utmp, utmpx, wtmp, wtmpx, and a few others. According to my research, I can and should clear out the contents of these when they get too big. I was finally given a way to log in, and these are the steps I went through, and what I did after I logged in.
Stop-a
Typed “boot -s”
Logged in as single user root, with the root password
Typed “df -k”
Typed “fsck -y /dev/dsk/c0t0d0s0”, which fixed a problem.
<cntl> d
This time you booted cleanly into single-user mode, then ran fsck, a file system check, on c0t0d0s0, which should be the root slice (I think; I have used only LVM since 1994). That fixed a problem because, to speed up access, UNIX caches the file system, so cutting the power the way you did the first time left the poor thing in a bad state, with all its pending writes and open files...
You asked about the system I am on. I was going to put it in the first post, and forgot there as well. Here it is: Sun OS 5.6, version generic [UNIX(R) System V release 4.0], August 1997.
I ran du -sk /dev, and saw that it is over 60 MB. Are there files I should stay away from when cleaning? Should I also clear the contents of utmp, utmpx, wtmp, and wtmpx? Is anything in those files really important for me to keep?
So it's Solaris 2.6 (SunOS 5.6)... (wow, I don't know if I can find something similar...)
If you read my previous post, /dev should be quite small... so I'm sure the culprit is there.. Let's not touch anything else for the moment.
Do you have a /dev/tape file ? What size?
What do you have in /dev/rmt ?
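One possible way to answer those two questions in one go, sketched under the assumption that real device entries in /dev are special files or symbolic links, so a plain regular file there is usually a mistake such as a tar archive written to a misspelled device name. The helper name regular_files is only for illustration:

```shell
# Hypothetical helper: list every regular file below a directory.
# find -type f skips device nodes, directories and symbolic links,
# so whatever is printed is an ordinary file taking up real disk space.
regular_files() {
    find "$1" -type f -exec ls -l {} \; 2>/dev/null
}
```

Running `regular_files /dev` should show /dev/tape with its size if it is an ordinary file, and `ls -l /dev/rmt` lists what the tape directory really contains. Post the output before deleting anything.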