The UNIX and Linux Forums

venkatmyname 10-03-2005 08:21 AM

What is the cause of file truncation?
 
Hi,

I have a program that gets called from the front end of my application. It creates some temporary files, uses them, and deletes them at the end. But sometimes, about once in 6 runs, some of these temporary files get truncated in the middle, and because of this my program behaves erratically. My application runs on AIX.

I am not sure whether:
1) some other process is truncating the files, or
2) my program itself is writing the files incompletely.

If I restart the same operation, it proceeds correctly. This truncation happens only sometimes, about once in 6 runs.

I want to monitor these temporary files from creation to deletion: which processes are writing to them, using them, truncating them, etc.

Is there a way to do this? Or is there a better way to solve this problem?

Thanks,
Venkat.

vino 10-03-2005 08:55 AM

I am not sure if this would help.

Did you try strace? Read the man pages. It prints every system call a process makes. You usually run it on the whole application; in your case, that would be the program. (On AIX, the equivalent tool is truss.)

vino

jim mcnamara 10-03-2005 10:04 AM

Make sure you call fflush() after every write to your temp files.

This sounds like a program design issue more than a problem with the filesystem.
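A minimal sketch of that advice, assuming plain stdio; `write_record` is a hypothetical helper name, not a library function. It shows that fflush() has its own return value worth checking: fwrite() may only copy the data into the stdio buffer, so the real write() to the file, and any disk-full error, can first be reported by fflush().

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write one record to a stdio stream and push it
 * to the kernel right away. Returns 0 on success, -1 on failure. */
int write_record(FILE *fp, const char *rec)
{
    size_t len = strlen(rec);

    /* A full count from fwrite only means the bytes reached the stdio
     * buffer; it does not prove they reached the file. */
    if (fwrite(rec, 1, len, fp) != len)
        return -1;

    /* fflush hands the buffered bytes to write(); an error such as a
     * full filesystem typically surfaces here rather than in fwrite. */
    if (fflush(fp) == EOF)
        return -1;

    return 0;
}
```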

venkatmyname 10-04-2005 04:52 AM

Quote:

Originally Posted by jim mcnamara
Make sure you call fflush() after every write to your temp files.

This sounds like a program design issue more than a problem with the filesystem.

No, I don't think it can be, because it works well in other environments. The problem occurs only on my system. Moreover, the program already calls fflush() after every write.

jim mcnamara 10-04-2005 11:49 AM

Are you checking return codes on ALL your file calls?

If you are working on a busy disk where apps create a lot of temp files (like /var/tmp), write() may fail to complete because of transient disk-full errors. Since this only happens once in a while, that is a likely explanation.
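Checking write() properly means handling two cases: a return of -1 with errno set (for example to ENOSPC when the filesystem is full), and a short write, where fewer bytes than requested were written. A minimal sketch, assuming a POSIX system; `write_all` is a hypothetical helper, not a standard function:

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all len bytes are
 * written. Returns 0 on success, -1 on error with errno left as set
 * by the failing write() (e.g. ENOSPC on a full filesystem). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)   /* interrupted by a signal: retry */
                continue;
            return -1;            /* real error: report it */
        }
        /* a short write is legal: advance past what was written */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

After a -1 return, the caller can test `errno == ENOSPC` to tell a full filesystem apart from other errors.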

Also consider defining TMPDIR to point to a filesystem with lots of free space or with low disk contention.
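One way to honor TMPDIR, sketched under the assumption that the program creates its own temp files with the POSIX mkstemp() call; `make_temp` and the `app.` file prefix are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: create a unique temporary file in the directory
 * named by TMPDIR, falling back to /tmp when TMPDIR is unset or empty.
 * The chosen name is stored in path (size pathlen). Returns an open
 * file descriptor, or -1 on failure. */
int make_temp(char *path, size_t pathlen)
{
    const char *dir = getenv("TMPDIR");
    int n;

    if (dir == NULL || dir[0] == '\0')
        dir = "/tmp";

    /* mkstemp replaces the trailing XXXXXX with a unique suffix */
    n = snprintf(path, pathlen, "%s/app.XXXXXX", dir);
    if (n < 0 || (size_t)n >= pathlen)
        return -1;   /* the name would not fit in the caller's buffer */

    return mkstemp(path);
}
```

Because the location comes from the environment, moving the temp files to a less busy filesystem is then just a matter of setting TMPDIR before starting the program.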

If you don't check return codes, the program runs merrily on, regardless of free disk space. I've seen exactly the problem you describe under these circumstances.

blowtorch 10-04-2005 02:49 PM

I have observed this on one of our systems too. I tried to simulate this using the following programs:

fop.c - uses fopen and fwrite
Code:

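#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

int main(void) {
        FILE *fp;
        char str[] = "test";
        size_t ret;
        int err;

        fp = fopen("/mount_pt/testfile", "w");
        if (fp == NULL) {
                fprintf(stderr, "errno: %d\n", errno);
                exit(EXIT_FAILURE);
        }
        /* fwrite may only copy the data into the stdio buffer */
        ret = fwrite(str, 1, strlen(str), fp);
        err = errno;   /* save errno before fprintf can change it */
        fprintf(stdout, "ret of fwrite: %zu\n", ret);
        if (ret < strlen(str)) {
                fprintf(stderr, "could not write! errno: %d\n", err);
                exit(EXIT_FAILURE);
        }
        /* the return value of fclose is not checked here */
        fclose(fp);
        return 0;
}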

op.c - uses open and write
Code:

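#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

int main(void) {
        int fd;
        char str[] = "test";
        ssize_t ret;
        int err;

        fd = open("/mount_pt/testfile", O_CREAT|O_RDWR, 0664);
        if (fd == -1) {
                fprintf(stderr, "errno: %d\n", errno);
                exit(EXIT_FAILURE);
        }
        /* write() goes straight to the kernel, so a full filesystem
           can be reported here */
        ret = write(fd, str, strlen(str));
        err = errno;   /* save errno before fprintf can change it */
        fprintf(stdout, "ret of write: %zd\n", ret);
        if (ret == -1) {
                fprintf(stderr, "could not write! errno: %d\n", err);
                exit(EXIT_FAILURE);
        }
        close(fd);
        return 0;
}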

I simulated a full filesystem by creating a 4MB filesystem and filling it up, then ran op.c and fop.c against it. op.c gets an error from write(). fop.c, however, runs through successfully: fwrite even returns the expected value, but all that gets created is a 0-byte file.

This may have something to do with the buffering that stdio does in fwrite: fwrite returns success even though the underlying write fails.
But that doesn't quite sound right. Could anyone shed light on this?

Perderabo 10-04-2005 03:08 PM

blowtorch, your problem is due to buffering, as you suspect. You could use setvbuf() to turn buffering off. Or you could check the return code from fclose(), which will detect the problem. Ideally, you check the return code from close() as well... although no one ever does. With an NFS-mounted filesystem, close() could be the syscall that detects a full filesystem.
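A minimal sketch of both suggestions, assuming plain stdio; `save_string` and its `unbuffered` flag are hypothetical names. With buffering turned off via setvbuf(), a failing write() shows up as a short fwrite() count; with buffering left on, the error surfaces only when fclose() flushes the buffer, so its return value has to be checked:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write s to a new file at path and report every
 * failure, including those that buffering would otherwise hide.
 * Returns 0 on success, -1 on any failure. */
int save_string(const char *path, const char *s, int unbuffered)
{
    FILE *fp = fopen(path, "w");
    size_t len = strlen(s);
    int rc = 0;

    if (fp == NULL)
        return -1;

    /* setvbuf must come before any other operation on the stream;
     * _IONBF passes each fwrite straight through to write(). */
    if (unbuffered && setvbuf(fp, NULL, _IONBF, 0) != 0)
        rc = -1;

    if (rc == 0 && fwrite(s, 1, len, fp) != len)
        rc = -1;

    /* With buffering on, the final write() happens inside fclose,
     * so this is where a full filesystem is reported. */
    if (fclose(fp) == EOF)
        rc = -1;

    return rc;
}
```

Unbuffered output costs one write() per fwrite() call, so checking fclose() is usually the cheaper fix.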


All times are GMT -4.

The UNIX and Linux Forums Content Copyright ©1993-2013. All Rights Reserved.