overhead of fopen/freopen

# 1  
Old 03-26-2010

I always assumed that fopen/freopen is very costly, so when I needed to work with many files within one process I spent extra time implementing a list of FILE * pointers to avoid extra opens/reopens, but it did not produce any better results.

Here is the task at hand: there is a huge stream of data coming through stdin, each line is preceded by an ID, and I need to place that line into its own file named id.log. The IDs are not very random; they arrive somewhat grouped.

The original code is very straightforward: read the line, get the ID, form the file name, do fopen/fputs/fclose, loop to the next line. I thought the fopen/fclose was the bottleneck.

So I built an array of {ID / FILE *ptr / counter} to keep the last N opened files. If the next ID happens to be in the list, I just reuse the open stream. Otherwise I either fopen a stream for a new entry in the array or, when the array has no more empty slots, freopen the one that has the biggest number of writes. But the results are very close to the original simple approach.

My new code
Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <sys/resource.h>
 
typedef struct  {
        int     mid;    /* ID whose file is open in this slot */
        int     cnt;    /* lines written since the slot was (re)opened */
        FILE    *fp;    /* NULL while the slot is unused */
} MFP;
static  MFP     *mfp = NULL;
static  int     mfpcnt = 0;
 
int main(int argc, char *argv[])
{
  int   mid, n, found, maxcnt, empty, curr;
  char  buf[1024], fullname[256];
  struct rlimit rl;
 
        if(getrlimit(RLIMIT_NOFILE, &rl) == 0 &&
           rl.rlim_cur != RLIM_INFINITY && rl.rlim_cur > 8)
                mfpcnt = (int)(rl.rlim_cur - 8); /* leave some for other streams */
        else
                mfpcnt = 16; /* arbitrary default */
        mfp = (MFP *)calloc(mfpcnt, sizeof(MFP)); /* zeroed: every fp starts NULL */
        if(mfp == NULL)
        {
                perror("calloc");
                return(1);
        }
 
        while(fgets(buf, sizeof(buf), stdin))
        {
                mid = atoi(buf);
                snprintf(fullname, sizeof(fullname), "%04i.log", mid);
                maxcnt = 0;
                empty = -1;
                found = -1;
                for(n = 0; n < mfpcnt; n++)
                {
                        if(mfp[n].fp == NULL)
                        {
                                /* unused slot: remember the first one, never match it */
                                if(empty == -1)
                                        empty = n;
                                continue;
                        }
                        if(mfp[n].mid == mid)
                        {
                                found = n;
                                break;
                        }
                        if(mfp[n].cnt > mfp[maxcnt].cnt)
                        {
                                maxcnt = n;
                        }
                }
                if(found != -1)
                {
                        curr = found;
                }
                else
                {
                        if(empty != -1)
                        {
                                curr = empty;
                                mfp[curr].fp = fopen(fullname, "a");
                        }
                        else
                        {
                                /* no free slot: reuse the one with the most writes */
                                curr = maxcnt;
                                mfp[curr].fp = freopen(fullname, "a",
                                                mfp[curr].fp);
                        }
                        if(mfp[curr].fp == NULL)
                        {
                                perror(fullname);
                                return(1);
                        }
                        mfp[curr].mid = mid; /* record which ID this slot now holds */
                        mfp[curr].cnt = 0;
                }
                fputs(buf, mfp[curr].fp);
                mfp[curr].cnt++;
        }
        return(0); /* open streams are flushed and closed at exit */
}

I had some counters printed out just to confirm the whole scheme is working, and they confirmed that around 10% - 20% of the lines reuse an already opened file stream, so no fopen/freopen is needed for them. But measured by time, the new code is not more than 5% faster. Is there any explanation?
# 2  
Old 03-28-2010
One problem is that you are changing disk metadata every time you close a file.
That is, each close may require direct I/O to the disk to update the data, then direct I/O to update the file metadata, the stuff you see with stat: mtime, atime, number of bytes in the file. Disk I/O is usually several orders of magnitude slower than memory.

So:
1. Consider using two large blocks of shared memory and have your process write directly to memory.

2. When your block is nearly full, create several threads, one for each filename you need, to do the file writes and clean up the memory block. One file open/close per thread.

3. While the worker threads are busy, have the main process write to the second memory block.
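
The core of the steps above is batching: collect lines in memory and pay for one open/close per file per batch instead of one per line. A minimal single-threaded sketch of just that part follows. It is a simplification, not the threaded double-buffer design described above, and the names BATCH, MAX_IDS, batch_add and batch_flush are hypothetical. It also assumes the caller calls batch_flush() periodically to bound memory use.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_IDS 64      /* assumed limit on distinct IDs held per batch */

typedef struct {
        int     mid;    /* ID these lines belong to */
        char    *data;  /* accumulated lines for this ID */
        size_t  len, cap;
} BATCH;

static BATCH    batch[MAX_IDS];
static int      nbatch = 0;

/* Write every pending buffer with one fopen/fclose per file, then reset. */
int batch_flush(void)
{
        int i, rc = 0;
        for(i = 0; i < nbatch; i++)
        {
                char name[64];
                FILE *fp;
                snprintf(name, sizeof(name), "%04i.log", batch[i].mid);
                fp = fopen(name, "a");
                if(fp == NULL ||
                   fwrite(batch[i].data, 1, batch[i].len, fp) != batch[i].len)
                        rc = -1;
                if(fp != NULL && fclose(fp) != 0)
                        rc = -1;
                free(batch[i].data);
        }
        nbatch = 0;
        return rc;
}

/* Append one line to the in-memory buffer for mid; returns 0 on success. */
int batch_add(int mid, const char *line)
{
        size_t n = strlen(line);
        int i;
        if(n == 0)
                return 0;
        for(i = 0; i < nbatch && batch[i].mid != mid; i++)
                ;
        if(i == nbatch)
        {
                /* new ID: make room by flushing if the batch table is full */
                if(nbatch == MAX_IDS && batch_flush() != 0)
                        return -1;
                i = nbatch++;
                batch[i].mid = mid;
                batch[i].data = NULL;
                batch[i].len = batch[i].cap = 0;
        }
        if(batch[i].len + n > batch[i].cap)
        {
                size_t cap = (batch[i].len + n) * 2;
                char *p = realloc(batch[i].data, cap);
                if(p == NULL)
                        return -1;
                batch[i].data = p;
                batch[i].cap = cap;
        }
        memcpy(batch[i].data + batch[i].len, line, n);
        batch[i].len += n;
        return 0;
}
```

In the read loop this would replace the fopen/fputs/fclose sequence with batch_add(mid, buf), plus a final batch_flush() at end of input. Whether it helps depends on how grouped the IDs are.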
# 3  
Old 03-30-2010
You're also doing a linear search, which can be costly if you have a huge number of open streams; a hash table may be much faster. If you have a small number, it may be overkill.
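
As a rough illustration of the hash table idea, here is a minimal open-addressing sketch that maps an integer ID to a slot index in the array of open streams. The names IDTAB, TAB_SIZE and idtab_* are hypothetical, and the sketch assumes the table is kept well under full (say, TAB_SIZE at least twice the number of open streams) and that slot indexes are non-negative.

```c
#include <stddef.h>

#define TAB_SIZE  4096  /* assumed: power of two, >= 2x the number of open streams */
#define TAB_EMPTY (-1)

typedef struct {
        int     key[TAB_SIZE];  /* ID stored in each bucket */
        int     slot[TAB_SIZE]; /* index into the stream array, or TAB_EMPTY */
} IDTAB;

static unsigned idtab_hash(int id)
{
        /* multiplicative hash, masked to the table size */
        return ((unsigned)id * 2654435761u) & (TAB_SIZE - 1);
}

void idtab_init(IDTAB *t)
{
        int i;
        for(i = 0; i < TAB_SIZE; i++)
                t->slot[i] = TAB_EMPTY;
}

/* Return the stream slot for id, or -1 if id is not in the table. */
int idtab_find(const IDTAB *t, int id)
{
        unsigned h = idtab_hash(id);
        while(t->slot[h] != TAB_EMPTY)
        {
                if(t->key[h] == id)
                        return t->slot[h];
                h = (h + 1) & (TAB_SIZE - 1);   /* linear probing */
        }
        return -1;
}

/* Insert id, or update its slot if it is already present. */
void idtab_put(IDTAB *t, int id, int slot)
{
        unsigned h = idtab_hash(id);
        while(t->slot[h] != TAB_EMPTY && t->key[h] != id)
                h = (h + 1) & (TAB_SIZE - 1);
        t->key[h] = id;
        t->slot[h] = slot;
}

/* Remove id (for example when its stream is reused for another file). */
void idtab_remove(IDTAB *t, int id)
{
        unsigned h = idtab_hash(id);
        while(t->slot[h] != TAB_EMPTY && t->key[h] != id)
                h = (h + 1) & (TAB_SIZE - 1);
        if(t->slot[h] == TAB_EMPTY)
                return;
        t->slot[h] = TAB_EMPTY;
        /* re-insert the rest of the probe cluster so later lookups still work */
        h = (h + 1) & (TAB_SIZE - 1);
        while(t->slot[h] != TAB_EMPTY)
        {
                int k = t->key[h], s = t->slot[h];
                t->slot[h] = TAB_EMPTY;
                idtab_put(t, k, s);
                h = (h + 1) & (TAB_SIZE - 1);
        }
}
```

With this, the per-line scan becomes idtab_find(&tab, mid); on a miss the program opens or reopens a stream, calls idtab_remove() for the ID it displaced, and idtab_put() for the new one. Picking which stream to displace still needs its own bookkeeping.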

It may also be that most of your work is already I/O bound, hence little gain is to be had from optimizing the code. Given the ludicrous speed of modern computers relative to modern disks I suspect this is the case.
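
One rough way to check this is to compare the CPU time the process used with the wall-clock time that elapsed, which is what the real versus user+sys figures from time(1) show. A minimal sketch using gettimeofday and getrusage follows; the helper names wall_seconds, cpu_seconds and cpu_fraction are hypothetical.

```c
#include <stddef.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Wall-clock time in seconds. */
double wall_seconds(void)
{
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1e6;
}

/* User + system CPU time consumed by this process, or -1.0 on error. */
double cpu_seconds(void)
{
        struct rusage ru;
        if(getrusage(RUSAGE_SELF, &ru) != 0)
                return -1.0;
        return ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6
             + ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
}

/* Fraction of the elapsed interval that was spent on the CPU. */
double cpu_fraction(double wall0, double cpu0, double wall1, double cpu1)
{
        double wall = wall1 - wall0;
        if(wall <= 0.0)
                return 0.0;
        return (cpu1 - cpu0) / wall;
}
```

Sample both clocks before and after the read loop. A fraction well below 1 suggests the process mostly waits, in which case shaving CPU work such as the search will not change much. A fraction near 1 with a high system-time share points at the cost of the open/close calls themselves.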
# 4  
Old 03-30-2010
You could also consider aio. It does not get rid of I/O waits; it just stops your process from having to sit twiddling its thumbs waiting for I/O to complete.
# 5  
Old 03-30-2010
Thanks for your suggestions, people, great ideas!
I do not have real threads on this system, so I will give aio a try.