I always assumed that fopen/freopen is very costly, so when I needed to work with many files within one process I spent extra time implementing a list of FILE * pointers to avoid extra open/reopen calls, but it did not produce any better results.
Here is the task at hand: there is a huge stream of data coming through stdin, each line is preceded by an id, and I need to place that line into its own file named id.log. The ids are not very random; they arrive somewhat grouped.
The original code is very straightforward: read the line, get the ID, form the file name, do fopen/fputs/fclose, loop to the next line. I thought the fopen/fclose pair was the bottleneck.
So I built an array of {ID, FILE *ptr, counter} entries to keep the last N opened files. If the next ID happens to be in the list, I just re-use the open stream. Otherwise I either fopen a stream for a new entry in the array or, when the array has no more empty slots, freopen the one that has the biggest number of writes. But the results are very close to the original simple approach.
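Since the actual code is not shown here, this is only a rough sketch of what such a stream cache could look like. The names (CACHE_SLOTS, ID_MAX, stream_for, close_all) and the sizes are made up for illustration, and note one deliberate difference: this sketch evicts the least recently used slot, not the one with the biggest number of writes described above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CACHE_SLOTS 4   /* hypothetical N; tune to the workload */
#define ID_MAX 64       /* hypothetical upper bound on id length */

struct slot {
    char id[ID_MAX];
    FILE *fp;               /* NULL while the slot is empty */
    unsigned long last_use; /* tick of the most recent request */
};

static struct slot cache[CACHE_SLOTS];
static unsigned long tick;
static unsigned long hits, misses; /* counters to check the scheme works */

/* Return an append stream for <id>.log, reusing a cached one if possible. */
static FILE *stream_for(const char *id)
{
    char name[ID_MAX + 8];
    struct slot *victim = &cache[0];
    int i;

    assert(strlen(id) < ID_MAX);
    for (i = 0; i < CACHE_SLOTS; i++) {
        struct slot *s = &cache[i];
        if (s->fp && strcmp(s->id, id) == 0) {
            s->last_use = ++tick;
            hits++;
            return s->fp;
        }
        /* Prefer an empty slot; otherwise track the least recently used. */
        if (victim->fp && (!s->fp || s->last_use < victim->last_use))
            victim = s;
    }
    misses++;
    if (victim->fp)
        fclose(victim->fp);
    snprintf(name, sizeof name, "%s.log", id);
    victim->fp = fopen(name, "a");
    if (!victim->fp)
        return NULL;
    strcpy(victim->id, id);
    victim->last_use = ++tick;
    return victim->fp;
}

/* Close every cached stream, e.g. at end of input. */
static void close_all(void)
{
    int i;
    for (i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].fp) {
            fclose(cache[i].fp);
            cache[i].fp = NULL;
        }
    }
}
```

The main loop would then call stream_for(id), fputs the line to the returned stream, and call close_all() once stdin is exhausted. Comparing hits against misses shows how often an fopen was actually avoided.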
My new code
I had some counters printed out just to confirm the whole scheme is working. They confirmed that around 10% - 20% of the writes re-use an already opened file stream, so no fopen/freopen is needed for them. But measured by time, the new code is no more than 5% faster. Is there any explanation?
One problem is that you are changing disk metadata every time you close a file.
That is, each close may require direct I/O to the disk to update the data, then direct I/O to update the file metadata: the stuff you see with stat, such as mtime, atime, and the number of bytes in the file. Disk I/O is usually several orders of magnitude slower than memory.
So:
1. Consider using two large blocks of shared memory, and have your process write directly to memory.
2. When a block is nearly full, create several threads, one for each filename you need, to do the file writes and clean up the memory block. One file open/close per thread.
3. While the worker threads are busy, have the main process write to the second memory block.
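The core of the steps above is that each file gets opened once per full block instead of once per line. Here is a minimal single-threaded sketch of just that part. It leaves out the shared memory, the second block, and the worker threads, and the names (struct rec, flush_batch) are invented for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct rec {
    const char *id;   /* file name stem, i.e. the id in front of the line */
    const char *line; /* text to append, including its newline */
    size_t seq;       /* arrival order, filled in by flush_batch */
};

/* Order by id; within one id keep the original arrival order. */
static int by_id_then_seq(const void *a, const void *b)
{
    const struct rec *x = a, *y = b;
    int c = strcmp(x->id, y->id);
    if (c != 0)
        return c;
    return (x->seq > y->seq) - (x->seq < y->seq);
}

/* Write out a full batch with one fopen/fclose per distinct id.
   Returns the number of fopen calls made, or -1 on error. */
static int flush_batch(struct rec *batch, size_t n)
{
    char name[256];
    FILE *fp = NULL;
    int opens = 0;
    size_t i;

    assert(batch != NULL || n == 0);
    for (i = 0; i < n; i++)
        batch[i].seq = i;
    qsort(batch, n, sizeof *batch, by_id_then_seq);

    for (i = 0; i < n; i++) {
        /* A new id starts here: switch to its file. */
        if (i == 0 || strcmp(batch[i].id, batch[i - 1].id) != 0) {
            if (fp)
                fclose(fp);
            snprintf(name, sizeof name, "%s.log", batch[i].id);
            fp = fopen(name, "a");
            if (!fp)
                return -1;
            opens++;
        }
        fputs(batch[i].line, fp);
    }
    if (fp)
        fclose(fp);
    return opens;
}
```

With grouped ids, the number of opens per batch should be much smaller than the number of lines. In the threaded version, the loop body for each distinct id is what a worker thread would run while the main process fills the other block.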
You're also doing a linear search, which can be costly if you have a huge number of open streams; a hash table may be much faster. If you have a small number, it may be overkill.
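To give an idea of the hash table variant, here is a small open-addressing sketch keyed by the id string. Everything in it (TABLE_SIZE, struct entry, table_find, table_put) is a hypothetical name picked for the example, and it assumes ids shorter than 64 characters.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TABLE_SIZE 1024 /* hypothetical; a power of two, well above the stream count */

struct entry {
    char id[64];
    FILE *fp;
    int used;
};

static struct entry table[TABLE_SIZE];

/* 32-bit FNV-1a string hash. */
static uint32_t hash_id(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Find the entry for id, or the empty entry where it would be stored.
   Returns NULL only when the table is completely full. */
static struct entry *table_find(const char *id)
{
    uint32_t i = hash_id(id) & (TABLE_SIZE - 1);
    uint32_t probes;

    for (probes = 0; probes < TABLE_SIZE; probes++) {
        struct entry *e = &table[i];
        if (!e->used || strcmp(e->id, id) == 0)
            return e;
        i = (i + 1) & (TABLE_SIZE - 1); /* linear probing */
    }
    return NULL;
}

/* Remember fp as the open stream for id. */
static struct entry *table_put(const char *id, FILE *fp)
{
    struct entry *e;

    assert(strlen(id) < sizeof table[0].id);
    e = table_find(id);
    if (!e)
        return NULL;
    if (!e->used) {
        strcpy(e->id, id);
        e->used = 1;
    }
    e->fp = fp;
    return e;
}
```

A lookup is then one hash plus, usually, one or two string compares, regardless of how many streams are open. The sketch has no removal; evicting a stream from an open-addressing table needs tombstones or re-insertion of the following entries, which is part of why it may be overkill for a small N.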
It may also be that most of your work is already I/O bound, hence little gain is to be had from optimizing the code. Given the ludicrous speed of modern computers relative to modern disks I suspect this is the case.
You could also consider aio. It does not get rid of I/O waits; it just stops your process from having to sit twiddling its thumbs waiting for I/O to complete.