File access:


 
# 1  
Old 09-18-2008

I am trying to work out the ideal buffer size for reading files. Any input or ideas?
# 2  
Old 09-18-2008
Two words: "Depends." "Measure."
# 3  
Old 09-18-2008
'Advanced Programming in the UNIX Environment' by Stevens and Rago

The 2nd edition has a chart on p. 70. For standard I/O, a buffer size of 4096 to 8192 bytes gave the best performance. Really large buffers showed decreased performance, and really small buffers resulted in very poor performance. The default on most systems is 4096. The only good reasons I know of to tinker with the buffer size:
a machine with very limited resources and lots of files open at the same time
because of device i/o requirements
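
If you want to see what the system itself suggests before tinkering, stat() reports a preferred I/O block size in st_blksize. The sketch below just returns that value; the function name preferred_blksize is made up for illustration, and whether your C library actually uses st_blksize for its default stdio buffer is an assumption you would have to check on your platform.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Returns the preferred I/O block size the filesystem reports for `path`,
 * or -1 if stat() fails. `preferred_blksize` is an illustrative name. */
long preferred_blksize(const char *path)
{
	struct stat sb;

	if (stat(path, &sb) != 0) {
		perror("stat");
		return -1;
	}
	return (long)sb.st_blksize;
}
```

Calling preferred_blksize("test.dat") and comparing the result with BUFSIZ is a cheap first data point before running any timing tests.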

stdio (fopen) buffers are allocated on the heap by malloc. fclose frees the buffer.
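
If you do want your own buffer size, setvbuf is the standard call for it. Here is a minimal sketch, assuming you want a fully buffered read stream; open_with_buffer is a hypothetical helper name, not a library function. Note that a buffer you hand to setvbuf is yours: fclose does not free it, so free it yourself after closing the stream.

```c
#include <stdio.h>
#include <stdlib.h>

/* Opens `path` for reading with a caller-chosen, fully buffered stdio
 * buffer of `size` bytes. On success *bufp receives the buffer, which the
 * caller must free() AFTER fclose(). Returns NULL on failure.
 * `open_with_buffer` is an illustrative name. */
FILE *open_with_buffer(const char *path, size_t size, char **bufp)
{
	char *buf;
	FILE *fp = fopen(path, "r");

	if (fp == NULL)
		return NULL;

	buf = malloc(size);
	/* setvbuf() has to be called before any other I/O on the stream */
	if (buf == NULL || setvbuf(fp, buf, _IOFBF, size) != 0) {
		fclose(fp);
		free(buf);
		return NULL;
	}
	*bufp = buf;
	return fp;
}
```

Typical use: call open_with_buffer("test.dat", 8192, &buf), read from the returned stream as usual, then fclose the stream and free(buf), in that order.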

Last edited by jim mcnamara; 09-18-2008 at 10:57 AM..
# 4  
Old 09-18-2008
BUFSIZ, defined in the <stdio.h> header (/usr/include/stdio.h), is a symbolic constant the standard C library provides for just that purpose, so you could use it as the buffer size in your source code:
Code:
#include <stdio.h>
char buffer[BUFSIZ];
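
To put that declaration in context, here is a rough sketch of a read loop that pulls a file through a BUFSIZ-sized array. The function name count_bytes is only for illustration; all it does is total up the bytes it sees.

```c
#include <stdio.h>

/* Reads `path` in BUFSIZ-sized chunks and returns the number of bytes
 * read, or -1 on error. `count_bytes` is an illustrative name. */
long count_bytes(const char *path)
{
	char buffer[BUFSIZ];
	size_t n;
	long total = 0;
	FILE *fp = fopen(path, "rb");

	if (fp == NULL)
		return -1;
	/* fread() returns fewer than sizeof buffer items only at EOF or on error */
	while ((n = fread(buffer, 1, sizeof buffer, fp)) > 0)
		total += (long)n;
	if (ferror(fp))
		total = -1;
	fclose(fp);
	return total;
}
```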

# 5  
Old 09-18-2008
For grins:
This file:
Code:
csadev:/home/jmcnama> ll test.dat
-rwxrwxrwx   1 jmcnama    prog       164665116 Sep 18 10:36 test.dat

This shell script
Code:
 
#!/bin/ksh
let i=2
while [[ $i -lt 1280000 ]]
do
    setbuf $i
    i=$(( i * 2 ))
done

This code setbuf.c
Code:
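#include <stdio.h>
#include <stdlib.h>
#include <sys/times.h>
#include <unistd.h>

/* times() fills *dest with CPU times and returns elapsed clock ticks */
clock_t get_tms(struct tms *dest)
{
	clock_t whatever=times(dest);
	return whatever;
}

/* format tick counts as seconds with two decimal places */
char *fmt(char *dest, struct tms *p, clock_t elapsed)
{
	long tck=sysconf(_SC_CLK_TCK);	/* ticks per second; CLK_TCK is obsolete */
	int sdecimal=(int)(p->tms_stime % tck * 100 / tck);
	int ssec=(int)(p->tms_stime / tck);
	int udecimal=(int)(p->tms_utime % tck * 100 / tck);
	int usec=(int)(p->tms_utime / tck);
	int edecimal=(int)(elapsed % tck * 100 / tck);
	int esec=(int)(elapsed / tck);
	sprintf(dest, "Elapsed:%3d.%02d User:%3d.%02d System:%3d.%02d",
	           esec, edecimal, usec, udecimal, ssec, sdecimal);
	return dest;
}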

void disp(int bufsiz, int cnt, struct tms *start, struct tms *end, clock_t elapsed)
{
	struct tms mytms={0,0,0,0};
	char tmp[256]={0x0};
	
	mytms.tms_utime=end->tms_utime - start->tms_utime;
	mytms.tms_stime=end->tms_stime - start->tms_stime;
	printf("Buffer: %6d Reads: %10d  ", bufsiz, cnt);
	printf("%s\n", fmt(tmp, &mytms, elapsed ));
}


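int main(int argc, char **argv)
{
	int bufsz=0;
	char *buf=NULL;
	char *tmp=NULL;
	clock_t elapsed=0;
	int cnt=0;
	FILE *in=NULL;
	struct tms start={0,0,0,0};
	struct tms end={0,0,0,0};

	/* need a positive buffer size on the command line */
	if(argc<2 || (bufsz=atoi(argv[1]))<=0)
	{
		fprintf(stderr, "usage: setbuf buffer_size\n");
		return 1;
	}
	in=fopen("test.dat","r");
	if(in==NULL)
	{
		perror("test.dat");
		return 1;
	}
	buf=malloc(bufsz+1);
	tmp=malloc(bufsz+2);
	if(buf==NULL || tmp==NULL)
	{
		perror("malloc");
		return 1;
	}
	setvbuf(in, buf, _IOFBF, bufsz);	/* must come before any other I/O on the stream */
	elapsed=get_tms(&start);
	while(fread(tmp, bufsz, 1, in)==1)	/* counts full bufsz-byte reads only */
		cnt++;
	elapsed=get_tms(&end) - elapsed;
	disp(bufsz, cnt, &start, &end, elapsed);
	fclose(in);
	free(buf);	/* fclose does not free a buffer supplied via setvbuf */
	free(tmp);
	return 0;
}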

That produced this output. I marked the BUFSIZ row with * for comparison:
Code:
csadev:/home/jmcnama> setbuf.sh                                           
Buffer:      2  Reads:   82332558  Elapsed:322.56 User: 54.99 System:258.68
Buffer:      4  Reads:   41166279  Elapsed:161.76 User: 27.83 System:129.01
Buffer:      8  Reads:   20583139  Elapsed: 81.33 User: 14.08 System: 65.79
Buffer:     16  Reads:   10291569  Elapsed: 41.06 User:  7.01 System: 33.34
Buffer:     32  Reads:    5145784  Elapsed: 20.54 User:  3.53 System: 16.70
Buffer:     64  Reads:    2572892  Elapsed: 10.52 User:  1.79 System:  8.50
Buffer:    128  Reads:    1286446  Elapsed:  5.69 User:  0.92 System:  4.44
Buffer:    256  Reads:     643223  Elapsed:  2.94 User:  0.49 System:  2.43
Buffer:    512  Reads:     321611  Elapsed:  1.64 User:  0.27 System:  1.37
Buffer:   1024  Reads:     160805  Elapsed:  0.96 User:  0.16 System:  0.79 *
Buffer:   2048  Reads:      80402  Elapsed:  0.62 User:  0.10 System:  0.52
Buffer:   4096  Reads:      40201  Elapsed:  0.38 User:  0.07 System:  0.31
Buffer:   8192  Reads:      20100  Elapsed:  0.30 User:  0.06 System:  0.24
Buffer:  16384  Reads:      10050  Elapsed:  0.27 User:  0.05 System:  0.22
Buffer:  32768  Reads:       5025  Elapsed:  0.27 User:  0.05 System:  0.22    
Buffer:  65536  Reads:       2512  Elapsed:  0.25 User:  0.05 System:  0.21
Buffer: 131072  Reads:       1256  Elapsed:  0.26 User:  0.06 System:  0.20
Buffer: 262144  Reads:        628  Elapsed:  4.21 User:  0.37 System:  0.08
Buffer: 524288  Reads:        314  Elapsed:  3.79 User:  0.42 System:  0.08
Buffer: 1048576 Reads:        157  Elapsed:  3.24 User:  0.46 System:  0.07

*BUFSIZ

# 6  
Old 09-18-2008
I would think that clearing the cache between runs of setbuf in your while ... done loop is important, as repeated reads would most likely reuse cached pages. I would insert a line such as

wc -c some_big_files*here > /dev/null

after the setbuf line, so that the cache is overwritten and the setbuf program actually reads the file from disk, not from cache.
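
Another option, on systems that support it, is to ask the kernel to drop the cached pages of the test file directly with posix_fadvise and POSIX_FADV_DONTNEED. This is only a sketch: the call is advisory, so the kernel may ignore it, dirty pages are generally not dropped, and it does nothing about caches in the drive or storage controller. drop_file_cache is a made-up helper name.

```c
/* Needed for posix_fadvise() when compiling in strict -std=c99/c11 mode */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <fcntl.h>
#include <unistd.h>

/* Asks the kernel to drop cached pages of `path`. Returns 0 on success,
 * -1 if the file cannot be opened, otherwise the error number returned
 * by posix_fadvise(). `drop_file_cache` is an illustrative name. */
int drop_file_cache(const char *path)
{
	int rc;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	/* offset 0 with length 0 means "the whole file" */
	rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
	close(fd);
	return rc;
}
```

In the benchmark this would be called on test.dat before each timed run, instead of reading other big files to push it out of the cache.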
# 7  
Old 09-18-2008
That is exactly why the answer(s) we gave to the original question were somewhat meaningless.

What you are seeing is only the effect of buffer size on the number of underlying system calls that stdio makes. More calls, more overhead. It does not necessarily have anything to do with I/O throughput. Larger buffers do improve performance, but there comes a point where doubling the buffer size buys almost nothing.

The actual I/O throughput is a function of "hard drive" cache size. Our drives are on a giant SAN with RAID support, and the buffering is immense. So our "hard drives" are really just a front-end box that pretends to be a disk, fronting for a RAID cluster.
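
To make the call-count point concrete, here is a small sketch that bypasses stdio and counts how many read() calls it takes to get through a file with a given buffer size. count_read_calls is a hypothetical name, and the sketch assumes a regular file, where read() normally returns the full amount requested until end of file.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Reads `path` with read(2) using a `bufsz`-byte buffer and returns how
 * many calls returned data, or -1 on error.
 * `count_read_calls` is an illustrative name. */
long count_read_calls(const char *path, size_t bufsz)
{
	long calls = 0;
	ssize_t n;
	char *buf;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	buf = malloc(bufsz);
	if (buf == NULL) {
		close(fd);
		return -1;
	}
	/* each loop iteration is one trip into the kernel */
	while ((n = read(fd, buf, bufsz)) > 0)
		calls++;
	free(buf);
	close(fd);
	return n < 0 ? -1 : calls;
}
```

For the 164665116-byte test file, a 4096-byte buffer works out to 40201 full reads plus one short read of 1820 bytes, while a 2-byte buffer needs 82332558 reads. That ratio of calls is what the timings mostly track.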

Clearing the cache just means you are measuring disk latency and seek times as well as the other components of I/O.

Last edited by jim mcnamara; 09-18-2008 at 05:44 PM..