Mapping a whole file at once or page by page


 
# 1  
Old 05-12-2009

This might be a silly question, but I was wondering whether, for huge files (2-3 GB), it is more efficient to map the whole file at once or to map it page by page.

The file has to be processed sequentially from the start to the end.

Thanks.
# 2  
Old 05-12-2009
Could you explain what you mean by mapping the file?

If you want to process every record or line in the file, pagination won't help (I've never done such a thing), but it depends on what you mean by page by page. If you don't have enough memory, you could split the file into smaller chunks of a few megabytes, process each one, and then combine the results, though I haven't tried that myself.

Instead, my approach would be to reduce the data set to a minimum and then process it (whether this can be done depends on the actual data you have).
# 3  
Old 05-12-2009
Basically I have this file, whose format is standardized and which I cannot touch, and I have to (pre)process it as fast as possible. The file is pretty huge (on the order of gigabytes), and I can either use plain read()/fread() or map the file with mmap(). By processing, I mean that I have to extract statistical data and build some search indexes.

By mapping the file, I meant memory mapping (i.e. mmap()) either the whole file at once, or the file page by page, or n pages at a time.

Thanks.
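
For reference, the "whole file at once" option could look roughly like the sketch below. It is only an illustration: count_newlines_whole() is a made-up helper that counts newline bytes as a stand-in for the real processing, and it assumes a 64-bit process, since a 2-3 GB mapping may not fit into a 32-bit address space.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map the whole file read-only and scan it once,
 * start to end. Returns the number of '\n' bytes, or -1 on error. */
long count_newlines_whole(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {              /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;    /* assumes the size fits in size_t */
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                          /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return -1;

    long count = 0;
    for (size_t i = 0; i < len; i++)    /* pages are faulted in on first touch */
        if (p[i] == '\n')
            count++;

    munmap(p, len);
    return count;
}
```

Note that mmap() itself does not read the file; pages are brought in as the loop touches them.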
# 4  
Old 05-12-2009
You want to map the minimum number of pages into the address space of the user process so as not to cause any swapping, and that depends entirely on how the program has been coded. I would start with a few pages at a time instead of the whole file, though you may find that mapping the entire file into memory works very well. It is a matter of trial and error.
# 5  
Old 05-13-2009
Quote:
Originally Posted by shamrock
You want to map the minimum number of pages into the address space of the user process so as not to cause any swapping, and that depends entirely on how the program has been coded. I would start with a few pages at a time instead of the whole file, though you may find that mapping the entire file into memory works very well. It is a matter of trial and error.
Could you be more clear about "how the program has been coded" ?
What do you mean exactly?
# 6  
Old 05-13-2009
Quote:
Originally Posted by emitrax
Could you be more clear about "how the program has been coded" ?
What do you mean exactly?
As I said in my last post, it is a matter of trial and error. There are a few things to check before trying to mmap the file into the address space of the process. Check how much free physical (not virtual) memory is available. If free physical memory is less than the size of the entire file, then loading the entire file will cause swapping, which is undesirable. In that case it is better to load a few pages at a time and watch the overall health of the system. So it all depends on how you have coded the mmap() call, i.e. what parameters you pass to it. The most important is the last one, the offset, which must be a multiple of the page size on your system.
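
A rough sketch of the "few pages at a time" approach, assuming a plain sequential scan. The function name count_newlines_windowed() and its pages_per_window parameter are invented for illustration, and counting newlines stands in for the real processing. Each window starts at an offset that is a multiple of the page size reported by sysconf(_SC_PAGESIZE).

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: scan a file through a sliding read-only mapping of
 * pages_per_window pages. Returns the number of '\n' bytes, or -1 on error. */
long count_newlines_windowed(const char *path, size_t pages_per_window)
{
    assert(path != NULL && pages_per_window > 0);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t window = (size_t)page * pages_per_window;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    long count = 0;
    for (off_t off = 0; off < st.st_size; off += (off_t)window) {
        /* The last window may be shorter than the others. */
        off_t remaining = st.st_size - off;
        size_t len = remaining < (off_t)window ? (size_t)remaining : window;

        /* off is always a multiple of the page size, as mmap() requires. */
        unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, off);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }

        for (size_t i = 0; i < len; i++)
            if (p[i] == '\n')
                count++;

        munmap(p, len);     /* release this window before mapping the next */
    }

    close(fd);
    return count;
}
```

Trying a few values of pages_per_window and timing each run is one way to do the trial and error. Records that straddle a window boundary would need extra handling that this sketch does not attempt.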

Last edited by shamrock; 05-13-2009 at 01:53 PM..
# 7  
Old 05-13-2009
If you are doing a straight sequential read through the file, and not bouncing back and forth among records, consider what Stevens' "Advanced Programming in the UNIX Environment" says about buffering files and I/O throughput: basically, that a buffer of 16K-32K (set with setvbuf()) gave the best throughput on a pretty large file on the systems that Rago (the current author) tested.

Consider sequential I/O first and mapping second. Both have strong points. Sequential I/O does well because most intelligent disk controllers prefetch several large data blocks, which greatly reduces I/O wait times.

Mapping is really great if your program references, say, record #92, then #40000, then back to #91, as in applications like a sort or maybe a binary search. On systems with huge amounts of memory it is also the fastest possible way to read a file. But if your mmap starts using virtual memory (i.e. swap space), then you lose the speed advantage. Swapping overhead is disk I/O by another name.
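
A minimal sketch of a buffered sequential read with setvbuf(), assuming a 32K buffer (the upper end of the range mentioned above). The helper name count_newlines_buffered() and the IO_BUF_SIZE constant are made up for the example, and the best size on any given system would still have to be measured.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define IO_BUF_SIZE (32 * 1024)     /* assumed value; tune by measurement */

/* Hypothetical helper: read a file sequentially through a fully buffered
 * stdio stream. Returns the number of '\n' bytes, or -1 on error. */
long count_newlines_buffered(const char *path)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    char *iobuf = malloc(IO_BUF_SIZE);
    if (iobuf == NULL) {
        fclose(fp);
        return -1;
    }

    /* setvbuf() must be called before any other operation on the stream. */
    if (setvbuf(fp, iobuf, _IOFBF, IO_BUF_SIZE) != 0) {
        fclose(fp);
        free(iobuf);
        return -1;
    }

    long count = 0;
    int c;
    while ((c = getc(fp)) != EOF)
        if (c == '\n')
            count++;

    fclose(fp);     /* close the stream before releasing its buffer */
    free(iobuf);
    return count;
}
```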
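
To illustrate the random-access case, here is a sketch of a binary search over a mapped file. The file layout is an assumption made purely for the example (sorted, fixed-width uint32_t records in native byte order), and find_record() is a hypothetical name.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: binary search for key in a file of sorted uint32_t
 * records. Returns the record index, -1 if absent, or -2 on error. */
long find_record(const char *path, uint32_t key)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -2;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -2;
    }
    size_t len = (size_t)st.st_size;
    if (len == 0) {                     /* empty file: nothing to find */
        close(fd);
        return -1;
    }
    if (len % sizeof(uint32_t) != 0) {  /* not a whole number of records */
        close(fd);
        return -2;
    }

    const uint32_t *rec = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (rec == MAP_FAILED)
        return -2;

    /* Each probe touches only the page holding rec[mid]; the rest of the
     * file is never read. */
    size_t lo = 0, hi = len / sizeof(uint32_t);
    long found = -1;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (rec[mid] == key) {
            found = (long)mid;
            break;
        }
        if (rec[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }

    munmap((void *)rec, len);
    return found;
}
```

With read()/fread() the same lookup would need an explicit seek for every probe; with the mapping it is plain array indexing.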

Last edited by jim mcnamara; 05-13-2009 at 06:00 PM..