File - reading - Performance improvement


 
# 1  
Old 05-22-2008

Hi All,
I am reading a huge file, at least 2 GB in size. I read each line, cut certain columns, and write them to another file.

Here is the logic.

Code:
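#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main()
{
    string u_line;
    string Char_List;

    // Both file names come from environment variables
    const char* in_name = getenv("u_file");
    const char* out_name = getenv("DATA_DIR");
    if (in_name == NULL || out_name == NULL) {
        cout << "Environment variables u_file and DATA_DIR must be set" << endl;
        exit(2);
    }
    string u_file = in_name;
    string temp_form_u_file = out_name;

    ofstream temp_u_file;
    temp_u_file.open(temp_form_u_file.c_str(), ios::app);
    if (temp_u_file.fail()) {
        cout << "Unable to open file " << temp_form_u_file << " for writing" << endl;
        exit(1);
    }

    ifstream U_File;
    U_File.open(u_file.c_str());
    if (U_File.fail()) {
        cout << "File " << u_file << " unable to open for reading\n";
        cout << "dart_report job failed\n";
        exit(3);
    }

    // Testing getline itself ends the loop cleanly at end of file
    while (getline(U_File, u_line)) {
        // Fields are 41 characters wide and start at offset 72
        string::size_type line_pos = 72;
        while (line_pos < u_line.length()) {
            // Skip fields that begin with two spaces
            if (u_line.substr(line_pos, 2) != "  ") {
                Char_List = u_line.substr(line_pos, 41);
                Char_List.append(u_line.substr(16, 4));  // 4 characters from offset 16
                Char_List.append("\n");
                temp_u_file << Char_List;
            }
            line_pos = line_pos + 41;
        }
    }
    return 0;
}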

When I run this program, it takes 2.5 to 3 hours to read the 2 GB file. I am trying to reduce the time spent reading. Is there any way I can reduce the processing time of the program?

Kindly let me know. A shell script is also okay, but I feel C will be faster than shell scripting.

Please give me your suggestions.

Regards
Dhana

Last edited by Yogesh Sawant; 05-22-2008 at 02:35 AM.. Reason: added code tags
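
One thing that can be tried without changing how the file is read is cutting down the per-field work. The loop in the program above builds three temporary strings with substr for every field. A minimal sketch of the same extraction without those temporaries follows; the function name `extract_fields` and the constant names are made up for illustration, and whether this is the real bottleneck is an assumption that should be checked with a profiler.

```cpp
#include <cstddef>
#include <string>

// Offsets and widths copied from the original program.
const std::size_t kFirstField = 72;  // offset of the first repeating field
const std::size_t kFieldWidth = 41;  // width of each repeating field
const std::size_t kKeyOffset = 16;   // offset of the 4-character column
const std::size_t kKeyWidth = 4;

// Appends one output row to `out` for every field of `line` that does
// not start with two spaces, mirroring the original loop.
void extract_fields(const std::string& line, std::string& out)
{
    for (std::size_t pos = kFirstField; pos < line.size(); pos += kFieldWidth) {
        if (line.compare(pos, 2, "  ") == 0) continue;  // blank field, skip it
        out.append(line, pos, kFieldWidth);    // length is clamped at the end of the line
        out.append(line, kKeyOffset, kKeyWidth);
        out.push_back('\n');
    }
}
```

The caller would keep one `out` string, clear it before each line, and write it to the output stream once per line instead of once per field.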
# 2  
Old 05-22-2008
I believe a shell script should be faster. With C/C++, there is a lot of copying of data to and from the kernel, which makes C/C++ programs slow. To make C/C++ programs faster, you may also use multithreading.

- Dheeraj
# 3  
Old 05-22-2008
Hi,
I would suggest using fread, that is, reading the data in bulk, say thousands at a time, and then manipulating it. You will surely get a performance improvement.
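
The bulk-read idea can be sketched as follows: each fread call hands the program a whole block containing many lines, and the lines are then located by scanning the block in memory. This is only a sketch under assumptions: the file is plain text with `'\n'` line endings, and `count_lines_bulk` and its 1 MiB default block size are illustrative choices, not tuned values. It only counts lines, so a line that straddles two blocks needs no special handling here; real processing would have to carry the partial line over to the next block.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <vector>

// Counts newline characters in `in` by reading it in large blocks.
std::size_t count_lines_bulk(std::FILE* in, std::size_t block_size = 1 << 20)
{
    std::vector<char> block(block_size);
    std::size_t lines = 0;
    std::size_t got;
    // One fread call fetches up to block_size bytes at once
    while ((got = std::fread(block.data(), 1, block.size(), in)) > 0) {
        const char* p = block.data();
        const char* end = p + got;
        // memchr jumps from newline to newline inside the block
        while ((p = static_cast<const char*>(std::memchr(p, '\n', end - p))) != nullptr) {
            ++lines;
            ++p;
        }
    }
    return lines;
}
```

Whether this beats a buffered getline loop depends on the platform and the C++ library, so it is worth timing both on the real file.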
# 4  
Old 05-22-2008
C/C++ is definitely faster than a shell script.
Can you explain how fread is faster, given that I am going to read line by line only?


Regards
Kuttalaraj
# 5  
Old 05-23-2008
I think read and write are the lowest-level system calls. All the other functions, like fread and fwrite, in turn use some low-level function to do their work.
I think using read to read a chunk of data can improve performance, since there is not much overhead involved.

Regards,
Aamir
# 6  
Old 05-23-2008
Hi,
read(fd, buffer, n_to_read)
I am trying to use the above call, but I will not be able to read an entire line, as I will not know the length of the line beforehand.

This part is a little tricky to handle.
If you have any ideas, please let me know.

Regards
Dhana
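
One common way around the unknown line length is to read fixed-size chunks and keep whatever follows the last newline until the next chunk arrives. A minimal sketch, assuming `'\n'`-terminated lines; `LineSplitter` is a hypothetical helper written for this illustration, not part of any library, and the chunks are assumed to come from a read(fd, buffer, n_to_read) loop.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Collects complete lines from arbitrarily sized chunks.
class LineSplitter {
public:
    // Appends every complete line found in chunk[0..n) to `out`,
    // without the trailing '\n'. Bytes after the last newline are
    // kept in pending_ and joined with the start of the next chunk.
    void feed(const char* chunk, std::size_t n, std::vector<std::string>& out)
    {
        std::size_t start = 0;
        for (std::size_t i = 0; i < n; ++i) {
            if (chunk[i] != '\n') continue;
            pending_.append(chunk + start, i - start);
            out.push_back(pending_);
            pending_.clear();
            start = i + 1;
        }
        pending_.append(chunk + start, n - start);
    }

    // Unterminated tail, for example a last line without '\n'.
    const std::string& pending() const { return pending_; }

private:
    std::string pending_;
};
```

In use, each positive return value n from read would be passed on as feed(buffer, n, lines). When read returns 0 at end of file, pending() holds the final line if the file did not end with a newline.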
# 7  
Old 05-23-2008
Hello!
What you can try is a circular buffer of, say, 6144 bytes (6KB); you can experiment with the size.
What I mean by circular is that you keep two pointers, start_ptr and processed_ptr, and wrap them around to the beginning when they reach the end of the buffer.

Code:
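#define HALF_SIZE 3072   // 3KB, half of the 6KB buffer

// offset starts at 0 and must keep its value between reads
read(fd, &buffer[offset], HALF_SIZE);
if(offset == 0)
{
    // next time, read into the second half of the buffer
    offset = HALF_SIZE;
    start_ptr = 0;
}
else
{
    offset = 0;
    start_ptr = HALF_SIZE;
}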

bytes_read = start_ptr - processed_ptr;

//start processing it
while(bytes_read >= minimum_size_of_record)
{
     ret_val = check_for_complete_record(processed_ptr);
    // incomplete record
     if(ret_val == -1)
     {
             // don't modify processed_ptr since the record is not complete
             break; //without modifying the pointers 
     }
     else
     {
          // in this case check_for_complete_record will return the size of record
          bytes_read = bytes_read - ret_val;
          processed_ptr = processed_ptr + ret_val;
     }      
}

1) Have start_ptr and processed_ptr as global
2) You must take care of rollover of processed_ptr for every read

Code:
     if(processed_ptr >= MAX_BUFFER_SIZE) // in this case 6KB
               processed_ptr = 0;

Regards,
Aamir