What is the faster way to grep from huge file?


11-18-2015


Hi All,

I am new to this forum and this is my first post.
My requirement is to reduce the time taken to grep a file with 40000 lines.

There are two files, FILEA (40000 lines) and FILEB (40000 lines).
Both files are in the format below:

Code:

1~refno001~blah~blah

I take refno001 from FILEA as a search key and grep FILEB for it. FILEB holds the updated value for refno001, as follows:

Code:

1~refno001~blah~good

So I will use this record from FILEB in my process. The problem is that I am using the code below to grep FILEB.

Code:

hold_rec=$(grep "$ref_no" /root/dev/FILEB)

But this takes 20 minutes to complete, because FILEB is searched once for each and every reference number in FILEA (i.e. 40000 x 40000). This has to be reduced as much as possible.

So kindly guide me if there is any way to reduce the processing time to less than 5 minutes with any other commands.

Note: my UNIX is AIX 6.0

Last edited by Don Cragun; 11-19-2015 at 02:13 AM.. Reason: Add CODE and ICODE tags.

mad man


11-18-2015


This description is somewhat sparse. I guess you are using a loop in a script? I guess you are reading the entire FILEB for each line of FILEA? How do you derive the ref_no variable? What is the desired result? What will be done with it?

RudiC


11-18-2015


Hi,

Yes. FILEA is read line by line in a Perl while loop. I split each tilde-separated line into an array, retrieve the reference number (e.g. refno001) from it, and then use grep to fetch the entire matching line from FILEB. I update fields in the line fetched from FILEB and write it to a new load file, FILEC.
What I am expecting is a replacement for the grep command line that I posted earlier.

Thanks.

mad man


11-18-2015


That definitely is NOT the right approach, as it would imply a suboptimization of a small aspect while the problem lies somewhere totally different. Post your entire script for analysis.

RudiC


11-18-2015


OK, sorry for that! I have posted the Perl code below, including the shell part that is taking the time. Kindly let me know if you need any other details.

Thanks.

Code:

while (<FILEA>)
{
    if ($run_type eq 'ONE') {
        BLAH BLAH HERE
    }
    elsif ($run_type eq 'TWO') {
        $a_rec = $_;
        undef @myarray;
        @myarray = split(/$tilde/, $a_rec);
        $ref_id = trim($myarray[2]);
        $hold_rec = `grep $ref_id $trn_file`;    ## This is where most of the time goes: searching FILEB (= trn_file)
        $return_code = $?;

        if ($return_code == 0) {
            undef @holdarray;                                   ## initialize just before loading the array
            @holdarray     = split(/$tilde/, $hold_rec);
            $holdarray[57] = $upload_array[1];
            $holdarray[58] = 1;                                 # Just updating some fields here
            $current_tmstmp = trim(`date +"%Y%m%d_%H%M%S"`);
            $holdarray[3]  = $current_tmstmp;

            print INFILE_TMP join("~", @holdarray)."\n";
            shift(@holdarray);
            print LDFILE join("~", @holdarray)."\n";
        }
        else {
            $myarray[57] = $upload_array[1];
            $myarray[58] = 1;
            $current_tmstmp = trim(`date +"%Y%m%d_%H%M%S"`);
            $myarray[3]  = $current_tmstmp;

            print INFILE_TMP join("~", @myarray)."\n";
            shift(@myarray);
            print LDFILE join("~", @myarray)."\n";
        }
    }
    else {
        BLAH BLAH HERE
    }
}

---------- Post updated at 07:44 PM ---------- Previous update was at 06:59 PM ----------

Hi,

Another thread in the forum suggested using grep -F. I tried that too, but it takes about the same time as without the -F option. Can awk or sed do this faster than grep?

Thanks.

mad man


11-18-2015


My perl is almost non-existent, but from what I infer from above, for every line in FILEA you create a process, run the grep command, and sift through the entire FILEB.
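One possible way to avoid a grep process per line, sketched under assumptions: collect all keys from FILEA first and pass them to a single grep with -F -f, so FILEB is scanned by one process only. The sample records, the scratch directory name and the choice of field 2 as the key are assumptions taken from the record format in the first post. Wrapping each key in tildes is also an addition here, so that a key such as refno001 does not match inside a longer one.

```shell
#!/bin/sh
# Sketch: one grep process instead of one per FILEA line.
# Sample data and the scratch directory are made up for illustration.
workdir=${TMPDIR:-/tmp}/grepdemo.$$
mkdir -p "$workdir" && cd "$workdir" || exit 1

printf '%s\n' '1~refno001~blah~blah' '2~refno002~blah~blah' '3~refno003~blah~blah' > FILEA
printf '%s\n' '1~refno001~blah~good' '3~refno003~blah~good' '9~refno009~blah~good' > FILEB

# Extract every key (assumed to be field 2) once, wrapped in tildes.
cut -d '~' -f 2 FILEA | sed 's/.*/~&~/' > keys.txt

# A single grep then searches FILEB for all keys as fixed strings.
matches=$(grep -F -f keys.txt FILEB)
printf '%s\n' "$matches"
```

This only fetches the matching FILEB records in bulk; pairing each one with its FILEA line and updating the fields would still have to happen afterwards.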

How about reading FILEB entirely into one array at the start of the script, and then do the matching operations entirely in memory?
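The in-memory idea could look roughly like the awk sketch below. It uses awk rather than Perl, so it is a substitute for illustration and not the poster's script. It loads FILEB into an associative array keyed on the reference number, then reads FILEA once and prints the FILEB record when the key is known, or the FILEA record otherwise. The sample records, the scratch directory name and the key being field 2 are assumptions based on the format shown in the first post.

```shell
#!/bin/sh
# Sketch: load FILEB into memory once, then look up each FILEA key.
# Sample data and the scratch directory are made up for illustration.
workdir=${TMPDIR:-/tmp}/lookupdemo.$$
mkdir -p "$workdir" && cd "$workdir" || exit 1

printf '%s\n' '1~refno001~blah~blah' '2~refno002~blah~blah' '3~refno003~blah~blah' > FILEA
printf '%s\n' '1~refno001~blah~good' '3~refno003~blah~good' > FILEB

# NR == FNR holds while awk reads the first file on the command line (FILEB).
merged=$(awk -F '~' '
    NR == FNR       { updated[$2] = $0; next }      # remember each FILEB record by key
    ($2 in updated) { print updated[$2]; next }     # key found: use the FILEB record
                    { print }                       # no match: keep the FILEA record
' FILEB FILEA)
printf '%s\n' "$merged"
```

Each file is read exactly once and no extra process is started per line. The field updates done in the Perl script (the timestamp and the two flag fields) are left out here and would need to be added.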

RudiC


11-18-2015


Can we have more of the data? Does FILEA just have keys? Is there always something between the tildes? Are the lines in FILEB just more key-like things, or is it more free-form? I think I can help, but there's just not enough information about the input and output.

cjcox
