What is the faster way to grep from huge file?
# 8 (11-19-2015)
@RudiC & cjcox:

Please find the input (FILEA & FILEB) and output format (LDFILE) below.
Note: lines with the dummy values shown below run to roughly 1500-1600 characters each.
FILEA:
Code:
value~0000~refno001~value1~value2~value3........~value65

40,000 lines like the above, each with a unique reference number.
FILEB:
Code:
value~0000~refno001~value1~updatedvalue2~updatedvalue3........~value65

40,000 lines like the above, each with a unique reference number.
The output format in files INFILE_TMP & LDFILE should be as below.

INFILE_TMP:
Code:
value~0000~refno001~value1~updatedvalue2~updatedvalue3........~value65

LDFILE:
Code:
0000~refno001~value1~updatedvalue2~updatedvalue3........~value65

Using the reference number from FILEA, I grep for that refno in FILEB, update some more values in the returned line, and then write the result to INFILE_TMP & LDFILE.

@RudiC, I have a few questions about your suggestion:

Do you think we can read FILEB and hold each full line (as shown above) in a single array element?
If so, can that element then be searched and retrieved using only the refno?
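(Not from the thread, but a minimal sketch of that idea, assuming the sample layout above: awk can hold each FILEB line in an associative array element keyed on the refno in field 3, then retrieve it by refno alone while reading FILEA, with no per-line grep.)

```shell
# Dummy 5-field lines standing in for the real ~65-field, 40,000-line files.
printf 'value~0000~refno001~value1~value2\n' > FILEA
printf 'value~0000~refno001~value1~updatedvalue2\n' > FILEB

# Pass 1 (NR == FNR is true only while reading FILEB): store each whole
# FILEB line in array b, keyed on the refno in field 3.
# Pass 2 (FILEA): look each refno up in b, write the matching FILEB line
# to INFILE_TMP and the same line minus field 1 to LDFILE.
awk -F'~' '
NR == FNR { b[$3] = $0; next }
$3 in b {
    print b[$3] > "INFILE_TMP"
    line = b[$3]
    sub(/^[^~]*~/, "", line)   # drop field 1 for LDFILE
    print line > "LDFILE"
}' FILEB FILEA
```

One read of each file replaces 40,000 grep invocations; the only cost is holding FILEB in memory.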

@cjcox:

The lines in both FILEA & FILEB have tilde-separated values, about 60 to 65 fields each.
Fields may be empty, date values, numbers, or character fields,
but only the first three fields are guaranteed to be filled (mandatory) in both files.
The 3rd field (refno) is unique in both files, so I am using the refno as the key for grep.

Thanks.

Last edited by Don Cragun; 11-19-2015 at 02:17 AM. Reason: Add CODE and ICODE tags.
# 9 (11-19-2015)
Not a solution, but something to think about. Turn the problem on its head: avoid loops. Use FILEA to create a one-time-pass filter (a generated sed script) to run against FILEB. For example:

Code:
sed 's;^[^~][^~]*~\([^~][^~]*\)~.*;s/^\\([^~][^~]*\\)~\1~\\([^~][^~]*\\)~\\([^~][^~]*\\)/\\1~\1~\\2~good/;' FILEA >filter.sed
sed -f filter.sed FILEB

Just an idea that might inspire you...
# 10 (11-20-2015)
Hi cjcox, could you please explain what this sed command does?
I could not understand it from the series of special characters.
# 11 (11-20-2015)
Sure... the sed outputs a sed script.

It finds all "refs" in FILEA, turns each one into a sed search-and-replace statement, stores them all in the filter.sed file, and then runs that file against FILEB.

There are no "special characters" really in that sed... you should be able to cut and paste it directly.
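To make that concrete, here is a tiny worked run with dummy 5-field lines standing in for the real ~65-field ones (the generating sed is copied verbatim from post #9; it keys on field 2 and marks field 4 of each matching FILEB line with the placeholder "good"):

```shell
# Hypothetical 5-field sample data.
printf 'alpha~0000~refno001~v1~v2\n' > FILEA
printf 'alpha~0000~refno001~x1~x2\n' > FILEB

# Turn each FILEA line into one substitution command in filter.sed.
# For the line above it emits:
#   s/^\([^~][^~]*\)~0000~\([^~][^~]*\)~\([^~][^~]*\)/\1~0000~\2~good/
sed 's;^[^~][^~]*~\([^~][^~]*\)~.*;s/^\\([^~][^~]*\\)~\1~\\([^~][^~]*\\)~\\([^~][^~]*\\)/\\1~\1~\\2~good/;' FILEA > filter.sed

# Run the generated script against FILEB in a single pass:
# field 4 of the matching line becomes "good".
sed -f filter.sed FILEB
```

Here filter.sed has 40,000 substitution rules but FILEB is read only once, which is the whole point of the trick.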
# 12 (11-20-2015)
Quote:
Originally Posted by mad man
.
.
.
FILEB:
Code:
value~0000~refno001~value1~updatedvalue2~updatedvalue3........~value65

.
.
.
INFILE_TMP:
Code:
value~0000~refno001~value1~updatedvalue2~updatedvalue3........~value65

LDFILE:
Code:
0000~refno001~value1~updatedvalue2~updatedvalue3........~value65

.
.
.
From this very thin data basis, I can't suggest any solution or even an algorithm. It seems you could use FILEB immediately as INFILE_TMP, and LDFILE would be derived from FILEB by removing field 1, and that's it.
If this is not what you require, post many more details.
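If that reading is right, the whole job reduces to two commands; a sketch with dummy data (no per-refno grep needed at all):

```shell
# Dummy one-line FILEB standing in for the real 40,000-line file.
printf 'value~0000~refno001~value1~updatedvalue2\n' > FILEB

# INFILE_TMP is FILEB unchanged; LDFILE is FILEB minus field 1.
cp FILEB INFILE_TMP
cut -d'~' -f2- FILEB > LDFILE
```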