What is the faster way to grep from huge file?
# 8 (11-19-2015)
@RudiC & cjcox:

Please find the input (FILEA & FILEB) and output format (LDFILE) below.
Note: lines with the dummy values shown below run to roughly 1500-1600 characters each.
FILEA:
Code:
value~0000~refno001~value1~value2~value3........~value65

40,000 lines like the above, each with a unique reference number.
FILEB:
Code:
value~0000~refno001~value1~updatedvalue2~updatedvalue3........~value65

40,000 lines like the above, each with a unique reference number.
The output format in files INFILE_TMP & LDFILE should be as below.

INFILE_TMP:
Code:
value~0000~refno001~value1~updatedvalue2~updatedvalue3........~value65

LDFILE:
Code:
0000~refno001~value1~updatedvalue2~updatedvalue3........~value65

Using the reference number from FILEA, I grep for that refno in FILEB, update some more values in the returned line, and then write the result to INFILE_TMP & LDFILE.

@RudiC, I have a few questions about your suggestion:

Do you think we can read FILEB and hold each full line (as shown above) in a single array element?
If so, can that element then be searched and retrieved using only the refno?
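(Not from the thread, but a minimal sketch of that idea, assuming the sample layout above: awk can hold each FILEB line in an associative array element keyed on the refno in field 3, then retrieve it by refno alone while reading FILEA, with no per-line grep.)

```shell
# Dummy 5-field lines standing in for the real ~65-field, 40,000-line files.
printf 'value~0000~refno001~value1~value2\n' > FILEA
printf 'value~0000~refno001~value1~updatedvalue2\n' > FILEB

# Pass 1 (NR == FNR is true only while reading FILEB): store each whole
# FILEB line in array b, keyed on the refno in field 3.
# Pass 2 (FILEA): look each refno up in b, write the matching FILEB line
# to INFILE_TMP and the same line minus field 1 to LDFILE.
awk -F'~' '
NR == FNR { b[$3] = $0; next }
$3 in b {
    print b[$3] > "INFILE_TMP"
    line = b[$3]
    sub(/^[^~]*~/, "", line)   # drop field 1 for LDFILE
    print line > "LDFILE"
}' FILEB FILEA
```

One read of each file replaces 40,000 grep invocations; the only cost is holding FILEB in memory.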

@cjcox:

The lines in both FILEA & FILEB have tilde-separated values, about 60 to 65 fields each.
Fields may be empty, date values, numbers, or character fields,
but only the first three fields are guaranteed to be filled (mandatory) in both files.
The 3rd field (refno) is unique in both files, so I am using the refno as the key for grep.

Thanks.

Last edited by Don Cragun; 11-19-2015 at 02:17 AM. Reason: Add CODE and ICODE tags.
# 9 (11-19-2015)
Not a solution, but something to think about. Turn the problem on its head: avoid loops. Use FILEA to create a one-time-pass filter (a generated sed script) to run against FILEB. For example:

Code:
sed 's;^[^~][^~]*~\([^~][^~]*\)~.*;s/^\\([^~][^~]*\\)~\1~\\([^~][^~]*\\)~\\([^~][^~]*\\)/\\1~\1~\\2~good/;' FILEA >filter.sed
sed -f filter.sed FILEB

Just an idea that might inspire you...
# 10 (11-20-2015)
Hi cjcox, could you please explain what this sed command does?
I could not understand it from the series of special characters.
# 11 (11-20-2015)
Sure... the sed outputs a sed script.

It finds all "refs" in FILEA, turns each one into a sed search-and-replace statement, stores them all in the filter.sed file, and then runs that file against FILEB.

There are no "special characters" really in that sed... you should be able to cut and paste it directly.
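To make that concrete, here is a tiny worked run with dummy 5-field lines standing in for the real ~65-field ones (the generating sed is copied verbatim from post #9; it keys on field 2 and marks field 4 of each matching FILEB line with the placeholder "good"):

```shell
# Hypothetical 5-field sample data.
printf 'alpha~0000~refno001~v1~v2\n' > FILEA
printf 'alpha~0000~refno001~x1~x2\n' > FILEB

# Turn each FILEA line into one substitution command in filter.sed.
# For the line above it emits:
#   s/^\([^~][^~]*\)~0000~\([^~][^~]*\)~\([^~][^~]*\)/\1~0000~\2~good/
sed 's;^[^~][^~]*~\([^~][^~]*\)~.*;s/^\\([^~][^~]*\\)~\1~\\([^~][^~]*\\)~\\([^~][^~]*\\)/\\1~\1~\\2~good/;' FILEA > filter.sed

# Run the generated script against FILEB in a single pass:
# field 4 of the matching line becomes "good".
sed -f filter.sed FILEB
```

Here filter.sed has 40,000 substitution rules but FILEB is read only once, which is the whole point of the trick.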
# 12 (11-20-2015)
Quote:
Originally Posted by mad man
.
.
.
FILEB:
Code:
value~0000~refno001~value1~updatedvalue2~updatedvalue3........~value65

.
.
.
INFILE_TMP:
Code:
value~0000~refno001~value1~updatedvalue2~updatedvalue3........~value65

LDFILE:
Code:
0000~refno001~value1~updatedvalue2~updatedvalue3........~value65

.
.
.
From this very thin data basis, I can't suggest any solution or even an algorithm. It seems you could use FILEB immediately as INFILE_TMP, and LDFILE would be derived from FILEB by removing field 1, and that's it.
If this is not what you require, post many more details.
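If that reading is right, the whole job reduces to two commands; a sketch with dummy data (no per-refno grep needed at all):

```shell
# Dummy one-line FILEB standing in for the real 40,000-line file.
printf 'value~0000~refno001~value1~updatedvalue2\n' > FILEB

# INFILE_TMP is FILEB unchanged; LDFILE is FILEB minus field 1.
cp FILEB INFILE_TMP
cut -d'~' -f2- FILEB > LDFILE
```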