awk - fetch multiple data from huge dump


 
# 1  
01-17-2014

Hello Experts

I have a requirement to fetch the lines matching any of several patterns from a huge dump file:

Code:
egrep -f Pattern.txt Dump.txt

My pattern file has about 300 entries and the dump file is about 8 GB.
It takes forever to complete on my machine.
Is there a faster way to search for the patterns, for example using awk or something similar?
Please help.

Regards
Navkanwal

Last edited by Scott; 01-17-2014 at 07:44 AM.. Reason: Code tags
# 2  
01-17-2014
Without sample input it's difficult to guess, but you can try something like the following.

Wild guess:
Code:
$ awk 'FNR==NR{A[$0];next}($0 in A)' Pattern.txt Dump.txt

Code:
$ awk 'FNR==NR{A[$0];next}{for(i in A){if($0~i){print;next}}}' Pattern.txt Dump.txt
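For reference, a commented sketch of the regex-loop idea, assuming the intent is to test every dump line against every pattern (dump line on the left of ~). It does roughly the same work as egrep -f, one regex test per pattern per line, so it should not be expected to beat egrep, and a pattern can also match inside a longer number. regex_any is a hypothetical helper name.

```shell
# Hypothetical helper: print every line of the dump file ($2) that
# matches at least one pattern from the pattern file ($1). Each
# pattern is used as an extended regular expression, much like egrep -f,
# so the work is roughly (dump lines) x (patterns) regex tests.
regex_any() {
    awk '
        FNR == NR { pat[$0]; next }          # first file: store each pattern line
        {
            for (p in pat)                   # try every stored pattern
                if ($0 ~ p) { print; next }  # dump line on the left; print once
        }
    ' "$1" "$2"
}
```

Usage: regex_any Pattern.txt Dump.txt > matches.txt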

Please provide sample input and the expected output.
# 3  
01-17-2014
Hi Akshay

Pattern:
Code:
110010000000206
110010000000210
110010000000211
110010000000214

Dump:
Code:
567810008161509|aav|xac|aab|aav|xac|aab|567810008161509|110010000000206|aav|xac|aab|aav|xac|aab|
567810072227627|aav|xac|aab|aav|xac|aab|567810008161509|110010000000207|aav|xac|aab|aav|xac|aab|
567811368851555|aav|xac|aab|aav|xac|aab|567810008161509|110010000000208|aav|xac|aab|aav|xac|aab|
567811422513652|aav|xac|aab|aav|xac|aab|567810008161509|110010000000209|aav|xac|aab|aav|xac|aab|
567812130217683|aav|xac|aab|aav|xac|aab|567810008161509|xac|dud|110010000000210|xac|aab|aav|xac|aab|
567813220211182|aav|xac|aab|aav|xac|aab|567810008161509|110010000000211|aav|xac|aab|aav|xac|aab|
567813449322589|aav|xac|aab|aav|xac|aab|567810008161509|110010000000212|aav|xac|aab|aav|xac|aab|
567813741319623|aav|xac|aab|aav|xac|aab|567810008161509|110010000000213|aav|xac|aab|aav|xac|aab|
567816323171591|aav|xac|aab|aav|xac|aab|567810008161509|mum|does|110010000000214|aav|xac|aab|aav|xac|aab|
567816660521463|aav|xac|aab|aav|xac|aab|567810008161509|110010000000215|aav|xac|aab|aav|xac|aab|
567818208711973|aav|xac|aab|aav|xac|aab|567810008161509|110010000000216|aav|xac|aab|aav|xac|aab|

In the dump file the value is not in a fixed field, so it could be in any field number.
Regards
Navkanwal

Last edited by Scott; 01-17-2014 at 08:10 AM.. Reason: Use code tags, please...
# 4  
01-17-2014
Please use code tags.


Code:
$ cat pattern
110010000000206
110010000000210
110010000000211
110010000000214

Code:
$ cat dump
567810008161509|aav|xac|aab|aav|xac|aab|567810008161509|110010000000206|aav|xac|aab|aav|xac|aab|
567810072227627|aav|xac|aab|aav|xac|aab|567810008161509|110010000000207|aav|xac|aab|aav|xac|aab|
567811368851555|aav|xac|aab|aav|xac|aab|567810008161509|110010000000208|aav|xac|aab|aav|xac|aab|
567811422513652|aav|xac|aab|aav|xac|aab|567810008161509|110010000000209|aav|xac|aab|aav|xac|aab|
567812130217683|aav|xac|aab|aav|xac|aab|567810008161509|110010000000210|aav|xac|aab|aav|xac|aab|
567813220211182|aav|xac|aab|aav|xac|aab|567810008161509|110010000000211|aav|xac|aab|aav|xac|aab|
567813449322589|aav|xac|aab|aav|xac|aab|567810008161509|110010000000212|aav|xac|aab|aav|xac|aab|
567813741319623|aav|xac|aab|aav|xac|aab|567810008161509|110010000000213|aav|xac|aab|aav|xac|aab|
567816323171591|aav|xac|aab|aav|xac|aab|567810008161509|110010000000214|aav|xac|aab|aav|xac|aab|
567816660521463|aav|xac|aab|aav|xac|aab|567810008161509|110010000000215|aav|xac|aab|aav|xac|aab|
567818208711973|aav|xac|aab|aav|xac|aab|567810008161509|110010000000216|aav|xac|aab|aav|xac|aab|

Code:
$  awk -F"|" 'FNR==NR{A[$1];next}($9 in A)' pattern dump

Result:
Code:
567810008161509|aav|xac|aab|aav|xac|aab|567810008161509|110010000000206|aav|xac|aab|aav|xac|aab|
567812130217683|aav|xac|aab|aav|xac|aab|567810008161509|110010000000210|aav|xac|aab|aav|xac|aab|
567813220211182|aav|xac|aab|aav|xac|aab|567810008161509|110010000000211|aav|xac|aab|aav|xac|aab|
567816323171591|aav|xac|aab|aav|xac|aab|567810008161509|110010000000214|aav|xac|aab|aav|xac|aab|
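A small sketch of why the field-9 lookup is not enough for the original data: in the sample from post #3, the lines containing dud and mum carry the value in field 11, so $9 in A skips them. The awk logic is the one above; field9_lookup is a hypothetical wrapper name.

```shell
# Hypothetical wrapper around the field-9 lookup: values from the pattern
# file ($1) become array keys, and a dump line ($2) is printed only when
# its 9th |-separated field is one of those keys.
field9_lookup() {
    awk -F'|' '
        FNR == NR { want[$1]; next }   # pattern file: remember each value
        $9 in want                     # dump file: look at field 9 only
    ' "$1" "$2"
}
```

Run against the unmodified dump from post #3, this returns only the 206 and 211 lines, which is what motivates the any-field versions below.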

--- edit ---

Quote:
...
In the dump file the value is not in a fixed field, so it could be in any field number.
Regards
Navkanwal
For a match in any field:
Code:
$ awk -F"|" 'FNR==NR{A[$0];next}{for(i=1;i<=NF;i++)for(j in A)if(j==$i){print;next}}' pattern dump


Last edited by Akshay Hegde; 01-17-2014 at 08:15 AM..
# 5  
01-17-2014
Quote:
Originally Posted by navkanwal
In the dump file the value is not in a fixed field, so it could be in any field number.
Regards
Navkanwal
In that case awk is not going to beat egrep.
# 6  
01-17-2014
Some regular-expression implementations are slow in certain locales, notably multibyte locales such as UTF-8.
In that case it helps to switch to the C locale:
Code:
export LC_ALL=C

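In your case it can also help to switch from egrep to fgrep (equivalent to grep -F), because your patterns are fixed strings, not regular expressions.
Further, since in your example each pattern matches one whole field exactly, this awk script may be faster (an improvement on the previous post):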
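A minimal sketch of the fixed-string route, assuming GNU or BSD grep (-w is not in POSIX, but both support it). -F avoids regex matching, -w stops a value such as 110010000000206 from matching inside a longer number, and LC_ALL=C applies the locale tip. fixed_grep is a hypothetical wrapper name; the actual speed-up on an 8 GB dump will depend on the grep version.

```shell
# Hypothetical wrapper: fixed-string, whole-word search.
#   -F  each pattern line is a literal string (what fgrep does)
#   -w  match whole words only; digits are word characters and | is not,
#       so a value cannot match inside a longer number
#   -f  read the patterns from the file given as $1
# LC_ALL=C makes grep work on plain bytes instead of multibyte characters.
fixed_grep() {
    LC_ALL=C grep -F -w -f "$1" "$2"
}
```

Usage: fixed_grep Pattern.txt Dump.txt > matches.txt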
Code:
awk -F"|" 'FNR==NR {A[$1]; next} {for (i=1;i<=NF;i++) if ($i in A) {print; next}}' pattern dump
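The same any-field script spread over several lines with comments, as a sketch (any_field_lookup is a hypothetical wrapper name). Each field costs one hash lookup ($i in want) instead of a loop over all patterns, so the work per line grows with the number of fields rather than with the ~300 patterns.

```shell
# Hypothetical wrapper around the any-field hash lookup:
# $1 = pattern file (one value per line), $2 = |-separated dump file.
any_field_lookup() {
    awk -F'|' '
        FNR == NR { want[$1]; next }       # pattern file: store each value as a key
        {
            for (i = 1; i <= NF; i++)      # scan every field of the dump line
                if ($i in want) {          # single hash lookup per field
                    print                  # print the line once
                    next                   # then move on to the next dump line
                }
        }
    ' "$1" "$2"
}
```

Usage: any_field_lookup Pattern.txt Dump.txt > matches.txt. Like the one-liner, it only matches whole fields exactly, so trailing spaces or CRLF line endings in Pattern.txt would prevent matches.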
