Perl or awk/egrep from big files??


 
# 1  
Old 11-17-2008

Hi experts.

In an earlier thread I asked how to grep certain strings from the sample file below.

Unfortunately, the script did not give the proper output (it missed many strings). Perhaps that happened because I did not give you the proper contents of the file.

This was the script:
perl -00nle'print join "\n", /<fullOperation>(.*?):.*<fullResult>(.*?);/s' filename.txt

For your convenience, I have pasted the contents from the beginning of the file below.

The output for the file below should be:

CREATE
RESP:-3010

DELETE
RESP:0

CREATE
RESP:911364896

GET
RESP:0

SET
RESP:911265678


<?xml version='1.0' encoding='ISO-8859-1' standalone='no'?>
<LogItems>
<log logid="83efeae5190811100759420954">
<category>Upstream.CAI</category>
<operation>Login</operation>
<target>CAI</target>
<instance></instance>
<user></user>
<context></context>
<fullOperation>LOGIN:server1:eri4ema</fullOperation>
<starttime>20081110075942.366900</starttime>
<stoptime>20081110075942.424451</stoptime>
<fullResult>RESP:3001;</fullResult>
<status>FAILED</status>
</log>
<log logid="83efeae5190811100759480955">
<category>Upstream.CAI</category>
<operation>Login</operation>
<target>CAI</target>
<instance></instance>
<user></user>
<context></context>
<fullOperation>LOGIN:server1:eri4ema;</fullOperation>
<starttime>20081110075948.375669</starttime>
<stoptime>20081110075948.375923</stoptime>
<fullResult>RESP:3007;</fullResult>
<status>FAILED</status>
</log>
<log logid="83efeae5190811100759580956">
<category>Upstream.CAI</category>
<operation>Login</operation>
<target>CAI</target>
<instance></instance>
<user>server1</user>
<context>sog</context>
<fullOperation>LOGIN:server1:*******;</fullOperation>
<starttime>20081110075958.354986</starttime>
<stoptime>20081110075958.355238</stoptime>
<fullResult>RESP:0;</fullResult>
<status>SUCCESSFUL</status>
</log>
</LogItems>

<?xml version='1.0' encoding='ISO-8859-1' standalone='no'?>
<LogItems>
<log logid="83efeae5190811100802020957">
<category>Upstream.CAI</category>
<operation>Get</operation>
<target>ESUB</target>
<instance>CODE=432350114484630</instance>
<user>server1</user>
<context>sog</context>
<fullOperation>GET:ESUB:CODE,432350114484630;</fullOperation>
<starttime>20081110080202.185236</starttime>
<stoptime>20081110080202.834500</stoptime>
<fullResult>RESP:11000003;UNKNOWN SUBSCRIBER;</fullResult>
<status>FAILED</status>
</log>
</LogItems>

<?xml version='1.0' encoding='ISO-8859-1' standalone='no'?>
<LogItems>
<log logid="83efeae5190811100802120958">
<category>Upstream.CAI</category>
<operation>Get</operation>
<target>DSUB</target>
<instance></instance>
<user>server1</user>
<context>sog</context>
<fullOperation>GET:DSUB:MDN,989352375449;</fullOperation>
<starttime>20081110080212.352053</starttime>
<stoptime>20081110080213.376720</stoptime>
<fullResult>RESP:0:MDN,989352375449:CODE,432350114484630:COUNTRY,FI:LANG,fi:PRE,0:SUBNAME,Eserve:MMS,1;</fullResult>
<status>SUCCESSFUL</status>
</log>
</LogItems>

<?xml version='1.0' encoding='ISO-8859-1' standalone='no'?>
<LogItems>
<log logid="83efeae5190811100802350959">
<category>Upstream.CAI</category>
<operation>Get</operation>
<target>ACCOUNTINFORMATION</target>
<instance></instance>
<user>server1</user>
<context>sog</context>
<fullOperation>GET:ACCOUNTINFORMATION:SubscriberNumber,989352375449;</fullOperation>
<starttime>20081110080235.264165</starttime>
<stoptime>20081110080235.555880</stoptime>
<fullResult>RESP:-3010;;</fullResult>
<status>FAILED</status>
</log>
<log logid="83efeae5190811100802450960">
<category>Upstream.CAI</category>
<operation>Delete</operation>
<target>EDSUB</target>
<instance></instance>
<user>server1</user>
<context>sog</context>
<fullOperation>DELETE:EDSUB:CODE,432350114484630:MDN,989352375449:PRE,0:DEST,ALL;</fullOperation>
<starttime>20081110080245.012208</starttime>
<stoptime>20081110080245.857994</stoptime>
<fullResult>RESP:0;</fullResult>
<status>SUCCESSFUL</status>
</log>
<log logid="83efeae5190811100802510961">
<category>Upstream.CAI</category>
<operation>Create</operation>
<target>EDSUB</target>
<instance></instance>
<user>server1</user>
<context>sog</context>
<fullOperation>CREATE:EDSUB:CODE,432350114484630:KI,1C9B39AAF3931D60C064F6E8FBB5B1E6:MDN,989352375449:PRE,0:DEST,ALL;</fullOperation>
<starttime>20081110080251.089898</starttime>
<stoptime>20081110080251.489396</stoptime>
<fullResult>RESP:911364896;</fullResult>
<status>FAILED</status>
</log>
<log logid="83efeae5190811100802540962">
<category>Upstream.CAI</category>
<operation>Get</operation>
<target>ESUB</target>
<instance>CODE=432350114484630</instance>
<user>server1</user>
<context>sog</context>
<fullOperation>GET:ESUB:CODE,432350114484630;</fullOperation>
<starttime>20081110080254.000313</starttime>
<stoptime>20081110080254.697545</stoptime>
<fullResult>RESP:0:MDN,989352375449:CODE,432350114484630:T11,1:T21,1:T22,1:B16,1:T62,1:BAIC,0:BAOC,0:BOIC,0:BIRO,0:BORO,0:BOIH,0:BOS4,0:CLIP,1:CLIR,0:CFB,1:CFNR,1:CFNA,1:CFU,1:HOLD,1:CW,1:MPTY,1:BAICS,0,0:BAOCS,0,0:BOICS,0,0:PRE,0;</fullResult>
<status>SUCCESSFUL</status>
</log>
<log logid="83efeae5190811100802570963">
<category>Upstream.CAI</category>
<operation>Set</operation>
<target>DSUB</target>
<instance></instance>
<user>server1</user>
<context>sog</context>
<fullOperation>SET:DSUB:MDN,989352375449;</fullOperation>
<starttime>20081110080257.888204</starttime>
<stoptime>20081110080257.999121</stoptime>
<fullResult>RESP:911265678;</fullResult>
<status>FAILED</status>
</log>
</LogItems>
# 2  
Old 11-17-2008
I ran the command below, but it does not give the proper output. It also takes 3 minutes for a 35 MB file, and I have a 900 MB file.

egrep '<fullOperation>DELETE|<fullOperation>SET|<fullOperation>CREATE|<fullOperation>GET|<fullResult>RESP'

The output was:

<fullOperation>GET:ESUB:MDN,989371072136;</fullOperation>
<fullResult>RESP:0:MDN,989371072136:CODE,432350022011344:LASTNAME,989371072136:FIRSTNAME,2008-11-08_16_10:COUNTRY,IR:LANG,fa:PRE,1:SUBNAME,Eserve:MMS,0;</fullResult>
<fullResult>RESP:0;</fullResult>
<fullResult>RESP:0;</fullResult> --> Resp: comes twice


The output should be:
GET
Resp:0
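
The egrep output above lists matching lines but does not pair each operation with its own result. As a rough sketch of that pairing in awk (the other tool named in the thread title), the block below remembers the keyword from each `<fullOperation>` line and prints it with the code from the next `<fullResult>` line. It assumes each tag sits on a single line, as in the sample; the three-record sample file, the shortened `logid` values and the variable names are made up for illustration, and no timing on a 900 MB file has been measured.

```shell
# Small stand-in for the log (records trimmed from the sample data).
log=$(mktemp)
cat > "$log" <<'EOF'
<log logid="1">
<fullOperation>LOGIN:server1:eri4ema</fullOperation>
<fullResult>RESP:3001;</fullResult>
</log>
<log logid="2">
<fullOperation>GET:ESUB:CODE,432350114484630;</fullOperation>
<fullResult>RESP:0:MDN,989352375449;</fullResult>
</log>
<log logid="3">
<fullOperation>DELETE:EDSUB:CODE,432350114484630;</fullOperation>
<fullResult>RESP:0;</fullResult>
</log>
EOF

pairs=$(awk '
    # Remember the keyword (text before the first colon) for the
    # four operations the egrep pattern was looking for.
    /<fullOperation>(GET|SET|CREATE|DELETE):/ {
        op = $0
        sub(/.*<fullOperation>/, "", op)
        sub(/:.*/, "", op)
    }
    # On the result line, keep "RESP" plus the code that follows it,
    # print the pair, then forget the keyword.
    /<fullResult>/ && op != "" {
        res = $0
        sub(/.*<fullResult>/, "", res)
        split(res, f, /[:;]/)
        print op "\n" f[1] ":" f[2] "\n"
        op = ""
    }
' "$log")
printf '%s\n' "$pairs"
rm -f "$log"
```

Because awk reads one line at a time, memory use stays small regardless of file size; the LOGIN record is skipped since its operation line never sets `op`.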
# 3  
Old 11-17-2008
Yes,
the sample data in the previous thread was different (I was assuming the tags were separated by blank (empty) lines).
Try this:

Code:
perl -nle'BEGIN {$/="</log>";$,="\n";$\="\n\n"}
  print /<fullOperation>(.*?):.*<fullResult>(.*?:.*?)[:;]/s
  ' infile

# 4  
Old 11-17-2008
Wow, great, it's working. I want to write the output to a file.

I ran it as shown below, but output.txt contains a few strings that do not match the original output on my screen.

perl -nle'BEGIN {$/="</log>";$,="\n";$\="\n\n"} print /<fullOperation>(.*?):.*<fullResult>(.*?:.*?)[:;]/s' 2008-11-11.0.log > output.txt
# 5  
Old 11-17-2008
This is strange, could you post an example of those strings?
# 6  
Old 11-17-2008
Oops, I'm really sorry, buddy. It worked. In fact, I had given the wrong filename.

Anyway, I hope I can pass any number of filenames to the Perl script.

perl -nle'BEGIN {$/="</log>";$,="\n";$\="\n\n"} print /<fullOperation>(.*?):.*<fullResult>(.*?:.*?)[:;]/s' logfile1 logfile2...logfileN
# 7  
Old 11-17-2008
You can.