remove unwanted text using perl


 
# 1  
Old 03-21-2011

Hello. I have a text file from which I need to remove unwanted text. This is the original file:
Code:
No.     Time        Source                Destination           Protocol Info
     16 0.649949    10.1.1.101            209.225.11.237        HTTP     POST /scripts/cms/xcms.asp HTTP/1.1  (application/vnd.xacp)

Frame (487 bytes):

0000  00 05 5d 6f d7 c1 00 04 e2 22 5a 03 08 00 45 00   ..]o....."Z...E.
0010  01 d9 b3 12 40 00 80 06 5c d8 0a 01 01 65 d1 e1   ....@...\....e..
0020  0b ed 0c 6b 00 50 34 9d 5e ed 00 00 1f 7e 50 18   ...k.P4.^....~P.
0030  ff ff 1b b7 00 00 3c 3f 78 6d 6c 20 76 65 72 73   ......<?xml vers
0040  69 6f 6e 3d 22 31 2e 30 22 20 65 6e 63 6f 64 69   ion="1.0" encodi


Reassembled TCP (993 bytes):

0000  50 4f 53 54 20 2f 73 63 72 69 70 74 73 2f 63 6d   POST /scripts/cm
0010  73 2f 78 63 6d 73 2e 61 73 70 20 48 54 54 50 2f   s/xcms.asp HTTP/
0020  31 2e 31 0d 0a 55 73 65 72 2d 41 67 65 6e 74 3a   1.1..User-Agent:
0030  20 4d 6f 7a 69 6c 6c 61 2f 34 2e 30 20 28 63 6f    Mozilla/4.0 (co
0040  6d 70 61 74 69 62 6c 65 3b 20 4d 53 49 45 20 36   mpatible; MSIE 6
0050  2e 30 3b 20 57 69 6e 64 6f 77 73 20 4e 54 20 35   .0; Windows NT 5

No.     Time        Source                Destination           Protocol Info
     38 1.292367    10.1.1.1              10.1.1.101            HTTP     HTTP/1.1 200 OK  (text/html)

Frame (275 bytes):

0000  00 04 e2 22 5a 03 00 c0 df 20 6c df 08 00 45 00   ..."Z.... l...E.
0010  01 05 a1 4c 40 00 40 06 82 3f 0a 01 01 01 0a 01   ...L@.@..?......
0020  01 65 00 50 0c 74 37 a1 f7 23 34 a5 0b 1a 50 19   .e.P.t7..#4...P.
0030  1a e8 9b 34 00 00 2d 30 37 2d 4e 65 6c 73 6f 6e   ...4..-07-Nelson
0040  73 42 61 79 2f 69 6e 64 65 78 2e 68 74 6d 6c 22   sBay/index.html"

Reassembled TCP (4601 bytes):

0000  48 54 54 50 2f 31 2e 31 20 32 30 30 20 4f 4b 0d   HTTP/1.1 200 OK.
0010  0a 44 61 74 65 3a 20 53 61 74 2c 20 32 30 20 4e   .Date: Sat, 20 N
0020  6f 76 20 32 30 30 34 20 31 30 3a 32 31 3a 30 37   ov 2004 10:21:07
0030  20 47 4d 54 0d 0a 53 65 72 76 65 72 3a 20 41 70    GMT..Server: Ap
0040  61 63 68 65 2f 32 2e 30 2e 34 30 20 28 52 65 64   ache/2.0.40 (Red
0050  20 48 61 74 20 4c 69 6e 75 78 29 0d 0a 4c 61 73    Hat Linux)..Las

I want to remove the text between each "No." header line and the "Frame" line, and the rest of the text on each "Reassembled TCP" line. The hex bytes from the Frame section up to the next "No." line should be joined together, with the word "Reassembled" kept inside the hex stream to separate the frame hex from the reassembled hex. Here is sample output:
Code:
No.     Time        Source                Destination           Protocol Info
00055d6fd7c10004e2225a030800450001d9b312400080065cd80a010165d1e10bed0c6b0050349
d5eed00001f7e5018ffff1bb700003c3f786d6c2076657273696f6e3d22312e302220656e636f64
69Reassembled504f5354202f736372697074732f636d732f78636d732e61737020485454502f31
2e310d0a557365722d4167656e743a204d6f7a696c6c612f342e302028636f6d70617469626c653
b204d53494520362e303b2057696e646f7773204e542035
No.     Time        Source                Destination           Protocol Info
0004e2225a0300c0df206cdf080045000105a14c40004006823f0a0101010a01016500500c7437a
1f72334a50b1a50191ae89b3400002d30372d4e656c736f6e734261792f696e6465782e68746d6c
223e4e656c736f6e732042617920323030322d31322d30373c2f613e3c62723e0d0a3c612068726
5663d222e2f32303031Reassembled485454502f312e3120323030204f4b0d0a446174653a20536
1742c203230204e6f7620323030342031303a32313a303720474d540d0a5365727665723a204170
616368652f322e302e3430202852656420486174204c696e7578290d0a4c6173

These are a few of the substitutions I use in my Perl script:
Code:
$row=~s/\s\s\s.*//g;  # remove the ASCII column: delete everything from the first run of 3 whitespace characters
$row=~s/.*\s\s//g;   # remove the offset: delete everything up to and including the 2 spaces after it
$row=~s/\n/\ /g;   # replace each newline with a space
$row=~s/\s//g;  # remove all remaining whitespace so the hex bytes join into one string

Thank you...
# 2  
Old 03-21-2011
Does awk work for you?

Code:
awk -v FS="  " '/^No/ {print "\n"$0"\n"} /^00/{ print gensub(" ","","g",$2)}END{print ""}' file

# 3  
Old 03-21-2011
Here's one way to do it with Perl -

Code:
$
$ cat input
No.     Time        Source                Destination           Protocol Info
     16 0.649949    10.1.1.101            209.225.11.237        HTTP     POST /scripts/cms/xcms.asp HTTP/1.1  (application/vnd.xacp)
Frame (487 bytes):
0000  00 05 5d 6f d7 c1 00 04 e2 22 5a 03 08 00 45 00   ..]o....."Z...E.
0010  01 d9 b3 12 40 00 80 06 5c d8 0a 01 01 65 d1 e1   ....@...\....e..
0020  0b ed 0c 6b 00 50 34 9d 5e ed 00 00 1f 7e 50 18   ...k.P4.^....~P.
0030  ff ff 1b b7 00 00 3c 3f 78 6d 6c 20 76 65 72 73   ......<?xml vers
0040  69 6f 6e 3d 22 31 2e 30 22 20 65 6e 63 6f 64 69   ion="1.0" encodi

Reassembled TCP (993 bytes):
0000  50 4f 53 54 20 2f 73 63 72 69 70 74 73 2f 63 6d   POST /scripts/cm
0010  73 2f 78 63 6d 73 2e 61 73 70 20 48 54 54 50 2f   s/xcms.asp HTTP/
0020  31 2e 31 0d 0a 55 73 65 72 2d 41 67 65 6e 74 3a   1.1..User-Agent:
0030  20 4d 6f 7a 69 6c 6c 61 2f 34 2e 30 20 28 63 6f    Mozilla/4.0 (co
0040  6d 70 61 74 69 62 6c 65 3b 20 4d 53 49 45 20 36   mpatible; MSIE 6
0050  2e 30 3b 20 57 69 6e 64 6f 77 73 20 4e 54 20 35   .0; Windows NT 5
No.     Time        Source                Destination           Protocol Info
     38 1.292367    10.1.1.1              10.1.1.101            HTTP     HTTP/1.1 200 OK  (text/html)
Frame (275 bytes):
0000  00 04 e2 22 5a 03 00 c0 df 20 6c df 08 00 45 00   ..."Z.... l...E.
0010  01 05 a1 4c 40 00 40 06 82 3f 0a 01 01 01 0a 01   ...L@.@..?......
0020  01 65 00 50 0c 74 37 a1 f7 23 34 a5 0b 1a 50 19   .e.P.t7..#4...P.
0030  1a e8 9b 34 00 00 2d 30 37 2d 4e 65 6c 73 6f 6e   ...4..-07-Nelson
0040  73 42 61 79 2f 69 6e 64 65 78 2e 68 74 6d 6c 22   sBay/index.html"
Reassembled TCP (4601 bytes):
0000  48 54 54 50 2f 31 2e 31 20 32 30 30 20 4f 4b 0d   HTTP/1.1 200 OK.
0010  0a 44 61 74 65 3a 20 53 61 74 2c 20 32 30 20 4e   .Date: Sat, 20 N
0020  6f 76 20 32 30 30 34 20 31 30 3a 32 31 3a 30 37   ov 2004 10:21:07
0030  20 47 4d 54 0d 0a 53 65 72 76 65 72 3a 20 41 70    GMT..Server: Ap
0040  61 63 68 65 2f 32 2e 30 2e 34 30 20 28 52 65 64   ache/2.0.40 (Red
0050  20 48 61 74 20 4c 69 6e 75 78 29 0d 0a 4c 61 73    Hat Linux)..Las
$
$
$ perl -lne 'if (/^No./ or eof) {if ($y) {@a= unpack("(A79)*", $y); print for (@a); $y=""} print}
             elsif (/^\d+\s+(.*?) \s+.*$/ or /^(Reassembled).*$/) {$x=$1; $x=~s/ //g; $y.=$x}
            ' input
No.     Time        Source                Destination           Protocol Info
00055d6fd7c10004e2225a030800450001d9b312400080065cd80a010165d1e10bed0c6b0050349
d5eed00001f7e5018ffff1bb700003c3f786d6c2076657273696f6e3d22312e302220656e636f64
69Reassembled504f5354202f736372697074732f636d732f78636d732e61737020485454502f31
2e310d0a557365722d4167656e743a204d6f7a696c6c612f342e302028636f6d70617469626c653
b204d53494520362e303b2057696e646f7773204e542035
No.     Time        Source                Destination           Protocol Info
0004e2225a0300c0df206cdf080045000105a14c40004006823f0a0101010a01016500500c7437a
1f72334a50b1a50191ae89b3400002d30372d4e656c736f6e734261792f696e6465782e68746d6c
22Reassembled485454502f312e3120323030204f4b0d0a446174653a205361742c203230204e6f
7620323030342031303a32313a303720474d540d0a5365727665723a204170616368652f322e302
e3430202852656420486174204c696e7578290d0a4c6173
$
$

tyler_durden
# 4  
Old 03-21-2011
Quote:
Originally Posted by yinyuemi
Does awk work for you?

Code:
awk -v FS="  " '/^No/ {print "\n"$0"\n"} /^00/{ print gensub(" ","","g",$2)}END{print ""}' file

Thank you for the reply.
When I try your awk command, I get an error:

Code:
awk: line 2: function gensub never defined
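
This error is what you would expect from an awk other than GNU awk: `gensub` is a gawk extension, and implementations such as mawk do not provide it. A possible portable variant, shown only as a sketch, copies the field into a variable and uses the standard `gsub` instead. The function name `strip_hex_portable` and the variable `hex` are made up for illustration; the patterns and output layout are taken from the command above.

```shell
# Hypothetical wrapper: same output as the gensub version, using only standard awk.
strip_hex_portable() {
  awk -v FS="  " '
    /^No/ { print "\n" $0 "\n" }          # header line, surrounded by blank lines
    /^00/ {
        hex = $2                          # second field holds the hex byte columns
        gsub(/ /, "", hex)                # gsub edits the copy in place
        print hex
    }
    END   { print "" }
  ' "$@"
}
```

It reads the files named on the command line, or standard input when none are given, for example `strip_hex_portable file`.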

---------- Post updated at 02:32 AM ---------- Previous update was at 02:23 AM ----------

Quote:
Originally Posted by durden_tyler
Here's one way to do it with Perl -

Code:
perl -lne 'if (/^No./ or eof) {if ($y) {@a= unpack("(A79)*", $y); print for (@a); $y=""} print}
             elsif (/^\d+\s+(.*?) \s+.*$/ or /^(Reassembled).*$/) {$x=$1; $x=~s/ //g; $y.=$x}
            ' input

tyler_durden
Thank you, tyler_durden.

Your script works much better. But how can I modify it so that all the hex output for a packet ends up on a single line, without any newlines?
# 5  
Old 03-21-2011
try this:
Code:
awk -v FS="  " '/^No/ {print "\n"$0"\n"} /^00/{gsub(" ","",$2);printf $2}END{print ""}' file

# 6  
Old 03-21-2011
Quote:
Originally Posted by yinyuemi
try this:
Code:
awk -v FS="  " '/^No/ {print "\n"$0"\n"} /^00/{gsub(" ","",$2);printf $2}END{print ""}' file

Thanks, yinyuemi.

This awk command doesn't insert "Reassembled" into the hex data between the frame and reassembled sections. It just joins all the hex together.
# 7  
Old 03-21-2011
Yes, I missed that. Please try this:
Code:
awk -v FS="  " '/^No/ {print "\n"$0"\n"} /^00/{gsub(" ","",$2);printf "%s", $2}/Reassembled TCP/{printf "Reassembled"} END{print ""}' file
