Create shell script to extract unique information from one file to a new file.


# 1  
08-09-2011

Hi to all,

I have the following content in the file http.log.20110808.gz:

Code:
[07/Aug/2011:07:37:39 +0800] mail1 httpd[14646]: Account Notice: close [192.168.10.128] igchung@abc.com 2011/8/7 7:37:36 0:00:03 0 0 1
[07/Aug/2011:07:37:44 +0800] mail1 httpd[14647]: Account Information: login [192.168.10.131:17187] sastria9@abc.com proxy sid=gFp4DLm5HnU
[07/Aug/2011:07:37:44 +0800] mail1 httpd[14648]: Account Notice: close [192.168.10.131] sastria9@abc.com 2011/8/7 7:37:44 0:00:00 0 0 1
[07/Aug/2011:07:37:45 +0800] mail1 httpd[14647]: Account Information: login [192.168.10.131:17194] sastria9@abc.com proxy sid=gSiaecABc/E
[07/Aug/2011:07:38:37 +0800] mail1 httpd[14646]: Account Information: login [192.168.10.129:2063] pntcdor1@abc.com proxy sid=ZGhAdmqmz3k
[07/Aug/2011:07:38:37 +0800] mail1 httpd[14647]: Account Notice: close [192.168.10.129] pntcdor1@abc.com 2011/8/7 7:38:37 0:00:00 0 0 1
[07/Aug/2011:07:38:38 +0800] mail1 httpd[14646]: Account Information: login [192.168.10.129:2071] pntcdor1@abc.com proxy sid=PtwbGuIk+I4
[07/Aug/2011:07:38:48 +0800] mail1 httpd[14646]: Account Information: login [192.168.10.130:14272] visnet@abc.com proxy sid=4W6xBKPXXvk
[07/Aug/2011:07:38:48 +0800] mail1 httpd[14647]: Account Notice: close [192.168.10.130] visnet@abc.com 2011/8/7 7:38:48 0:00:00 0 0 1
[07/Aug/2011:07:38:48 +0800] mail1 httpd[14646]: Account Information: login [192.168.10.130:14279] visnet@abc.com proxy sid=/qenNd/tps8
[07/Aug/2011:07:38:59 +0800] mail1 httpd[14646]: Account Notice: close [192.168.10.130] visnet@abc.com 2011/8/7 7:38:48 0:00:11 0 0 1
[07/Aug/2011:07:39:06 +0800] mail1 httpd[14647]: Account Information: login [192.168.10.130:14367] animan86@abc.com proxy sid=VdYyCOMtPsQ


How can I generate a new file with the content shown below from the file above?


Last edited by Scott; 08-09-2011 at 03:14 AM.. Reason: Code tags
# 2  
08-09-2011
With grep/sort/uniq:
Code:
grep -o "[^ ]*@[^ ]*" http.log.20110808.gz | sort | uniq

With awk:
Code:
awk ' /@/ { sub("^.*] ",""); sub(" .*", ""); if(!($0 in E)) print; E[$0]} ' http.log.20110808.gz

Note: if the file is gzipped, as its extension implies, you may need to pipe the output of gzip -d into these solutions.
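To make the note concrete, here is a minimal sketch that decompresses first and only then extracts and de-duplicates the addresses. It assumes GNU grep (for -o) and gzip are on the PATH; the function name unique_users_gnu is hypothetical. sort -u does the same job as sort | uniq.

```shell
# Hypothetical helper: list each e-mail address in a gzipped log once, sorted.
# Assumes GNU grep (for -o); $1 is the compressed log file.
unique_users_gnu() {
    # gzip -dc writes the decompressed text to stdout and leaves the file intact,
    # so grep sees plain text instead of compressed bytes.
    gzip -dc "$1" | grep -o '[^ ]*@[^ ]*' | sort -u
}
```

Usage: unique_users_gnu http.log.20110808.gz > users.out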
This User Gave Thanks to Chubler_XL For This Post:
# 3  
08-09-2011
Hi,

I am running this on Solaris 10. Is this right?


Code:
[root] grep -o "[^ ]*@[^ ]*" http.log.20110801.gz | sort | uniq >1.out
grep: illegal option -- o
Usage: grep -hblcnsviw pattern file . . .
[root] grep -o "[^ ]*@[^ ]*" http.log.20110801.gz | sort
grep: illegal option -- o
Usage: grep -hblcnsviw pattern file . . .
[root] awk ' /@/ { sub("^.*]  ",""); sub(" .*", ""); if(!($0 in E)) print; E[$0]} '  http.log.20110801.gz
awk: syntax error near line 1
awk: illegal statement near line 1
awk: syntax error near line 1
awk: illegal statement near line 1
awk: syntax error near line 1
awk: illegal statement near line 1


Last edited by Scott; 08-09-2011 at 03:15 AM.. Reason: Code tags
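As the "illegal option -- o" error shows, the grep in Solaris 10's default path has no -o option. One workaround is to let sed print only the captured address. This is a sketch under the assumption that every address directly follows the last "] " on its line, which holds for all the sample lines in the question; the function name unique_users_sed is hypothetical.

```shell
# Hypothetical helper: extract addresses without grep -o, using sed instead.
# Assumes the address is the field right after the last "] " on each line.
unique_users_sed() {
    # Decompress first, then keep only the captured address (-n with /p
    # prints nothing for lines that do not match), then de-duplicate.
    gzip -dc "$1" |
        sed -n 's/.*] \([^ ]*@[^ ]*\).*/\1/p' |
        sort -u
}
```

Usage: unique_users_sed http.log.20110801.gz > 1.out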
# 4  
08-09-2011
Try to use nawk instead of awk.
This User Gave Thanks to yazu For This Post:
# 5  
08-09-2011
Quote:
Originally Posted by yazu
Try to use nawk instead of awk.
Still can't; the result is unreadable (binary).
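Unreadable output like this is what you would expect when nawk reads the raw gzip bytes rather than the log text. A small sketch of a helper that decides by the .gz suffix whether to decompress, so the rest of a pipeline always receives plain text. The name read_log is hypothetical, and it checks only the file name, not the file's actual contents.

```shell
# Hypothetical helper: print a log as plain text whether or not it is gzipped.
# Decides by file name only; a misnamed file will not be detected.
read_log() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;   # compressed: decompress to stdout
        *)    cat "$1" ;;        # plain text: pass through unchanged
    esac
}
```

Usage: read_log http.log.20110801.gz | nawk '...' (with the original awk program in place of ...).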
# 6  
08-09-2011
Quote:
Note: if the file is gzipped, as its extension implies, you may need to pipe the output of gzip -d into these solutions.
This User Gave Thanks to yazu For This Post:
# 7  
08-09-2011
Where should I put the gzip -d?

Like this?
Code:
[root|reports.tm.net.my:/data2/mail1/201108] grep -o "[^ ]*@[^ ]*" http.log.20110801.gz | sort | uniq | gzip -d
grep: illegal option -- o
Usage: grep -hblcnsviw pattern file . . .

gzip: stdin: unexpected end of file


[root|reports.tm.net.my:/data2/mail1/201108] awk ' /@/ { sub("^.*] ",""); sub(" .*", ""); if(!($0 in E)) print; E[$0]} ' http.log.20110801.gz | gzip -d
awk: syntax error near line 1
awk: illegal statement near line 1
awk: syntax error near line 1
awk: illegal statement near line 1
awk: syntax error near line 1
awk: illegal statement near line 1

gzip: stdin: unexpected end of file


Last edited by Mr_47; 08-16-2011 at 05:25 AM.. Reason: Code tags, please...
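In both attempts above, gzip -d sits at the end of the pipeline, so grep and awk still read the compressed file, and gzip then receives their output, which is not gzip data (hence "unexpected end of file"). Decompression has to come first. The syntax errors from /usr/bin/awk are most likely because the old Solaris awk predates sub() and the (key in array) test, which nawk supports. Below is a sketch of an awk approach written that way. Instead of Chubler_XL's two sub() calls, it scans every field for an @ (a swapped-in technique) and prints each address once in order of first appearance. The function name unique_users_awk and the AWK variable are hypothetical.

```shell
# Hypothetical helper: print each address once, in order of first appearance.
# AWK defaults to nawk (Solaris 10); set AWK=awk where a POSIX awk is the default.
unique_users_awk() {
    # gzip -dc comes FIRST so that awk reads the decompressed text.
    gzip -dc "$1" | ${AWK:-nawk} '
    {
        for (i = 1; i <= NF; i++)              # check every field on the line
            if ($i ~ /@/ && !($i in seen)) {   # an address not printed yet
                seen[$i] = 1
                print $i
            }
    }'
}
```

Usage: unique_users_awk http.log.20110801.gz > 1.out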


10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

awk script to extract transcript information from gff3 file

I need help to extract transcript information from gff3 file. Here is the input Chr01 JGI gene 82773 86941 . - . ID=Potri.001G000900;Name=Potri.001G000900 Chr01 JGI mRNA 82793 86530 . - . ID=PAC:27047814;Name=Potri.001G000900.1;pacid=27047814;longest=1;Parent=Potri.001G000900... (6 Replies)
Discussion started by: Maduranga

2. UNIX for Beginners Questions & Answers

TCL script to extract the file name and then create two independent list

I am having one problem as stated below Problem Description I am having some "sv" extension files , I am using "glob" to extract the matching files , Now in these matching files , I need to split them and extract the elements and create different lists. For example set files This... (1 Reply)
Discussion started by: kshitij

3. Shell Programming and Scripting

How to create file and file content based existing information?

Hi Gurus, I am SQL developer and new unix user. I need to create some file and file content based on information in two files. I have one file contains basic information below file1 and another exception file file2. the rule is if "zone' and "cd" in file1 exists in file2, then file name is... (13 Replies)
Discussion started by: Torhong

4. Shell Programming and Scripting

Generate 10000 unique audio file of 2MB each using shell script.

Hi, I want 10000+ unique audio files of approx 2MB each. How can I generate numerous audio files using a shell script? Any tool, command or suggestions are welcome. If I give one audio seed file, can we create numerous unique files from the same seed file? Any help is greatly appreciated.... (11 Replies)
Discussion started by: sushil.kumar

5. Shell Programming and Scripting

Help with shell script to extract certain information

Hi, I have a file which I need to programmatically split into two files. All the information in the file before pattern "STOP HERE" is to be stripped and output into one file while everything after "STOP HERE" is to be output into a separate file. I would appreciate help on how to do... (8 Replies)
Discussion started by: PTL

6. Shell Programming and Scripting

Shell Script to Dynamically Extract file content based on Parameters from a pdf file

Hi Gurus, I am new to shell scripting. I have a unique requirement: The system generates a single pdf(/tmp/ABC.pdf) file with Invoices for Multiple Customers, the format is something like this: Page1 >> Customer 1 >>Invoice1 + invoice 2 >> Page1 end Page2 >> Customer 2 >>Invoice 3 + Invoice 4... (3 Replies)
Discussion started by: DIps

7. Shell Programming and Scripting

Extract UNIque records from File

Hi, I have a 20GB pipe-delimited file with too many duplicate records. I need an awk script to extract the unique records from the file and put them into another file. Kindly help. Thanks, Arun (1 Reply)
Discussion started by: Arun Mishra

8. Shell Programming and Scripting

shell script to sort information in one file

Hi to all, is there any way to create a shell script to sort information from one file and create a new file with the sorted values? from file 30days.out -bash-3.00# more 30days.out user/str4@kl.com/INBOX user/tg1@johor.com/INBOX user/tg2@kedah.com/INBOX user/tg3@titangroup.com/INBOX... (3 Replies)
Discussion started by: Mr_47

9. UNIX for Dummies Questions & Answers

Extract Unique Values from file

Hello all, I have a file with following sample data 2009-08-26 05:32:01.65 spid5 Process ID 86:214 owns resources that are blocking processes on Scheduler 0. 2009-08-26 05:32:01.65 spid5 Process ID 86:214 owns resources that are blocking processes on Scheduler 0. 2009-08-26... (5 Replies)
Discussion started by: simonsimon

10. Shell Programming and Scripting

Urgent: selecting unique specific content of a file using shell script

Hi, I have a file whose content and format at places is as given below. print coloumn .... coloumn .... coloumn .... skip 1 line print coloumn ... skip 1 line I need to select the following : print coloumn .... coloumn .... coloumn... (2 Replies)
Discussion started by: jisha