Avoid Duplicates in a file
#1  03-24-2008, pssandeep

Hi Gurus,

I have a question about removing duplicates. I have a file abc.txt:
abc.txt
-------
READER_1_1_1> HIER_28056 XML Reader: Error [UnterminatedXMLDecl] occurred while parsing:[Error at (file /home/abc.xml, line 6, char 1 ): Invalid document structure.]; line number [6]; column number [1]
READER_1_3_1> Sun Mar 23 23:52:48 2008
READER_1_3_1> HIER_28056 XML Reader: Error [UnterminatedXMLDecl] occurred while parsing:[Error at (file /home/abc.xml, line 6, char 1 ): Invalid document structure.]; line number [6]; column number [1]
READER_1_2_1> Sun Mar 23 23:52:48 2008
READER_1_2_1> HIER_28056 XML Reader: Error [UnterminatedXMLDecl] occurred while parsing:[Error at (file /home/abc.xml, line 6, char 1 ): Invalid document structure.]; line number [6]; column number [1]
READER_1_1_1> Sun Mar 23 23:52:48 2008
READER_1_1_1> HIER_28058 XML Reader Error
READER_1_2_1> Sun Mar 23 23:52:48 2008
READER_1_2_1> HIER_28058 XML Reader Error
READER_1_3_1> Sun Mar 23 23:52:48 2008
READER_1_3_1> HIER_28058 XML Reader Error.


I get this file when I egrep the session log file for error-specific messages.

I want to remove the repeated lines from this file. I tried the uniq command, but the READER_1_**_1> prefix varies in the above abc.txt file, i.e., every line has 1, 2, or 3 in the middle position, so the lines are never exact duplicates and I am not getting the desired output.

Can anyone let me know how to get this done?

My output should be:
READER_1_1_1> HIER_28056 XML Reader: Error [UnterminatedXMLDecl] occurred while parsing:[Error at (file /home/abc.xml, line 6, char 1 ): Invalid document structure.]; line number [6]; column number [1]
READER_1_1_1> HIER_28058 XML Reader Error



Thanks & Regards,
San
#2  03-24-2008, sysgate
Use:

Code:
sort file1 | uniq -u > new-file-only-unique-lines

"uniq -q" prints only the unique lines in a file.
#3  03-24-2008, lazytech
My suggestion would be to write a Perl script using regular expressions. It may be easier to pull out your unique values that way.
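
A minimal one-liner sketch, assuming the prefix is always a single word ending in '>' (as in the sample above); it dedups on everything after the prefix and prints the first occurrence of each message, prefix included:

Code:
perl -ne 'my $k = $_; $k =~ s/^\S+>\s*//; print unless $seen{$k}++;' abc.txt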
#4  03-24-2008, xonicman
My question is: does every message have two duplicates in the input data?

In your abc.txt example, every message is duplicated with 'READER_1_2_1>' and 'READER_1_3_1>' headers added. If all logs are laid out that way, you can:


Code:
grep 'error message string' abc.txt | grep 'READER_1_1_1>'

I may have misunderstood you, and this answer may be too obvious. I don't know whether the "header" (before the '>' sign) changes.
#5  03-25-2008, pssandeep
Quote:
Originally Posted by sysgate
Use:

Code:
sort file1 | uniq -u > new-file-only-unique-lines

"uniq -u" prints only the lines that appear exactly once in the (sorted) input.
Thanks for your reply. But here all the lines are unique, since READER_1 changes to READER_2 and so on; only that part varies.

So if I use the uniq command I don't get the desired output.
#6  03-25-2008, pssandeep
Quote:
Originally Posted by xonicman
My question is: does every message have two duplicates in the input data?

In your abc.txt example, every message is duplicated with 'READER_1_2_1>' and 'READER_1_3_1>' headers added. If all logs are laid out that way, you can:

Code:
grep 'error message string' abc.txt | grep 'READER_1_1_1>'

I may have misunderstood you, and this answer may be too obvious. I don't know whether the "header" (before the '>' sign) changes.
Thanks for your reply.

No. Since it reads from 3 sources, it gives out 3 errors. But the source is the same, so I get 3 similar messages with only the reader part varying. In place of "READER" I may get any other word.
#7  03-25-2008, ShawnMilo
All you need to trim a file down to unique lines is:


Code:
sort -u file

However, your lines aren't straight duplicates, so you'll need to pre-process the file. If your lines are all the same except for the initial "reader" part, just strip out any occurrence of READER_X_X_X first, for example:
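
A minimal sketch, assuming the prefix always matches READER_ followed by digits and underscores (adjust the pattern if the leading word can differ):

Code:
sed 's/^READER_[0-9_]*> *//' abc.txt | sort -u

Note that this drops the prefix from the output; to keep the first occurrence's prefix, dedup on the stripped text instead, as in the Perl one-liner above.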

ShawnMilo



