The UNIX and Linux Forums  
#1  pssandeep  03-24-2008
Avoid Duplicates in a file

Hi Gurus,

I have a question about avoiding duplicates. I have a file, abc.txt:
abc.txt
-------
READER_1_1_1> HIER_28056 XML Reader: Error [UnterminatedXMLDecl] occurred while parsing:[Error at (file /home/abc.xml, line 6, char 1 ): Invalid document structure.]; line number [6]; column number [1]
READER_1_3_1> Sun Mar 23 23:52:48 2008
READER_1_3_1> HIER_28056 XML Reader: Error [UnterminatedXMLDecl] occurred while parsing:[Error at (file /home/abc.xml, line 6, char 1 ): Invalid document structure.]; line number [6]; column number [1]
READER_1_2_1> Sun Mar 23 23:52:48 2008
READER_1_2_1> HIER_28056 XML Reader: Error [UnterminatedXMLDecl] occurred while parsing:[Error at (file /home/abc.xml, line 6, char 1 ): Invalid document structure.]; line number [6]; column number [1]
READER_1_1_1> Sun Mar 23 23:52:48 2008
READER_1_1_1> HIER_28058 XML Reader Error
READER_1_2_1> Sun Mar 23 23:52:48 2008
READER_1_2_1> HIER_28058 XML Reader Error
READER_1_3_1> Sun Mar 23 23:52:48 2008
READER_1_3_1> HIER_28058 XML Reader Error.


I get this file when I egrep the session log file for error-specific messages.

I want to remove the repeated lines from this file.
I tried the uniq command, but the READER_1_3_1> prefix varies between lines in abc.txt,
i.e., each line has 1, 2, or 3 in the middle position of READER_1_*_1>.

So I am not getting the desired output.

Can anyone let me know how to get this done?

My output should be:
READER_1_1_1> HIER_28056 XML Reader: Error [UnterminatedXMLDecl] occurred while parsing:[Error at (file /home/abc.xml, line 6, char 1 ): Invalid document structure.]; line number [6]; column number [1]
READER_1_1_1> HIER_28058 XML Reader Error



Thanks & Regards,
San
#2  sysgate  03-24-2008
Use:
Code:
sort file1 | uniq -u > new-file-only-unique-lines
"uniq -u" prints only the lines that are not repeated in the file.
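For comparison, here is a small sketch of how `uniq -u` and `sort -u` behave on the same input. The sample data and variable names are made up for illustration; the point is that `uniq -u` drops every line that has a duplicate, while `sort -u` keeps one copy of each distinct line.

```shell
# Sample with one repeated line ("a") and one non-repeated line ("b")
sample=$(printf '%s\n' b a a)

# uniq -u: only lines that occur exactly once in the sorted input
only_singletons=$(printf '%s\n' "$sample" | sort | uniq -u)

# sort -u: one copy of every distinct line
one_of_each=$(printf '%s\n' "$sample" | sort -u)

printf 'uniq -u:\n%s\n' "$only_singletons"
printf 'sort -u:\n%s\n' "$one_of_each"
```

So if the goal is "one copy of each message", `sort -u` (or `sort | uniq` without `-u`) is probably closer to what is wanted.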
#3  lazytech  03-24-2008
My suggestion would be to write a Perl script that uses a regular expression to solve your issue. It may be easier to pull out your unique values that way.
#4  xonicman  03-24-2008
My question is: does every message have 2 duplicates in the input data?

You gave the abc.txt example, where every message is duplicated with 'READER_1_2_1>' and 'READER_1_3_1>' headers added. If all logs are presented this way, you can:

Code:
grep 'error message string' abc.txt | grep 'READER_1_1_1>'
I may have misunderstood you, and this may be too obvious an answer to your question. I don't know whether the "header" (the part before the '>' sign) changes.
#5  pssandeep  03-25-2008
Quote:
Originally Posted by xonicman
My question is: does every message have 2 duplicates in the input data?

You gave the abc.txt example, where every message is duplicated with 'READER_1_2_1>' and 'READER_1_3_1>' headers added. If all logs are presented this way, you can:

Code:
grep 'error message string' abc.txt | grep 'READER_1_1_1>'
I may have misunderstood you, and this may be too obvious an answer to your question. I don't know whether the "header" (the part before the '>' sign) changes.
Thanks for your reply.

No. Since it reads from 3 sources, it gives out 3 errors. But the source is the same, so I get 3 similar messages that differ only in the reader part.
In place of the reader part I may also get some other word.
#6  ShawnMilo  03-25-2008
All you need to trim a file down to unique lines is:

Code:
sort -u file
However, your lines aren't straight duplicates, so you'll need to pre-process the file. If your lines will all be the same except for the initial "reader" part, then just strip out any occurrence of READER_X_X_X.
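One way to sketch that pre-processing step, assuming every line starts with a `READER_<n>_<n>_<n>> ` tag followed by a single space. The function name `strip_and_dedupe` and the sample data are made up for illustration, and the output loses the tag entirely.

```shell
# Hypothetical helper: strip the leading "READER_x_y_z> " tag, then de-duplicate.
# Assumes the tag is always READER_<digits>_<digits>_<digits> followed by "> ".
strip_and_dedupe() {
    sed 's/^READER_[0-9]*_[0-9]*_[0-9]*> //' "$@" | sort -u
}

# Small sample modelled on abc.txt
sample=$(printf '%s\n' \
    'READER_1_1_1> HIER_28058 XML Reader Error' \
    'READER_1_2_1> HIER_28058 XML Reader Error' \
    'READER_1_3_1> HIER_28058 XML Reader Error')

printf '%s\n' "$sample" | strip_and_dedupe
```

With a real file this would be called as `strip_and_dedupe abc.txt`. Timestamp lines, if present, would still come through (once each), so they may need to be filtered out separately.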

ShawnMilo
#7  pssandeep  03-25-2008
Quote:
Originally Posted by sysgate
Use:
Code:
sort file1 | uniq -u > new-file-only-unique-lines
"uniq -u" prints only the lines that are not repeated in the file.
Thanks for your reply.
But here all the lines are unique, since READER_1 changes to READER_2; that is the only variation.

So when I use the uniq command I am not getting the desired output.
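Since the thread is tagged awk, here is a possible awk sketch that compares lines on the message only and keeps the first line seen for each message, tag included. It rests on a few assumptions not confirmed in the thread: the tag is whatever precedes the first `> ` (so it can be any word, not just READER), and timestamp-only lines should be dropped. The function name `dedupe_by_message` and the shortened sample messages are for illustration only.

```shell
# Keep the first line seen for each distinct message, ignoring the "TAG> " prefix.
# Assumptions: the tag ends at the first "> ", and timestamp-only lines
# (e.g. "Sun Mar 23 23:52:48 2008") are not wanted in the output.
dedupe_by_message() {
    awk '
        {
            msg = $0
            sub(/^[^>]*> /, "", msg)   # remove everything up to the first "> "
        }
        # skip lines whose message is only a timestamp
        msg ~ /^[A-Z][a-z][a-z] [A-Z][a-z][a-z] +[0-9]+ [0-9:]+ [0-9]+$/ { next }
        !seen[msg]++                   # print only the first line per message
    ' "$@"
}

# Shortened sample modelled on abc.txt
sample=$(printf '%s\n' \
    'READER_1_1_1> HIER_28056 XML Reader: Error [UnterminatedXMLDecl]' \
    'READER_1_3_1> Sun Mar 23 23:52:48 2008' \
    'READER_1_3_1> HIER_28056 XML Reader: Error [UnterminatedXMLDecl]' \
    'READER_1_1_1> Sun Mar 23 23:52:48 2008' \
    'READER_1_1_1> HIER_28058 XML Reader Error' \
    'READER_1_2_1> Sun Mar 23 23:52:48 2008' \
    'READER_1_2_1> HIER_28058 XML Reader Error')

printf '%s\n' "$sample" | dedupe_by_message
```

On a real file this would be `dedupe_by_message abc.txt`. Unlike `sort -u`, this keeps the original line order and does not need the input to be sorted, but messages must match exactly (a stray trailing character makes a line count as different).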
The UNIX and Linux Forums Content Copyright ©1993-2009. All Rights Reserved.