How to delete duplicate entries without using awk command?
# 1  
Old 12-12-2012
[Solved] How to delete duplicate entries without using awk command?

Hello..
I am trying to remove duplicate entries from a log file, and I used the shell script below to do so.
Code:
awk '!x[$0]++' <filename>

Can I do this without using the awk command and the regex? I do not want the comparison to start at the beginning of each line, because every line begins with a date and time that of course differ from line to line. I have not been able to write an effective script for this. Can anyone help me, please?

Last edited by Scott; 12-12-2012 at 04:55 AM. Reason: Code tags
# 2  
Old 12-12-2012
Can you post an (anonymized) sample of the log file?
# 3  
Old 12-12-2012
Yeah, sure. It's something like this:
Code:
Jan 13 2010 11:44:55 AM LoggingEx main Info1
Jan 13 2010 11:44:56 AM LoggingEx main Info1
Jan 13 2010 11:44:57 AM LoggingEx main Info1
Jan 13 2010 11:44:58 AM LoggingEx main Info1
Jan 13 2010 11:44:59 AM LoggingEx main Info1

Here, not all the lines are identical, but they are still duplicates. I want to remove these duplicate lines by starting the comparison from 'LoggingEx...', so that I can skip the leading date part. If I use the awk command above, it treats all of the lines as different and just prints them back unchanged, which is not what I want. Hope this helps...
Thanks

Last edited by Scrutinizer; 12-12-2012 at 05:16 AM. Reason: code tags
# 4  
Old 12-12-2012
Code:
$ awk -F"AM" '!a[$2]++' a.txt
Jan 13 2010 11:44:55 AM LoggingEx main Info1

If you are using Solaris, use nawk instead.

The command above splits each line using "AM" as the delimiter, so $2 is everything after the timestamp. If you want to handle "PM" as well, then:

Code:
 
$ nawk -F"AM|PM" '!a[$2]++' a.txt
Jan 13 2010 11:44:55 AM LoggingEx main Info1
Jan 13 2010 11:44:55 PM TEST
Jan 13 2010 11:44:55 PM TEST1

$ cat a.txt 
Jan 13 2010 11:44:55 AM LoggingEx main Info1
Jan 13 2010 11:44:56 AM LoggingEx main Info1
Jan 13 2010 11:44:57 AM LoggingEx main Info1
Jan 13 2010 11:44:58 AM LoggingEx main Info1
Jan 13 2010 11:44:59 AM LoggingEx main Info1
Jan 13 2010 11:44:55 PM TEST
Jan 13 2010 11:44:55 PM TEST
Jan 13 2010 11:44:55 PM TEST1

# 5  
Old 12-12-2012
For PM entries also:

Code:
awk -F "AM|PM" '!a[$2]++' a.txt

# 6  
Old 12-12-2012
Isn't there a way that doesn't depend on the 'AM'/'PM' pattern? Also, I wanted to know how to do it without using the awk command.
# 7  
Old 12-12-2012
sort on some systems can do this, de-duplicating on the key from field 6 to the end of the line:
Code:
sort -uk6 file

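If your sort doesn't support -u with a key, here is a rough pure-shell sketch with no awk and no dependence on the "AM"/"PM" pattern. It assumes bash 4+ (for associative arrays) and that the message part always starts at field 6, as in the sample above:

```shell
# De-duplicate log lines keyed on everything after the timestamp
# (fields 6 onward), without awk.
dedupe_on_message() {
    declare -A seen            # message keys already printed
    local line key
    set -f                     # don't glob-expand words from the line
    while IFS= read -r line; do
        set -- $line           # split the line into fields
        key=${*:6}             # fields 6..end: the message part
        if [[ -z ${seen[$key]+x} ]]; then
            seen[$key]=1
            printf '%s\n' "$line"
        fi
    done
    set +f
}

# usage: dedupe_on_message < a.txt
```

Like the awk one-liner, this keeps the first line seen for each distinct message and drops later repeats, so the original order is preserved (sort -u does not preserve it).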