remove duplicate lines using awk | Unix Linux Forums | Shell Programming and Scripting


remove duplicate lines using awk

    #1  
Old 01-31-2011
sudvishw sudvishw is offline
Registered User
 
Join Date: Nov 2009
Last Activity: 28 February 2012, 6:36 AM EST
Posts: 24
Thanks: 1
Thanked 0 Times in 0 Posts
remove duplicate lines using awk

Hi,
I came to know that using
Code:
awk '!x[$0]++'

removes duplicate lines. Can anyone please explain this syntax? I want to understand how this awk expression removes the duplicates.

Thanks in advance,
sudvishw
    #2  
Old 01-31-2011
homeboy's Avatar
homeboy homeboy is offline
Registered User
 
Join Date: Oct 2009
Last Activity: 21 June 2013, 3:39 AM EDT
Posts: 129
Thanks: 27
Thanked 11 Times in 11 Posts
x is an array, and an element that has never been set evaluates to 0. The index of x is $0, the whole input line. The first time a given $0 is met, x[$0] is 0. Because the ++ here is the postfix form, the old value 0 is returned first and only then is 1 added, so x[$0] is now 1. !0 is true, so $0 is printed by the default action. If the same $0 appears again, x[$0] is already nonzero, so !x[$0] is false and $0 is not printed.
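Here is a rough long-form sketch of the same logic, in case spelling it out helps. The function name dedupe and the array name seen are made up for this illustration; the one-liner in the question uses x.

```shell
# Long-form equivalent of: awk '!x[$0]++'
# "dedupe" and "seen" are illustrative names, not part of the original one-liner.
dedupe() {
  awk '{
    if (seen[$0] == 0)   # an element that was never set compares equal to 0
      print $0           # first time this exact line is seen: print it
    seen[$0]++           # count the line so later copies are skipped
  }' "$@"
}

printf 'hi\nhi\nhii\nhi\n' | dedupe
```

With that input it should print hi and then hii, each once.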
    #3  
Old 01-31-2011
sudvishw sudvishw is offline
Registered User
 
Join Date: Nov 2009
Last Activity: 28 February 2012, 6:36 AM EST
Posts: 24
Thanks: 1
Thanked 0 Times in 0 Posts
I am sorry, I do not understand. It would be great if you could explain with an example. Usually we sort first and then pick the unique records. Is there any sorting built into this awk command?
    #4  
Old 01-31-2011
Scrutinizer's Avatar
Scrutinizer Scrutinizer is offline Forum Staff  
Moderator
 
Join Date: Nov 2008
Last Activity: 21 October 2014, 1:48 PM EDT
Location: Amsterdam
Posts: 9,535
Thanks: 284
Thanked 2,419 Times in 2,169 Posts
Sorting is not necessary. The expression references an (associative) array element with the entire line as the index; the first time, that element has no value (or 0, if you will). The postfix ++ returns that old value, 0, and the exclamation mark negates it, so the outcome is 1 (true). A pattern that is true with no action given means awk performs the default action, which is {print $0}, so the entire line gets printed.

As a side effect of the ++, 1 is added to the array value, which now becomes 1. The next time the same line is encountered, the value returned by the array is 1, which the exclamation mark negates to 0, so nothing gets printed.
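A small sketch to show the difference from the sort approach. The sample input and the variable names (input, awk_out, sort_out) are made up for this illustration. The awk version keeps the first occurrence of each line in the original order, while sort -u returns the unique lines in sorted order.

```shell
# Sample data invented for this illustration.
input='b
a
b
c
a'

# awk keeps the first occurrence of each line, in input order.
awk_out=$(printf '%s\n' "$input" | awk '!x[$0]++')

# sort -u also removes duplicates, but reorders the lines.
sort_out=$(printf '%s\n' "$input" | sort -u)

printf 'awk:\n%s\n' "$awk_out"
printf 'sort -u:\n%s\n' "$sort_out"
```

Here awk should give b, a, c and sort -u should give a, b, c. Note that awk holds every distinct line in memory, which may matter for very large files.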
    #5  
Old 01-31-2011
sudvishw sudvishw is offline
Registered User
 
Join Date: Nov 2009
Last Activity: 28 February 2012, 6:36 AM EST
Posts: 24
Thanks: 1
Thanked 0 Times in 0 Posts
Hi,
Thanks for your explanation. If I understand it right, suppose we have a file as shown below:

Code:
hi
hi
hii
hi

Here hi comes 3 times.

First time when hi comes, x[hi] will be initialized to 0, which is negated and so it becomes 1 and the line is printed. Second time, x[hi] will be 1, which gets negated and so it becomes 0 and the line is not printed.

If I am not wrong, third time, x[hi] will be 0, which gets negated to 1 and the line should be printed. I think there is something I am missing here. Please clarify.
    #6  
Old 01-31-2011
Scrutinizer's Avatar
Scrutinizer Scrutinizer is offline Forum Staff  
Moderator
 
Join Date: Nov 2008
Last Activity: 21 October 2014, 1:48 PM EDT
Location: Amsterdam
Posts: 9,535
Thanks: 284
Thanked 2,419 Times in 2,169 Posts
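On the fourth line (the third occurrence of hi), x[hi] will be 2, which is negated to 0, so the line is not printed. The ++ only ever increases the value; it never goes back to 0.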

val   oldval  negate  newval
hi    0       1       1
hi    1       0       2
hii   0       1       1
hi    2       0       3
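The table above can be reproduced with a small tracing sketch. The function name trace and the variables old and neg are made up for this illustration; the expression !x[$0]++ is the one from the question.

```shell
# Print, for every input line: the line, the old counter value,
# the result of the negation, and the new counter value.
# "trace", "old" and "neg" are illustrative names.
trace() {
  awk '{
    old = x[$0] + 0    # value before this line is processed (unset counts as 0)
    neg = !x[$0]++     # same expression as the one-liner; 1 means "print"
    print $0, old, neg, x[$0]
  }' "$@"
}

printf 'hi\nhi\nhii\nhi\n' | trace
```

Only the lines where the third column is 1 would be printed by the original one-liner.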

Last edited by Scrutinizer; 01-31-2011 at 04:17 AM..
The Following User Says Thank You to Scrutinizer For This Useful Post:
sudvishw (01-31-2011)
    #7  
Old 01-31-2011
sudvishw sudvishw is offline
Registered User
 
Join Date: Nov 2009
Last Activity: 28 February 2012, 6:36 AM EST
Posts: 24
Thanks: 1
Thanked 0 Times in 0 Posts
Got it. Thanks!!!