#1
remove duplicate lines using awk
Hi,
I learned that the following removes duplicate lines:
Code:
awk '!x[$0]++'
Can anyone please explain this syntax? I want to understand how it removes the duplicates.
Thanks in advance,
sudvishw
#2
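x is an associative array whose elements start out uninitialized, which awk treats as 0. The index of x is $0, the whole line. The first time a given line is seen, x[$0] is 0. Because ++ here is the postfix (suffix) ++, the expression x[$0]++ returns the old value 0 and only then increments x[$0] to 1. So !x[$0]++ is true and $0 is printed by the default action. If the same line appears again, x[$0] is already non-zero, so !x[$0]++ is false and $0 is not printed.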
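In case it helps, here is a rough long-hand sketch of the same logic. The function name dedup_longhand and the variable names seen and old are made up for illustration; they are not part of the original one-liner.

```shell
# Long-hand equivalent of: awk '!x[$0]++'
# dedup_longhand, seen and old are hypothetical names used only here.
dedup_longhand() {
    awk '{
        old = seen[$0]        # value before the increment (unset counts as 0)
        seen[$0] = old + 1    # the side effect of the postfix ++
        if (!old) print       # print only when the old value was 0
    }'
}

printf 'a\nb\na\n' | dedup_longhand
```

With that input it should print a and then b, the same as the one-liner does.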
#3
I am sorry, I cannot understand. It would be great if you could explain with an example. Usually we sort and then pick the unique records. Is there any sorting built into this awk command?
#4
Sorting is not necessary. All it does is create an (associative) array element with the entire line as the index and no value (or 0, if you will). The exclamation mark negates that value, so the outcome is 1 (true). A true pattern in awk means perform the default action, which is {print $0}, so the entire line gets printed.
The ++ also adds 1 to the array value, which now becomes 1. The next time the same line is encountered, the value returned by the array is 1, which is negated to 0 by the exclamation mark, so nothing gets printed.
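To illustrate the point about sorting, here is a small sketch with made-up input. The sample function and its contents are invented for this example only.

```shell
# sample is a hypothetical helper that just emits some test lines.
sample() {
    printf 'pear\napple\npear\nbanana\napple\n'
}

# Keeps the first occurrence of each line, in the original input order.
sample | awk '!x[$0]++'

# Also removes duplicates, but reorders the lines while doing so.
sample | sort -u
```

The awk command should give pear, apple, banana (input order preserved), while sort -u should give apple, banana, pear.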
#5
Hi, thanks for your explanation. If I understand it right, suppose we have a file as shown below:
Code:
hi
hi
hii
hi
Here hi comes 3 times. The first time hi comes, x[hi] will be initialized to 0, which is negated, so it becomes 1 and the line is printed. The second time, x[hi] will be 1, which gets negated, so it becomes 0 and the line is not printed. If I am not wrong, the third time, x[hi] will be 0, which gets negated to 1, and the line should be printed. I think there is something I am missing here. Please clarify.
#6
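On the fourth line (the third time hi appears), x[hi] will be 2, not 0. The ++ adds 1 every time the line is seen, and the negation never changes the stored value. Negating 2 gives 0, so the line is not printed.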
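If you want to watch the counter yourself, something like the following sketch should work. The function name trace_dedup and the variables before and verdict are made up for illustration.

```shell
# trace_dedup is a hypothetical helper: for each input line it shows the
# value of x[$0] before the test and whether awk '!x[$0]++' would print it.
trace_dedup() {
    awk '{
        before = x[$0] + 0                            # count so far (unset shows as 0)
        verdict = (!x[$0]++) ? "printed" : "skipped"  # same test as the one-liner
        print NR ": " $0 "  x before=" before "  " verdict
    }'
}

printf 'hi\nhi\nhii\nhi\n' | trace_dedup
```

For the four-line file above, the last line of output should read "4: hi  x before=2  skipped".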
Last edited by Scrutinizer; 01-31-2011 at 04:17 AM.
#7
Got it. Thanks!!!