awk remove duplicate code


 
# 1  
Old 01-25-2012

Hi,

In a previous, now closed thread, I found the following awk script:
Code:
awk '{t[$1" "$2" "$3" "$4]=$5" "$6" "$7}END{for (i in t){print i,t[i]}}'

This code does a great job of removing duplicates by the first four fields of a 7-field set of columns. I would very much like to understand how it works, but I can't find anything about it in the awk documentation. Could someone explain it, please? Is t[ ]= some special function?

Thanks,
Pawel
# 2  
Old 01-25-2012
The big { } code block without the END gets run for every line. For each line, $1, $2, $3, etc. are set to the values of that line's columns. Putting variables and strings in a row sticks them together into one longer string, so $1" "$2 is the first two columns joined by a space.

So for every line, it sets a value like this in the array (here for an input line a b c d e f g):

Code:
t["a b c d"]="e f g"

If there's a duplicate line, setting the same index twice doesn't put two elements in the array. The previous contents of t["a b c d"], for instance, just get overwritten by the next line starting with a b c d, so the last such line is the one that survives.
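As a rough illustration of the overwrite, here is a small sketch with two made-up input lines that share their first four fields. The sample data and the counter n are only for demonstration; the array assignment is the one from the original script.

```shell
# Two hypothetical input lines with the same first four fields.
out=$(printf '%s\n' 'a b c d 1 2 3' 'a b c d 7 8 9' |
  awk '{ t[$1" "$2" "$3" "$4] = $5" "$6" "$7 }   # same index both times
       END {
         n = 0
         for (i in t) n++                        # count elements in t
         print n, t["a b c d"]                   # element count, surviving value
       }')
echo "$out"
```

The array ends up with a single element, holding the value from the second line.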

Once all the lines have been read, only then will awk run the END {} block, which goes through each thing in the array and prints them (in no particular order).
The syntax for(X in ARRAY) loops through every element in an array, with X being the array index, and ARRAY[X] being the contents of that index.
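To try the whole script, something like the following should work. The function name dedupe and the three sample lines are assumptions made up for this sketch. Since for (i in t) visits the elements in no particular order, the output is piped through sort to make it repeatable.

```shell
# dedupe is a hypothetical wrapper around the script from post #1.
dedupe() {
  awk '{ t[$1" "$2" "$3" "$4] = $5" "$6" "$7 }
       END { for (i in t) print i, t[i] }' | sort
}

# Made-up sample: lines 1 and 3 share their first four fields.
out=$(printf '%s\n' 'x y z w 1 2 3' 'a b c d 4 5 6' 'x y z w 7 8 9' | dedupe)
echo "$out"
```

Only one x y z w line comes out, and it carries the last three fields of the later input line.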

Last edited by Corona688; 01-25-2012 at 11:49 AM..
# 3  
Old 01-25-2012
t[] is an associative array. In plainer terms, the index ( t[ index goes in here ] ) can be any string of characters, including several fields joined together. Many other languages use only integers to reference array elements. awk accepts numbers too, but it converts them to strings, so every index is really a character string.
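A quick way to see that indexes are strings is a sketch like this one, where the array contents are invented for the example. The number 1 and the string "1" name the same element.

```shell
out=$(awk 'BEGIN {
  t[1] = "one"            # numeric index, stored under the string "1"
  t["1"] = "uno"          # same element, so this overwrites "one"
  t["x y"] = "text key"   # any string works as an index
  n = 0
  for (i in t) n++        # count elements
  print n, t[1], t["x y"]
}')
echo "$out"
```

This reports two elements, not three, because t[1] and t["1"] are the same slot.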
# 4  
Old 01-25-2012
Thanks. That cleared it up!