Awk: Remove Duplicates

# 1  
Old 01-23-2014

I have the following code, which removes duplicate records from inputfile based on the first eight fields. The first awk appends the duplicate records to the Duplicates file. The second awk writes the non-duplicate records from inputfile to a temporary file, which I then move over the original file.

Requirement:
Can both awk commands be combined into a single call, or is there a more efficient way to do the same thing?

Code:
awk -F, 'dupentries[$1,$2,$3,$4,$5,$6,$7,$8]++' inputfile >> Duplicates
awk -F, '!dupentries[$1,$2,$3,$4,$5,$6,$7,$8]++' inputfile > inputfile.tmp
mv inputfile.tmp inputfile
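
For readers new to the idiom used above, here is a minimal sketch of how the post-increment test separates first occurrences from repeats. The file name sample.csv, the array name seen, and the three records are made up for illustration, and the key is shortened to two fields to keep the example small.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical sample: the first two records share the key (a,1).
printf '%s\n' 'a,1,x' 'a,1,y' 'b,2,z' > sample.csv

# seen[key]++ yields the count *before* incrementing:
# 0 (false) for the first record with that key, non-zero (true) afterwards.
first_seen=$(awk -F, '!seen[$1,$2]++' sample.csv)   # first occurrence of each key
repeats=$(awk -F, 'seen[$1,$2]++' sample.csv)       # every later occurrence

printf 'kept:\n%s\nduplicates:\n%s\n' "$first_seen" "$repeats"
```

Because both commands build the same array from scratch, the file is read twice, which is the cost the question is asking to avoid.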

# 2  
Old 01-23-2014
Try:
Code:
awk -F, 'dupentries[$1,$2,$3,$4,$5,$6,$7,$8]++ {print > "Duplicates"; next};print' inputfile > inputfile.tmp
mv inputfile.tmp inputfile

This User Gave Thanks to Don Cragun For This Post:
# 3  
Old 01-23-2014
Hi Don,
It gives the following syntax error at "print":
Code:
awk: dupentries[$1,$2,$3,$4,$5,$6,$7,$8]++ {print > "Duplicates"; next}; print
awk:                                                                     ^ syntax error

Sample inputfile:
Code:
24253886,1,9137,179274,20140111000049,1,N,,0,928678,67340,C2506Qkz,533,SSCHHA01S201401110005000000.PDSN,0,MB
24253886,1,9137,179274,20140111000049,0,N,,0,0,0,C2506Qkz,336,SSCHHA01S201401110005000000.PDSN,0,MB
24253886,1,9137,179274,20140111000049,0,N,,0,0,0,C2506Qkz,335,SSCHHA01S201401110005000000.PDSN,0,MB
24253886,1,9137,179274,20140111000049,1,N,,0,5589,7171,C2506Qkz,534,SSCHHA01S201401110005000000.PDSN,0,MB
24253886,1,9137,179274,20140111000049,0,N,,0,0,0,C2506Qkz,338,SSCHHA01S201401110005000000.PDSN,0,MB
24253886,1,9137,179274,20140111000049,0,N,,0,0,0,C2506Qkz,334,SSCHHA01S201401110005000000.PDSN,0,MB
4000050706,1,9137,275541,20140111000411,10,N,,0,8246472,1791142,C2706RXa,533,SSCHHA01S201401110005000000.PDSN,0,MB
4000050706,1,9137,275541,20140111000411,1,N,,0,344071,105732,C2706RXa,534,SSCHHA01S201401110005000000.PDSN,0,MB
4000050706,1,9137,275541,20140111001259,10,N,,0,6171716,4289817,C2706RZV,533,SSCHHA01S201401110015000002.PDSN,0,MB
4000050706,1,9137,275541,20140111001259,1,N,,0,17662,9883,C2706RZV,534,SSCHHA01S201401110015000002.PDSN,0,MB

# 4  
Old 01-23-2014
Hello,

Just use {print} in place of the bare print. In awk, an action has to be enclosed in braces.
It should work then.


Thanks,
R. Singh
This User Gave Thanks to RavinderSingh13 For This Post:
# 5  
Old 01-23-2014
Should be:
Code:
awk -F, 'dupentries[$1,$2,$3,$4,$5,$6,$7,$8]++ {print > "Duplicates"; next}{print}' inputfile > inputfile.tmp
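
One difference from the original two-pass version is worth noting: inside awk, print > "Duplicates" truncates that file when it is first opened, whereas the original commands appended to it with >>. The sketch below assumes the append behaviour is wanted and uses awk's own >> redirection. The sample records and the pre-existing line in Duplicates are made up for illustration.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical data: records 1 and 2 share the same first eight fields.
printf '%s\n' 'k1,a,b,c,d,e,f,g,first' \
              'k1,a,b,c,d,e,f,g,second' \
              'k2,a,b,c,d,e,f,g,third' > inputfile
printf '%s\n' 'older duplicate' > Duplicates   # stands in for earlier runs

# Single pass: repeats are appended to Duplicates, everything else goes to stdout.
# The mv only runs if awk succeeded, so a failed run leaves inputfile untouched.
awk -F, '
    dupentries[$1,$2,$3,$4,$5,$6,$7,$8]++ { print >> "Duplicates"; next }
    { print }
' inputfile > inputfile.tmp && mv inputfile.tmp inputfile
```

After this runs, inputfile holds the first and third records, and Duplicates holds its earlier line followed by the second record.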

These 2 Users Gave Thanks to Franklin52 For This Post: