To select non-duplicate records using awk


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting To select non-duplicate records using awk
# 1  
Old 02-24-2014
To select non-duplicate records using awk

Friends,

I have data sorted on id like this

Code:
id addressl
1  abc
2 abc
2 abc
2 abc
3 aabc
4 abc
4 abc

I want to pick all ids with addressesses leaving out duplicate records. Desired output would be

Code:
id address
1 abc
2 abc
3 abc
4 abc

I tried this way

Code:
awk "{ if ($1==id) {next;} else {print;id=$1;)" inputfile

But the results were still having a few duplicates

Any help'll be appreciated


Paresh
Moderator's Comments:
Mod Comment Please use CODE tags (not ICODE tags) when tagging multi-line sample code, input, and output.

Last edited by Don Cragun; 02-24-2014 at 05:01 AM.. Reason: Change ICODE tags to CODE tags.
# 2  
Old 02-24-2014
If your data is correctly sorted, then your code will work. Just check if the data is sorted.
# 3  
Old 02-24-2014
Hi guy,
As anbu123 said, if your data is sorted, you code is correct.
Mostly we do this to select non-duplicate records
Code:
awk '!a[$1]++' inputfile

This User Gave Thanks to Lucas_0418 For This Post:
# 4  
Old 02-24-2014
Try:
Code:
sort -un inputfile

This User Gave Thanks to Klashxx For This Post:
# 5  
Old 02-24-2014
Code:
awk '! X[$0]++'

# 6  
Old 02-24-2014
I would suggest using $1 and $2 instead of $0 to discard any blank spaces in any record:
Code:
awk '!A[$1,$2]++' file

This User Gave Thanks to Yoda For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Duplicate records

Gents, Please give a help file --BAD STATUS NOT RESHOOTED-- *** VP 41255/51341 in sw 2973 *** VP 41679/51521 in sw 2973 *** VP 41687/51653 in sw 2973 *** VP 41719/51629 in sw 2976 --BAD COG NOT RESHOOTED-- *** VP 41689/51497 in sw 2974 *** VP 41699/51677 in sw 2974 *** VP... (18 Replies)
Discussion started by: jiam912
18 Replies

2. Shell Programming and Scripting

Select records and fields

Hi All I would like to modify a file like this: >antax gioq21 tris notes abcdefghij klmnopqrs >betax gion32 ter notes2 tuvzabcdef ahgskslsooin this: >tris abcdefghij klmnopqrs >ter tuvzabcdef ahgskslsoo So, I would like to remove the first two fields(and output field 3) in record... (4 Replies)
Discussion started by: giuliangiuseppe
4 Replies

3. Shell Programming and Scripting

Deleting duplicate records from file 1 if records from file 2 match

I have 2 files "File 1" is delimited by ";" and "File 2" is delimited by "|". File 1 below (3 record shown): Doc1;03/01/2012;New York;6 Main Street;Mr. Smith 1;Mr. Jones Doc2;03/01/2012;Syracuse;876 Broadway;John Davis;Barbara Lull Doc3;03/01/2012;Buffalo;779 Old Windy Road;Charles... (2 Replies)
Discussion started by: vestport
2 Replies

4. UNIX for Dummies Questions & Answers

Need to keep duplicate records

Consider my input is 10 10 20 then, uniq -u will give 20 and uniq -dwill return 10. But i need the output as , 10 10 How we can achieve this? Thanks (4 Replies)
Discussion started by: pandeesh
4 Replies

5. Shell Programming and Scripting

awk print only select records from file2

Print only records from file 2 that do not match file 1 based on criteria of comparing column 1 and column 6 Was trying to play around with following code I found on other threads but not too successful Code: awk 'NR==FNR{p=$1;$1=x;A=$0;next}{$2=$2(A?A:",,,")}1' FS=~ OFS=~ file1 FS="*"... (11 Replies)
Discussion started by: sigh2010
11 Replies

6. Linux

Need awk script for removing duplicate records

I have log file having Traffic line 2011-05-21 15:11:50.356599 TCP (6), length: 52) 10.10.10.1.3020 > 10.10.10.254.50404: 2011-05-21 15:11:50.652739 TCP (6), length: 52) 10.10.10.254.50404 > 10.10.10.1.3020: 2011-05-21 15:11:50.652558 TCP (6), length: 89) 10.10.10.1.3020 >... (1 Reply)
Discussion started by: Rastamed
1 Replies

7. Shell Programming and Scripting

Block of records to select from a file

Hello: I am new to shell script programming. Now I would like to select specific records block from a file. For example, current file "xyz.txt" is containing 1million records and want to select the block of records from line number 50000 to 100000 and save into a file. Can anyone suggest me how... (3 Replies)
Discussion started by: nvkuriseti
3 Replies

8. UNIX for Dummies Questions & Answers

Getting non-duplicate records

Hi, I have a file with these records abc xyz xyz pqr uvw cde cde In my o/p file , I want all the non duplicate rows to be shown. o/p abc pqr uvw Any suggestions how to do this? Thanks for the help. rs (2 Replies)
Discussion started by: rs123
2 Replies

9. Linux

Need awk script for removing duplicate records

I have huge txt file having millions of trade data. For e.g Trade.txt (first 8 lines in the file is header info) COB_DATE,TRADE_ID,SOURCE_SYSTEM_TRADE_ID,TRADE_GROUP_ID, TRADE_TYPE,DEALER_NAME,EXTERNAL_COUNTERPARTY_ID, EXTERNAL_COUNTERPARTY_NAME,DB_COUNTERPARTY_ID,... (6 Replies)
Discussion started by: nmumbarkar
6 Replies

10. Shell Programming and Scripting

Using a variable to select records with awk

As part of a bigger task, I had to read thru a file and separate records into various batches based on a field. Specifically, separate records based on the value in the batch field as defined below. The batch field left-justified numbers. The datafile is here > cat infile 12345 1 John Smith ... (5 Replies)
Discussion started by: joeyg
5 Replies
Login or Register to Ask a Question