Remove duplicates


 
# 1  
Old 12-03-2008
Remove duplicates

Hello Experts,

I have two files named old and new; example contents are below. I need to compare them and print the records that exist only in the new file. I tried the awk script below, and it works perfectly when the records match exactly. The problem is that my old file has extra spaces/tabs at the end of some lines and a few blank lines at the beginning, so the script also prints records that are already in my old file, which I am not interested in.

I am looking for a solution that can strip the extra spaces/tabs at the end of each line
(or)
instead of comparing the entire record, compare just the 1st and 2nd fields: if they match the 1st and 2nd fields of a record in my old file, produce the same result as shown below. This might require a tweak to the awk script.

Code:
nawk 'NR==FNR{a[$0];next}!($0 in a)' old new > sdiff

Old:
ACB_XY_01 1 hello
ACB_XY_03 1 hai
ACB_XY_04 1 good
ACB_XY_04 2 luck

New:
ACB_XY_01 1 hello
ACB_XY_01 2 hai
ACB_XY_03 1 hai
ACB_XY_04 1 good
ACB_XY_04 2 luck

Output:
ACB_XY_01 2 hai

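The first option (trimming trailing whitespace before comparing whole records) could be sketched as below. This is only a sketch: the sample files are recreated inline so it runs standalone, and the trim pattern assumes the stray characters are spaces/tabs at the end of the line, as described above.

```shell
# Recreate abridged sample files; trailing spaces and a blank line
# are added to "old" to mimic the problem described above.
printf 'ACB_XY_01 1 hello   \n\nACB_XY_03 1 hai\n' > old
printf 'ACB_XY_01 1 hello\nACB_XY_01 2 hai\nACB_XY_03 1 hai\n' > new

# Trim trailing spaces/tabs on every line, skip blank lines,
# then keep only records of "new" never seen in "old".
awk '{ sub(/[ \t]+$/, "") }
     NR == FNR { if (NF) a[$0]; next }
     NF && !($0 in a)' old new > sdiff

cat sdiff    # prints: ACB_XY_01 2 hai
```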
Many thanks in advance.
# 2  
Old 12-03-2008
Code:
awk 'NR == FNR {
  _[$1,$2]
  next
  }
!(($1,$2) in _)
' Old New

# 3  
Old 12-03-2008
Hello again,

Many thanks for your response. I tried the command with both awk and nawk, as below:

awk 'NR == FNR {_[$1,$2] next }!(($1,$2) in _)' old new

but I get these error messages:

awk: syntax error near line1
awk: illegal statement near line1
nawk: syntax error at source line 1
context is
NR==FNR { _[$1,$2] >>> next <<< } ! (($1, $2) in _)
nawk: illegal statement at source line 1

Please can you tell me where I am going wrong?
# 4  
Old 12-03-2008
Quote:
Originally Posted by forumthreads
awk 'NR == FNR {_[$1,$2] next }!(($1,$2) in _)' old new

nawk: syntax error at source line 1
context is
NR==FNR { _[$1,$2] >>> next <<< } ! (($1, $2) in _)
Yes, in my post there is a newline before the next statement, and in awk a newline acts as a statement terminator. If you want to run it on one line, you have to change the command:

Code:
awk 'NR == FNR { _[$1,$2]; next } !(($1,$2) in _)' old new

# 5  
Old 12-03-2008
To use the command as a one-liner you have to place a statement terminator (a semicolon) before the "next" statement:

Code:
awk 'NR == FNR {_[$1,$2]; next }!(($1,$2) in _)' old new

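For reference, here is the corrected one-liner run against the sample data from the first post (recreated inline so the sketch is self-contained). Comparing only $1 and $2 also sidesteps the trailing-whitespace problem, because awk's default field splitting ignores leading and trailing blanks.

```shell
# Recreate the sample files; note the trailing spaces on the first "old" line.
printf 'ACB_XY_01 1 hello   \nACB_XY_03 1 hai\nACB_XY_04 1 good\nACB_XY_04 2 luck\n' > old
printf 'ACB_XY_01 1 hello\nACB_XY_01 2 hai\nACB_XY_03 1 hai\nACB_XY_04 1 good\nACB_XY_04 2 luck\n' > new

# While reading "old" (NR == FNR), remember each ($1,$2) pair;
# while reading "new", print records whose pair was never seen.
awk 'NR == FNR { _[$1,$2]; next } !(($1,$2) in _)' old new
# prints: ACB_XY_01 2 hai
```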
Regards