Remove duplicates


 
# 1  
Old 12-03-2008
Remove duplicates

Hello Experts,

I have two files named old and new; example contents are below. I need to compare them and print the records that exist only in the new file. I tried the awk script below, and it works perfectly when the records match exactly. The problem is that my old file has extra spaces/tabs at the end of some lines and a few blank lines at the beginning, so the script also prints records that are already in my old file, which I am not interested in.

I am looking for a solution that can strip the extra spaces/tabs at the end of each line
(or)
instead of comparing the entire record, compare just the 1st and 2nd fields: if they match the 1st and 2nd fields of a record in my old file, produce the same result as shown below. This might require a tweak to the awk script.

Code:
nawk 'NR==FNR{a[$0];next}!($0 in a)' old new > sdiff

Old:
ACB_XY_01 1 hello
ACB_XY_03 1 hai
ACB_XY_04 1 good
ACB_XY_04 2 luck

New:
ACB_XY_01 1 hello
ACB_XY_01 2 hai
ACB_XY_03 1 hai
ACB_XY_04 1 good
ACB_XY_04 2 luck

Output:
ACB_XY_01 2 hai

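The first option (trimming trailing whitespace before comparing whole records) could be sketched as below. This is only a sketch: the sample files are recreated inline so it runs standalone, and the trim pattern assumes the stray characters are spaces/tabs at the end of the line, as described above.

```shell
# Recreate abridged sample files; trailing spaces and a blank line
# are added to "old" to mimic the problem described above.
printf 'ACB_XY_01 1 hello   \n\nACB_XY_03 1 hai\n' > old
printf 'ACB_XY_01 1 hello\nACB_XY_01 2 hai\nACB_XY_03 1 hai\n' > new

# Trim trailing spaces/tabs on every line, skip blank lines,
# then keep only records of "new" never seen in "old".
awk '{ sub(/[ \t]+$/, "") }
     NR == FNR { if (NF) a[$0]; next }
     NF && !($0 in a)' old new > sdiff

cat sdiff    # prints: ACB_XY_01 2 hai
```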
Many thanks in advance.
# 2  
Old 12-03-2008
Code:
awk 'NR == FNR {
  _[$1,$2]
  next
  }
!(($1,$2) in _)
' Old New

# 3  
Old 12-03-2008
Hello again,

Many thanks for your response. I tried the command with both awk and nawk, as below:

awk 'NR == FNR {_[$1,$2] next }!(($1,$2) in _)' old new

but I get these error messages:

awk: syntax error near line1
awk: illegal statement near line1
nawk: syntax error at source line 1
context is
NR==FNR { _[$1,$2] >>> next <<< } ! (($1, $2) in _)
nawk: illegal statement at source line 1

Please can you tell me where I am going wrong?
# 4  
Old 12-03-2008
Quote:
Originally Posted by forumthreads
awk 'NR == FNR {_[$1,$2] next }!(($1,$2) in _)' old new

nawk: syntax error at source line 1
context is
NR==FNR { _[$1,$2] >>> next <<< } ! (($1, $2) in _)
Yes, in my post there is a newline before the next statement, and in awk a newline acts as a statement terminator. If you want to run it on one line, you have to change the command:

Code:
awk 'NR == FNR { _[$1,$2]; next } !(($1,$2) in _)' old new

# 5  
Old 12-03-2008
To use the command as a one-liner you have to place a statement terminator (a semicolon) before the "next" statement:

Code:
awk 'NR == FNR {_[$1,$2]; next }!(($1,$2) in _)' old new

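For reference, here is the corrected one-liner run against the sample data from the first post (recreated inline so the sketch is self-contained). Comparing only $1 and $2 also sidesteps the trailing-whitespace problem, because awk's default field splitting ignores leading and trailing blanks.

```shell
# Recreate the sample files; note the trailing spaces on the first "old" line.
printf 'ACB_XY_01 1 hello   \nACB_XY_03 1 hai\nACB_XY_04 1 good\nACB_XY_04 2 luck\n' > old
printf 'ACB_XY_01 1 hello\nACB_XY_01 2 hai\nACB_XY_03 1 hai\nACB_XY_04 1 good\nACB_XY_04 2 luck\n' > new

# While reading "old" (NR == FNR), remember each ($1,$2) pair;
# while reading "new", print records whose pair was never seen.
awk 'NR == FNR { _[$1,$2]; next } !(($1,$2) in _)' old new
# prints: ACB_XY_01 2 hai
```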
Regards