Finding records NOT on another file

11-02-2018


I have three files named ALL, MATCH, and DIFF. MATCH and DIFF contain completely different records from each other, and all of those records also appear in ALL. However, ALL also has records that are in neither MATCH nor DIFF.

I know I can sort all three files together twice, once with the unique option and once without, and then run diff on the two results to show which records appear in two files. But how can I find the records that are only in ALL?

TIA

wbport


11-02-2018


If ALL is small enough to fit in memory:

Code:

awk 'NR==FNR { A[$0] ; next } ; $0 in A { delete A[$0] } END { for(X in A) { print X } }' ALL MATCH DIFF


Corona688


11-03-2018


Try also

Code:

sort ALL MATCH DIFF | uniq -c | grep "^ *1 "

RudiC


11-06-2018


Sorted output (untested):

Code:

comm -23 <(sort ALL) <(sort MATCH DIFF)

Output in the original order of ALL (untested):

Code:

grep -Fxf <(comm -23 <(sort ALL) <(sort MATCH DIFF)) ALL

You may wish to use the -u switch to sort to remove duplicate lines.
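To make the comm approach concrete, here is a minimal sketch on made-up sample data. The record values and the temporary directory are assumptions for illustration only, and temporary files stand in for the process substitutions so that it also runs in a plain POSIX sh.

```shell
# Hypothetical sample data, invented for this illustration.
LC_ALL=C; export LC_ALL          # sort and comm must agree on collation
tmp=$(mktemp -d)
printf '%s\n' apple banana cherry banana date > "$tmp/ALL"
printf '%s\n' apple  > "$tmp/MATCH"
printf '%s\n' cherry > "$tmp/DIFF"

# sort -u sorts and drops duplicate lines, so each record appears once.
sort -u "$tmp/ALL" > "$tmp/all.sorted"
sort -u "$tmp/MATCH" "$tmp/DIFF" > "$tmp/known.sorted"

# comm -23 suppresses column 2 (lines only in the second file) and
# column 3 (lines in both), leaving lines found only in the first file.
only_all=$(comm -23 "$tmp/all.sorted" "$tmp/known.sorted")
printf '%s\n' "$only_all"

rm -rf "$tmp"
```

With this sample data the sketch prints banana and date, each once, because sort -u has already collapsed the duplicate banana line.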

Andrew

apmcd47


11-06-2018


One could also try:

Code:

awk 'FNR == 1 { fc++ } fc < 3 {d[$0]; next } !($0 in d)' DIFF MATCH ALL

which has been tested.

This requires enough space for the unique records in DIFF and MATCH to be held in memory, but doesn't require space in memory for the unique records in ALL.
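As a rough illustration of that behaviour, the sketch below runs the same awk program on made-up sample files. The record values and the temporary directory are assumptions, not data from this thread. Only the lines of DIFF and MATCH become array keys; ALL is read one line at a time, so its original order and any duplicate lines are preserved in the output.

```shell
# Hypothetical sample data, invented for this illustration.
tmp=$(mktemp -d)
printf '%s\n' delta alpha beta delta gamma > "$tmp/ALL"
printf '%s\n' alpha > "$tmp/MATCH"
printf '%s\n' gamma > "$tmp/DIFF"

# fc counts files as they start. Lines from files 1 and 2 are stored
# as keys of d; lines from file 3 (ALL) are printed only if not in d.
leftover=$(awk 'FNR == 1 { fc++ } fc < 3 { d[$0]; next } !($0 in d)' \
    "$tmp/DIFF" "$tmp/MATCH" "$tmp/ALL")
printf '%s\n' "$leftover"

rm -rf "$tmp"
```

With this sample data the sketch prints delta, beta, delta: the order of ALL is kept and the repeated delta line is not collapsed.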

Don Cragun


11-06-2018


The following variant works with any number of "exclude" files:

Code:

awk 'BEGIN {nfiles=ARGC-1} FNR == 1 { fc++ } fc < nfiles {d[$0]; next } !($0 in d)' DIFF MATCH ALL

Another idea: make the last filename special:

Code:

awk 'FILENAME!="-" { d[$0]; next } !($0 in d)' MATCH DIFF - < ALL
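One caveat worth sketching: counting files with FNR == 1 skips any file that is empty, because awk never reads a first record from it, so an empty exclude file shifts the count and ALL gets treated as one more exclude file. Testing FILENAME avoids that. The sketch below is one possible variation, not something posted in this thread: it compares FILENAME with the last command-line argument, on made-up sample data, and assumes the last file name is not also given earlier in the list.

```shell
# Hypothetical sample data, invented for this illustration.
tmp=$(mktemp -d)
printf '%s\n' one two three > "$tmp/ALL"
printf '%s\n' two > "$tmp/MATCH"
: > "$tmp/DIFF"                  # deliberately empty exclude file

# ARGV[ARGC-1] is the last operand (ALL). Every other file only fills d.
survivors=$(awk 'FILENAME != ARGV[ARGC-1] { d[$0]; next } !($0 in d)' \
    "$tmp/MATCH" "$tmp/DIFF" "$tmp/ALL")
printf '%s\n' "$survivors"

rm -rf "$tmp"
```

With this sample data the sketch prints one and three even though DIFF is empty.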

MadeInGermany
