Finding records NOT on another file


 
# 1  
Old 11-02-2018

I have three files named ALL, MATCH, and DIFF. MATCH and DIFF contain completely different records, all of which also appear in ALL, but ALL also has records that are in neither MATCH nor DIFF.

I know I can sort all three files together, once with the unique option and once without, and run diff on the two results to show which records appear in two files. But how can I find the records that are only in ALL?

TIA
# 2  
Old 11-02-2018
If ALL is small enough to fit in memory:
Code:
awk 'NR==FNR { A[$0]; next } $0 in A { delete A[$0] } END { for (X in A) print X }' ALL MATCH DIFF
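
For anyone who wants to try the load-then-delete idea on something small first, here is a minimal sketch. The scratch directory and the sample records are made up for illustration. Two things to keep in mind: the order of a `for (X in A)` loop is unspecified, so the output is piped through sort here, and repeated records in ALL collapse into a single array entry.

```shell
# Hypothetical sample data in a scratch directory (not from the original post).
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' apple banana cherry date elderberry > ALL
printf '%s\n' banana date > MATCH
printf '%s\n' cherry > DIFF

# Load every record of ALL into A, delete whatever also shows up in
# MATCH or DIFF, then print what is left. Sorting gives a stable order.
only_all=$(awk 'NR==FNR { A[$0]; next } $0 in A { delete A[$0] } END { for (X in A) print X }' ALL MATCH DIFF | sort)
printf '%s\n' "$only_all"
```

With this data only apple and elderberry remain, since the other three records each appear in MATCH or DIFF.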

# 3  
Old 11-03-2018
Try also

Code:
sort ALL MATCH DIFF | uniq -c | grep "^ *1 "
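
A possible variation on the counting idea, sketched with made-up records: `uniq -u` prints only the lines that occur exactly once in its sorted input, so there is no count column to match or strip afterwards. Like the counting version, it assumes no record repeats inside a single file and that everything in MATCH and DIFF also appears in ALL. A record repeated within ALL would be dropped, and a record found only in MATCH or DIFF would be reported.

```shell
# Hypothetical sample data; r10 and r11 are there to show that
# counts or names starting with "1" cause no confusion.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' r1 r2 r3 r10 r11 > ALL
printf '%s\n' r2 r10 > MATCH
printf '%s\n' r3 > DIFF

# Sort all three files into one stream, then keep the lines seen only once.
once=$(LC_ALL=C sort ALL MATCH DIFF | uniq -u)
printf '%s\n' "$once"
```

Here r2, r3 and r10 each occur twice in the combined stream, so only r1 and r11 are printed.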

# 4  
Old 11-06-2018
Sorted (untested):
Code:
comm -23 <(sort ALL) <(sort MATCH DIFF)

Unsorted (untested):
Code:
grep -Fx -f <(comm -23 <(sort ALL) <(sort MATCH DIFF)) ALL

You may wish to add sort's -u switch to remove duplicate lines.
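
If your shell has no process substitution, the same comm idea can be sketched with intermediate files instead. The file names all.sorted and known.sorted and the sample records are assumptions for illustration only. This version also shows the effect of -u when ALL contains repeated lines.

```shell
# Hypothetical sample data; ALL deliberately contains repeated records.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' pear fig pear kiwi fig lime > ALL
printf '%s\n' kiwi > MATCH
printf '%s\n' lime > DIFF

# comm needs both inputs sorted the same way; -u also drops repeated lines.
LC_ALL=C sort -u ALL > all.sorted
LC_ALL=C sort -u MATCH DIFF > known.sorted

# -2 hides lines unique to the second input and -3 hides lines common
# to both, leaving only the lines that occur in ALL alone.
only_all=$(LC_ALL=C comm -23 all.sorted known.sorted)
printf '%s\n' "$only_all"
```

With this data the result is fig and pear, each listed once.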

Andrew
# 5  
Old 11-06-2018
One could also try:
Code:
awk 'FNR == 1 { fc++ } fc < 3 {d[$0]; next } !($0 in d)' DIFF MATCH ALL

which has been tested.

This requires enough space for the unique records in DIFF and MATCH to be held in memory, but doesn't require space in memory for the unique records in ALL.
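
A small sketch of what that trade-off looks like in practice, using made-up records. Because ALL is streamed last and each line is printed as soon as it is read, the output keeps the original order of ALL and also keeps any repeated lines.

```shell
# Hypothetical sample data; ALL is unsorted and has repeated records.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' zulu alpha zulu mike alpha kilo > ALL
printf '%s\n' mike > MATCH
printf '%s\n' kilo > DIFF

# Records from the first two files go into d. ALL is read last, and each
# of its lines is printed at once unless it was seen in DIFF or MATCH.
kept=$(awk 'FNR == 1 { fc++ } fc < 3 { d[$0]; next } !($0 in d)' DIFF MATCH ALL)
printf '%s\n' "$kept"
```

This prints zulu, alpha, zulu, alpha, in that order. If duplicates are unwanted, the output could be piped through `sort -u`, at the cost of losing the original order.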
# 6  
Old 11-06-2018
The following variant works with any number of "exclude" files:
Code:
awk 'BEGIN {nfiles=ARGC-1} FNR == 1 { fc++ } fc < nfiles {d[$0]; next } !($0 in d)' DIFF MATCH ALL

Another idea: make the last filename special by reading ALL from standard input:
Code:
awk 'FILENAME!="-" { d[$0]; next } !($0 in d)' MATCH DIFF - < ALL
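
The standard-input form can be wrapped in a small shell function, sketched below. The function name only_in_stdin, the extra file EXTRA, and the sample records are all hypothetical. One possible advantage of testing FILENAME rather than counting `FNR == 1` events is that an empty exclude file yields no records at all, so it would never bump a file counter, whereas the FILENAME test is unaffected.

```shell
# only_in_stdin is a hypothetical helper name, not from the thread.
only_in_stdin() {
    # Every file argument is an exclude file; "-" makes awk read stdin last.
    awk 'FILENAME != "-" { d[$0]; next } !($0 in d)' "$@" -
}

# Hypothetical sample data with three exclude files.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' one two three four five > ALL
printf '%s\n' two > MATCH
printf '%s\n' four > DIFF
printf '%s\n' five > EXTRA

left=$(only_in_stdin MATCH DIFF EXTRA < ALL)
printf '%s\n' "$left"
```

With this data the function prints one and three. It assumes none of the exclude files is itself named "-".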
