Finding missing records and Dups


 
# 1  
Old 01-22-2014

I have a fixed-width file. The records look something like this:

Type ID SSN NAME .....AND SOME MORE FIELDS
A1 1234 .....
A1 1234 .....
B1 1234 .....
M2 4567 .....
M2 4567 .....
N2 4567 .....
N2 4567 .....
A1 9999
N2 9999



If an A1 record is present, then a B1 record has to be present for the same ID. Likewise, if an M2 record is present, then an N2 record has to be present.
So A1 and B1 records go together, and M2 and N2 records go together.

I am looking for TWO things:

1) Records where an A1 record has no matching B1 record, or vice versa, AND records where an M2 record has no matching N2 record, or vice versa.
In other words, if one half of an A1/B1 pair is missing, write the record that is present to a file. If one half of an M2/N2 pair is missing, write that record to a file as well.

e.g. In the example above that would be the two records below, because their matching B1 and M2 records are missing:

A1 9999
N2 9999

2) I also want to find duplicate records and put them in separate files. A duplicate is decided by the Type and ID fields together (e.g. A1 and 1234).
e.g. A1 duplicates go in one file, B1 duplicates in another file, M2 duplicates in another file, and N2 duplicates in another.
So from the example above, A1 1234 goes in one file because it is a duplicate, M2 4567 goes in another file, and N2 4567 in another.

Really appreciate your help.

Thanks
# 2  
Old 01-22-2014
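Divide the records into two streams: the A/M records, and the B/N records with B and N converted to A and M. Sort each stream and compare them with comm -3. Then post-process the result with sed: a line beginning with a tab means the A/M record is missing; any other line means the B/N record is missing, so it is converted back to B or N on the fly. The output therefore names the missing counterpart record.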
Code:
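#!/bin/bash
# Needs process substitution, <( ), so use bash or a ksh that supports it.
 
comm -3 <(
  # Left side: the A and M records.
  # sort -u keeps duplicate records from unbalancing the comparison.
  grep '^[AM]' infile | sort -u
 ) <(
  # Right side: the B and N records renamed to A and M; drop everything else.
  sed '
    s/^B/A/
    t
    s/^N/M/
    t
    d
   ' infile | sort -u
 ) | sed '
    s/^\t//
    t
    s/^A/B/
    s/^M/N/
   '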

Note: \t is typed in as a real tab.
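
The pipeline above compares whole lines, so an A1 and a B1 record only cancel out when every field after the type matches too. As a rough sketch of a variant that compares only Type and ID, assuming those are the first two whitespace-separated fields: the sample data is modelled on post #1, and the third column (x, y) and the temporary file names left and right are made up for illustration.

```shell
#!/bin/sh
# Sketch only: compares Type and ID, not whole lines.
# Assumption: Type and ID are the first two whitespace-separated fields.
export LC_ALL=C                      # same collation for sort and comm

workdir=$(mktemp -d)
infile="$workdir/infile"             # sample data modelled on post #1

cat > "$infile" <<'EOF'
A1 1234 x
A1 1234 x
B1 1234 y
M2 4567 x
M2 4567 x
N2 4567 y
N2 4567 y
A1 9999 x
N2 9999 y
EOF

# Left: A1/M2 keys. Right: B1/N2 keys renamed to their A1/M2 partner.
# sort -u drops duplicate keys so they cannot unbalance the comparison.
awk '$1 == "A1" || $1 == "M2" { print $1, $2 }' "$infile" |
  sort -u > "$workdir/left"
awk '
  $1 == "B1" { print "A1", $2 }
  $1 == "N2" { print "M2", $2 }
' "$infile" | sort -u > "$workdir/right"

# comm -3 keeps keys found on one side only; keys unique to the right
# side are indented with one tab, so they arrive as the second field.
missing=$(
  comm -3 "$workdir/left" "$workdir/right" |
  awk -F'\t' '
    NF == 1 { sub(/^A1/, "B1"); sub(/^M2/, "N2"); print; next }  # B/N is missing
    { print $2 }                                                # A/M is missing
  '
)
printf '%s\n' "$missing"
rm -rf "$workdir"
```

With this sample it prints B1 9999 and M2 9999, the counterparts that A1 9999 and N2 9999 are missing. Like the pipeline above, it reports the absent record rather than the one that is present.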
# 3  
Old 01-22-2014
Try this awk solution:

Code:
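awk '
BEGIN {
   # Map each record type to its partner: A1 <-> B1, M2 <-> N2.
   for(i=split("A1 B1 M2 N2", s);i;i-=2) {
      SAME[s[i]]=s[i-1]
      SAME[s[i-1]]=s[i]
   }
}
# Type+ID already seen: this is a duplicate, write it to dup_<Type>.
($1,$2) in K {
   print > ("dup_" $1)
   next
}
# First occurrence: remember the record and its Type+ID key.
{
    H[NR]=$0
    K[$1,$2]=NR
}
END {
   # Walk the first occurrences in input order.
   for(i=1;i<=NR;i++) {
     if(i in H) {
        split(H[i], V)
        # No partner record with the same ID: write to missing_<Type>.
        if (!((SAME[V[1]], V[2]) in K))
            print H[i] > ("missing_" V[1])
        else print H[i]
     }
    }
}' infile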
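
If the file is too large to hold in memory, the same classification can be done by reading it twice. This is only a sketch of that idea, not the script above: the array names partner, count and seen, the paired.txt output file, and the sample data (with a made-up third column) are assumptions for illustration. The dup_ and missing_ file names follow the script above.

```shell
#!/bin/sh
# Sketch: two-pass variant, the input file is named twice on the command line.
# Assumption: Type and ID are the first two whitespace-separated fields.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > infile <<'EOF'
A1 1234 x
A1 1234 x
B1 1234 y
M2 4567 x
M2 4567 x
N2 4567 y
N2 4567 y
A1 9999 x
N2 9999 y
EOF

awk '
BEGIN {
    partner["A1"] = "B1"; partner["B1"] = "A1"
    partner["M2"] = "N2"; partner["N2"] = "M2"
}
# Pass 1 (first copy of the file): count every Type+ID key.
NR == FNR { count[$1,$2]++; next }
# Pass 2 (second copy): classify each record in input order.
{
    if (seen[$1,$2]++)
        print > ("dup_" $1)          # second and later copies of a key
    else if (!((partner[$1], $2) in count))
        print > ("missing_" $1)      # no partner record with this ID
    else
        print                        # first copy of a properly paired record
}' infile infile > paired.txt

ls "$workdir"
```

With this sample, dup_A1, dup_M2 and dup_N2 each get one line, missing_A1 gets A1 9999 x, missing_N2 gets N2 9999 y, and paired.txt gets the four remaining records. Only the keys are kept in memory, not the records themselves.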
