awk - compare 1st 15 fields of record with 20 fields


 
# 1  
Old 10-19-2013

I'm trying to compare two files for differences in a select number of fields. When a difference is found, the whole record from the second file should be written to a delta file with '|C' appended. Each record has 20 fields, but only the first 15 should be compared. The first field of each record is the primary key. A difference should be captured only if it occurs in fields 2-15. Differences in fields 16-20 should be ignored, so no record is written to the delta file for them.

Code:
awk -F\| 'NR==FNR{for(i=1; i<=15;i++) a[$i]++;next} ($1 in a) ' FileA.txt FileB.txt | sed 's/^[ \t]*//;s/[ \t]*$/|C/' > delta.txt
 
File1 (differences were shown in red: BOBBY in field 3 of record 1, SUNNY in field 17 of record 2):
000001323567|BELLTOWER|BOBBY|BAT|BBBAT@hello.com|JOB2|CHARLES|CH@goodbye.com|A||COMPANY1|00283|123 RED WAY|Y|N||MINNY|MN|34217|1
000001678932|STRAIGHT|HENRY|CAT|SHCAT@hello.com|PARTY2|SUZY|SU@goodbye.com|R||COMPANY7|00993|456 GREEN WAY|N|N||SUNNY|FL|45691|9
 
File2 (differences were shown in red: ROBERT in field 3 of record 1, SUNNYSIDE in field 17 of record 2):
000001323567|BELLTOWER|ROBERT|BAT|BBBAT@hello.com|JOB2|CHARLES|CH@goodbye.com|A||COMPANY1|00283|123 RED WAY|Y|N||MINNY|MN|34217|1
000001678932|STRAIGHT|HENRY|CAT|SHCAT@hello.com|PARTY2|SUZY|SU@goodbye.com|R||COMPANY7|00993|456 GREEN WAY|N|N||SUNNYSIDE|FL|45691|9
 
Output (only record 1 since the difference in the second record occurred in the 17th field):
000001323567|BELLTOWER|ROBERT|BAT|BBBAT@hello.com|JOB2|CHARLES|CH@goodbye.com|A||COMPANY1|00283|123 RED WAY|Y|N||MINNY|MN|34217|1|C

# 2  
Old 10-19-2013
Try this:

Code:
$ awk -F"|" 'NR==FNR{a[$1]=$2 $3 $4 $5 $6 $7 $8 $9 $10 $11 $12 $13 $14 $15;next}a[$1]==$2 $3 $4 $5 $6 $7 $8 $9 $10 $11 $12 $13 $14 $15{;next}a[$1]{print $0 "|C"}' file1.txt file2.txt

# 3  
Old 10-20-2013
mjf, thank you for your help. It is working great for all my test files/scenarios.
# 4  
Old 10-20-2013
Or have a long artificial key
Code:
awk -F\| 'NR==FNR{a[$1 FS $2 FS $3 FS $4 FS $5 FS $6 FS $7 FS $8 FS $9 FS $10 FS $11 FS $12 FS $13 FS $14 FS $15];next} (($1 FS $2 FS $3 FS $4 FS $5 FS $6 FS $7 FS $8 FS $9 FS $10 FS $11 FS $12 FS $13 FS $14 FS $15) in a) '

Anyway there should be delimiters between the fields!
# 5  
Old 10-20-2013
Not sure if this is more elegant / faster / more efficient:
Code:
awk     '               {X=$0; NF=15; $1=""}
         NR==FNR        {Arr[$0];next}
         !($0 in Arr)   {print X, "C"}
        ' FS=\| OFS=\| file1 file2
000001323567|BELLTOWER|ROBERT|BAT|BBBAT@hello.com|JOB2|CHARLES|CH@goodbye.com|A||COMPANY1|00283|123 RED WAY|Y|N||MINNY|MN|34217|1|C

MadeInGermany's proposal is missing a ! negation in front of the ((... in a)) condition.
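For reference, here is a minimal sketch that pulls together the points raised so far: a key built from fields 1-15 with delimiters between them, and a negated lookup so that only changed records are printed. The helper function key(), the array name seen, and the scratch file names are my own choices for illustration, not something from the posts above. The sample data is taken from the first post.

```shell
# Sample data from the first post, written to a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
000001323567|BELLTOWER|BOBBY|BAT|BBBAT@hello.com|JOB2|CHARLES|CH@goodbye.com|A||COMPANY1|00283|123 RED WAY|Y|N||MINNY|MN|34217|1
000001678932|STRAIGHT|HENRY|CAT|SHCAT@hello.com|PARTY2|SUZY|SU@goodbye.com|R||COMPANY7|00993|456 GREEN WAY|N|N||SUNNY|FL|45691|9
EOF
cat > "$tmp/file2" <<'EOF'
000001323567|BELLTOWER|ROBERT|BAT|BBBAT@hello.com|JOB2|CHARLES|CH@goodbye.com|A||COMPANY1|00283|123 RED WAY|Y|N||MINNY|MN|34217|1
000001678932|STRAIGHT|HENRY|CAT|SHCAT@hello.com|PARTY2|SUZY|SU@goodbye.com|R||COMPANY7|00993|456 GREEN WAY|N|N||SUNNYSIDE|FL|45691|9
EOF

# key(n) is a hypothetical helper: it joins fields 1..n with FS
# so that adjacent field values cannot run together.
delta=$(awk -F'|' '
function key(n,    i, k) {
    k = $1
    for (i = 2; i <= n; i++) k = k FS $i
    return k
}
NR == FNR          { seen[key(15)]; next }   # first file: remember each 15-field key
!(key(15) in seen) { print $0 "|C" }         # second file: print records whose key is new
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$delta"
rm -rf "$tmp"
```

Note one difference in behaviour: like the Arr version above, this also prints records whose primary key appears only in the second file, whereas mjf's version skips those because of its final a[$1] test.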
# 6  
Old 10-20-2013
Quote:
Anyway there should be delimiters between the fields!
MadeInGermany,
For my education, can you please explain why there should be delimiters in my solution (Don Cragun also pointed this out in another of my awk posts)? Is there a scenario that would not work without delimiters but works with delimiters?
# 7  
Old 10-20-2013
Quote:
Originally Posted by mjf
MadeInGermany,
For my education, can you please explain why there should be delimiters in my solution (Don Cragun also pointed this out in another of my awk posts)? Is there a scenario that would not work without delimiters but works with delimiters?
Suppose that you have two records in a file with '|' as the field separator:
Code:
ab|c
a|bc

and you want to count the number of occurrences of each distinct combination of values in the first two fields.
Code:
awk -F '|' '
{ cnt[$1 $2]++ }
END { for(i in cnt) print i, cnt[i] }
' file

would produce:
Code:
abc 2

even though field 1 is different in both lines and field 2 is different in both lines. Using cnt[$1 FS $2]++ or cnt[$1,$2]++ instead of cnt[$1 $2]++ avoids the problem.
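As a quick check of the delimiter fix, here is a small sketch that feeds the same two records through the cnt[$1 FS $2]++ variant. The variable name counts and the LC_ALL=C sort step are my additions, the latter only to make the order of the for (k in cnt) output predictable.

```shell
# Same two records as above, but the array index keeps the "|" between fields.
counts=$(printf 'ab|c\na|bc\n' | awk -F'|' '
{ cnt[$1 FS $2]++ }                      # index is "ab|c" or "a|bc", never "abc"
END { for (k in cnt) print k, cnt[k] }
' | LC_ALL=C sort)

printf '%s\n' "$counts"
```

This should print two lines, one per record, each with a count of 1, instead of the single merged abc 2 line.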

Last edited by Don Cragun; 10-20-2013 at 10:23 AM.. Reason: Fixed example.