Compare columns of 2 files based on condition defined in a different file


 
# 1  
Old 11-07-2010

I have a control file that lists the fields I need to compare between two files and tells me what to write to an output file for each field. If key = Y and output = Y, I need to print the exact value of the field. If output = Y/N, I need to print only Y if the field matches in both files, or N if it does not. If output = N, the field should be skipped: it is neither compared nor written to the output file.
For ex:
my control file
Code:
key|compare_field|output
Y|Field_1|Y
N|Field_2|Y/N
Y|Field_3|Y
N|Field_4|Y/N
N|Field_5|N
N|Field_6|Y/N
file1
field_1|feld_2|field_3|field_4|field_5|field_6
000|adbc|edfr|hjkl|890|jlk|ioy
678|jfjd|djla|uopp|678|jyh|jkl
file2
field_1|feld_2|field_3|field_4|field_5|field_6
000|adbc|edfr|hjkl|890|jlk|ioy
678|juio|djla|uopu|678|jyh|jkl
my output should be
field_1|feld_2|field_3|field_4|field_6
000|Y|edfr|Y|Y
678|N|djla|N|Y

I was trying to do it in 2 parts and then combine, but I am lost, need your help to combine this logic.
Code:
# to copy the field names as the header in the report file.
nawk -F'|' '$NF == "Y" || $NF == "Y/N" { printf "%s%s", sep, $2 >> "report_file"; sep = FS } END { print "" >> "report_file" }' control_file

To compare the 2 files and print the output as Y or N
Code:
nawk -F'|' '{ getline x <f; split(x,F,"|")}
NR >1 {for(i=2;i<=NF;i++) $i=(F[i]==$i)?"Y":"N"}1' OFS="|" f=file2 file1

I can do them separately, but I am not able to read the control file and compare the files based on it.

Please help me.
Thanks in Advance
newtoawk

Last edited by Scott; 11-07-2010 at 05:57 AM.. Reason: Please use code tags
# 2  
Old 11-07-2010
Something like this,

Code:
awk -F'|' 'NR==FNR && NR>1 {a[++i]=$1$3;next} FNR>1 {if (b[FNR]) { c[FNR]=$0} else {b[FNR]=$0}} END {for(k in c) {split(c[k],d,"|");split(b[k],e,"|") ;for (j=1;j<=6;j++) {if (a[j]=="YY") {printf "%s|", d[j]} else if(a[j] != "NN") {printf "%s|" ,(d[j]==e[j])?"Y":"N"}}printf "\n"}}' control_file file1 file2

# 3  
Old 11-07-2010
Thanks a lot Pravin, and happy Deepavali to you. I ran the script and this is the output I got.
Code:
Y|adbc|Y|Y|Y|
Y|juio|Y|Y|Y|

Can you please explain the code to me, so that I can make changes accordingly?
Thanks once again.

Last edited by Scott; 11-07-2010 at 05:57 AM..
# 4  
Old 11-07-2010
I hope this will help you.
Code:
awk -F'|' '
# First file (control_file), from line 2 on: store key and output ($1$3) for each field in array a
NR==FNR && NR>1 {a[++i]=$1$3;next}
# file1 and file2, from line 2 on: b[FNR] holds the file1 record, c[FNR] holds the file2 record
FNR>1 { if (b[FNR]) { c[FNR]=$0 } else { b[FNR]=$0 } }
END {
    for (k in c) {
        # split the file2 record into array d and the file1 record into array e
        split(c[k],d,"|"); split(b[k],e,"|")
        for (j=1;j<=6;j++) {
            if (a[j]=="YY") {
                # key and output are both Y: print the field as it is
                printf "%s|", d[j]
            }
            else if (a[j] != "NN") {
                # key and output are not both N: print Y if the field is the same in file1 and file2, else N
                printf "%s|", (d[j]==e[j])?"Y":"N"
            }
        }
        printf "\n"
    }
}' control_file file1 file2

# 5  
Old 11-08-2010
Thanks a lot Pravin, it works fine. My compare fields will change, so I cannot hardcode the value in for (j=1;j<=6;j++).

I tried a couple of things such as for (j=1;j<=NF;j++), but the result set had more fields.
I even tried this:

Code:
nawk -F'|' 'num==NF;NR==FNR && NR>1 {a[++i]=$1$3;next}   
             FNR>1 { if (b[FNR]) { c[FNR]=$0} else { b[FNR]=$0} } 
             END {printf "\n"
                 for(k in c) {
                              split(c[k],d,"|");split(b[k],e,"|") ; 
                              for (j=1;j<=$num;j++) {
                                                  if (a[j]=="YY") { 
                                             
                                             printf "%s|", d[j]
                                                                   } 
                                                  else if(a[j] != "NN") {  
                                                                         printf "%s|" ,(d[j]==e[j])?"Y":"N" 
                                                                         }
                                                   }printf "\n" 
                              }
                  }' ctl_file file_1 file_2

Can you please help me?

Thanks in advance
NewtoAwk

Moderator's Comments:
Mod Comment Please use [CODE][/CODE] tags

Last edited by pludi; 11-08-2010 at 08:57 AM..
# 6  
Old 11-08-2010
Hi,

You can use the for loop below, because the control records are stored in array 'a' with index 'i', so i ends up holding the number of fields.
Code:
for (j=1;j<=i;j++)
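
Put together, the dynamic loop bound could look like the sketch below. This is not the exact script from the thread but a restructured variant, under a few assumptions: the control records are counted in a variable n (playing the role of i), each row is printed as soon as the matching file2 line is read so the input order is kept (awk does not define the order of for (k in c)), the header row is emitted as well, and the trailing | is dropped. The sample files are hypothetical six-column versions of the data in post #1, and plain awk stands in for nawk.

```shell
#!/bin/sh
# Hypothetical sample data: six-column versions of the files from post #1.
dir=$(mktemp -d)
cat > "$dir/control_file" <<'EOF'
key|compare_field|output
Y|field_1|Y
N|field_2|Y/N
Y|field_3|Y
N|field_4|Y/N
N|field_5|N
N|field_6|Y/N
EOF
cat > "$dir/file1" <<'EOF'
field_1|field_2|field_3|field_4|field_5|field_6
000|adbc|edfr|hjkl|890|jlk
678|jfjd|djla|uopp|678|jyh
EOF
cat > "$dir/file2" <<'EOF'
field_1|field_2|field_3|field_4|field_5|field_6
000|adbc|edfr|hjkl|890|jlk
678|juio|djla|uopu|678|jyh
EOF

result=$(awk -F'|' '
    # count the input files as they start
    FNR == 1 { fileno++ }
    # control_file: store key+output per field; n ends up as the field count
    fileno == 1 { if (FNR > 1) rule[++n] = $1 $3; next }
    # file1: remember every record by its line number
    fileno == 2 { one[FNR] = $0; next }
    # file2: compare with the same line of file1 and print right away
    {
        split(one[FNR], f1, "|")
        line = ""; sep = ""
        for (j = 1; j <= n; j++) {          # n replaces the hard-coded 6
            if (rule[j] == "NN") continue   # field is skipped entirely
            if (FNR == 1 || rule[j] == "YY")
                val = $j                    # header name or key value as is
            else
                val = ($j == f1[j]) ? "Y" : "N"
            line = line sep val
            sep = "|"
        }
        print line
    }
' "$dir/control_file" "$dir/file1" "$dir/file2")

printf '%s\n' "$result"
rm -r "$dir"
```

With this sample data the sketch should print the header field_1|field_2|field_3|field_4|field_6 followed by 000|Y|edfr|Y|Y and 678|N|djla|N|Y. It assumes file1 and file2 have the same number of lines in the same order.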

# 7  
Old 11-13-2010
Thanks Pravin, it worked. Can I pass the field delimiter as a variable? I need to read the output format from a file.
For example, instead of nawk -F'|', can I do something like this:
output_format=| or output_format=\t
nawk -F'$output_format' ... Does this work, or is there any other way to do it?
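
On the delimiter question: single quotes stop the shell from expanding the variable, so nawk -F'$output_format' hands the literal text $output_format to awk. The assignments also need quoting (output_format='|' and output_format='\t'), because an unquoted | starts a pipeline and an unquoted \t is reduced to t. Below is a minimal sketch of one approach that should work, assuming the delimiter sits on the first line of a file. The names format_file and data are hypothetical, and plain awk is used in place of nawk.

```shell
#!/bin/sh
dir=$(mktemp -d)
# Hypothetical file that stores the delimiter on its first line.
printf '%s\n' '|' > "$dir/format_file"
printf '%s\n' 'a|b|c' > "$dir/data"

# Read the delimiter from the file instead of hard-coding it.
IFS= read -r output_format < "$dir/format_file"

# Double quotes let the shell expand the variable before awk runs.
second=$(awk -F"$output_format" '{ print $2 }' "$dir/data")
printf '%s\n' "$second"

# A tab can be stored as the two characters \t; awk interprets the escape.
# -v OFS=... sets the output separator from the same variable.
output_format='\t'
third=$(printf 'x\ty\tz\n' |
        awk -F"$output_format" -v OFS="$output_format" '{ print $3, $1 }')
printf '%s\n' "$third"

rm -r "$dir"
```

Note that awk treats a separator longer than one character as a regular expression, so a multi-character delimiter containing characters such as | or . would need escaping.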