Reporting where values match and mismatch across three columns


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Reporting where values match and mismatch across three columns
# 1  
Old 03-26-2015
Reporting where values match and mismatch across three columns

Hello,
I have values in three columns. Some values occur in all three columns, other values are present in only one or two columns. I would like to be able to see where the matches and mismatches occur. Thanks in advance for any advice!

I have:

Code:
A     B     C
1     1     2
2     3     3
3     4     5
4            6
5

I would like:

Code:
A     B     C
1     1      X
2     X      2
3     3      3
4     4      X
5     X      5
X     X      6


Last edited by Scrutinizer; 03-26-2015 at 04:19 PM..
# 2  
Old 03-26-2015
Hello MDeBiasse,

If I caught your logic correctly then last line should be X 6 XFollowing may help you in same. Please let me know if this helps.
Code:
awk -vs1=`cat Input_file | wc -l` 'BEGIN{print "A" OFS "B" OFS "C"} FNR==NR && NR>1{A[$1];B[$2];C[$3];next} END{o=1;for(k=1;k<=s1;k++){S=(k in A)?o:"X";U=(k in B)?o:"X";V=(k in C)?o:"X";print S OFS U OFS V;S=U=V="";o++}}' Input_file Input_file

Where input_file is as follows:
Code:
A B C
1 1 2
2 3 3
3 4 5
4 6
5

Output will be as follows.
Code:
A B C
1 1 X
2 X 2
3 3 3
4 4 X
5 X 5
X 6 X

EDIT: Seems moderator has modified post now by adding code tags, thank you for same. Then Following is the input file per user.
Input_file:
Code:
cat testtest13
A     B     C
1     1     2
2     3     3
3     4     5
4           6
5

Then following code may help.
Code:
awk -vs1=`cat testtest13 | wc -l` -F"     " 'BEGIN{print "A" OFS "B" OFS "C"} FNR==NR && NR>1{sub(/[[:space:]]/,X,$1);sub(/[[:space:]]/,X,$2);sub(/[[:space:]]/,X,$3);A[$1];B[$2];C[$3];next} END{o=1;for(k=1;k<=s1;k++){S=(k in A)?o:"X";U=(k in B)?o:"X";V=(k in C)?o:"X";print S OFS U OFS V;S=U=V="";o++}}' testtest13 testtest13

Output will be as follows.
Code:
A B C
1 1 X
2 X 2
3 3 3
4 4 X
5 X 5
X X 6


EDIT: Adding a non one liner form for solution following.
Code:
awk -vs1=`cat testtest13 | wc -l` -F"     " 'BEGIN                      {print "A" OFS "B" OFS "C"}
                                             FNR==NR && NR>1            {sub(/[[:space:]]/,X,$1);
                                                                         sub(/[[:space:]]/,X,$2);
                                                                         sub(/[[:space:]]/,X,$3);
                                                                         A[$1];
                                                                         B[$2];
                                                                         C[$3];
                                                                         next
                                                                        }
                                             END                        {o=1;
                                                                                for(k=1;k<=s1;k++){
                                                                                                        S=(k in A)?o:"X";
                                                                                                        U=(k in B)?o:"X";
                                                                                                        V=(k in C)?o:"X";
                                                                                                        print S OFS U OFS V;
                                                                                                        S=U=V="";
                                                                                                        o++
                                                                                                  }
                                                                        }
                                             ' testtest13 testtest13


Thanks,
R. Singh

Last edited by RavinderSingh13; 03-26-2015 at 04:43 PM.. Reason: Added a new solution seems after adding code tags by moderator input became a little different so added solution accordingly
This User Gave Thanks to RavinderSingh13 For This Post:
# 3  
Old 03-26-2015
Try (untested):
Code:
awk  '            {A[$1]=$1; B[$2]=$2; C[$3]=$3; for (i=1; i<=3; i++) D[$i]}
         END     {for (d in D) print A[d]+0, B[d]+0, C[d]+0}
        ' file

Replace 0 s with X s and sort, if need be.
This User Gave Thanks to RudiC For This Post:
# 4  
Old 03-26-2015
Hello MDeBiasse,

My solution in post#2 will not look for the highest number in all columns it will look for the number of lines in input file, now let's say our input file is as follows.
Input_file:
Code:
cat testtest13
A     B     C
1     1     12
2     3     3
3     14     5
4           6
5

Now following code may help.
Code:
awk -F"     " 'BEGIN{print "A" OFS "B" OFS "C"} FNR==NR && NR>1{k=0;for(i=1;i<=NF;i++){k=k<$i?$i:k};q=q<k?k:q;sub(/[[:space:]]/,X,$1);sub(/[[:space:]]/,X,$2);sub(/[[:space:]]/,X,$3);A[$1];B[$2];C[$3];next} END{o=1;for(k=1;k<=q;k++){S=(k in A)?o:"X";U=(k in B)?o:"X";V=(k in C)?o:"X";print S OFS U OFS V;S=U=V="";o++}}' testtest13 testtest13

Output will be as follows.
Code:
A B C
1 1 X
2 X X
3 3 3
4 X X
5 X 5
X X 6
X X X
X X X
X X X
X X X
X X X
X X 12
X X X
X 14 X

Hope this helps, will be glad if this works as per your expectations.

Thanks,
R. Singh
This User Gave Thanks to RavinderSingh13 For This Post:
# 5  
Old 03-26-2015
If the delimiter is a Tab character, then:

Code:
$
$ cat -t data1.txt
A^IB^IC
1^I1^I2
2^I3^I3
3^I4^I5
4^I^I6
5
$
$ perl -lne '@tokens = split/\t/;
             if ($.==1) { @cols = @tokens; next }
             foreach $i (0..$#tokens) {
                 next if $tokens[$i] eq "";
                 if (not defined $occurs{$tokens[$i]}) {
                     $occurs{$tokens[$i]} = [ "X", "X", "X" ]
                 }
                 $occurs{$tokens[$i]}->[$i] = $tokens[$i]
             }
             END {
                 print join " ", @cols;
                 foreach $k (sort keys %occurs) {
                     print join " ", @{$occurs{$k}}
                 }
             }
            ' data1.txt
A B C
1 1 X
2 X 2
3 3 3
4 4 X
5 X 5
X X 6
$
$

Otherwise, if the delimiter is one-or-more-blank-spaces and the file is fixed-format (i.e. the "6" in the penultimate line is in the same column as "C" in the first line), then:

Code:
$
$ cat -t data2.txt
A     B       C
1     1       2
2     3       3
3     4       5
4             6
5
$
$
$ perl -lne 'if ($.==1) {
                 @cols = /^(\S+)\s+(\S+)\s+(\S+)/;
                 $template = sprintf("A%d A%d A*", $-[2]-$-[1], $-[3]-$-[2]);
                 next;
             }
             @tokens = unpack($template, $_);
             foreach $i (0..$#tokens) {
                 next if $tokens[$i] eq "";
                 if (not defined $occurs{$tokens[$i]}) {
                     $occurs{$tokens[$i]} = [ "X", "X", "X" ]
                 }
                 $occurs{$tokens[$i]}->[$i] = $tokens[$i]
             }
             END {
                 print join " ", @cols;
                 foreach $k (sort keys %occurs) {
                     print join " ", @{$occurs{$k}}
                 }
             }
            ' data2.txt
A B C
1 1 X
2 X 2
3 3 3
4 4 X
5 X 5
X X 6
$
$

This User Gave Thanks to durden_tyler For This Post:
# 6  
Old 03-27-2015
Hello RudiC. I found your script to work well. However, when I change the 0's to X's, the output remains with 0's. Do you have any suggestions on how to fix this? Thank you!
# 7  
Old 03-27-2015
Try
Code:
awk     'NR==1          {print; next}
                        {A[$1]=$1; B[$2]=$2; C[$3]=$3; for (i=1; i<=3; i++) if ($i) D[$i]}
         END            {for (d in D)   {TMP=sprintf ("%s\t%s\t%s", A[d]+0, B[d]+0, C[d]+0)
                                         gsub (/0/, "X", TMP)
                                         print TMP
                                        }
                        }
        ' FS="\t" file
A    B    C
1    1    X
2    X    2
3    3    3
4    4    X
5    X    5
X    X    6

Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Data match 2 files based on first 2 columns matching only and join if match

Hi, i have 2 files , the data i need to match is in masterfile and i need to pull out column 3 from master if column 1 and 2 match and output entire row to new file I have tried with join and awk and i keep getting blank outputs or same file is there an easier way than what i am... (4 Replies)
Discussion started by: axis88
4 Replies

2. Shell Programming and Scripting

Match value in two files and replace values in selected columns

The purpose is to check if values for column 3 and 4 in file1 match with column 1 in file2. If any value match do: 1) Replace values in file2 for column 2 and 3 using the information of file1 columns 5 and 6 2) Replace string ($1,1,5) and string ($1,6,5) in file2 with values of columns 7... (8 Replies)
Discussion started by: jiam912
8 Replies

3. Shell Programming and Scripting

awk to output match and mismatch with count using specific fields

In the below awk I am trying output to one file those lines that match between $2,$3,$4 of file1 and file2 with the count in (). I am also trying to output those lines that are missing between $2,$3,$4 of file1 and file2 with the count of in () each. Both input files are tab-delimited, but the... (7 Replies)
Discussion started by: cmccabe
7 Replies

4. AIX

Compare two files and show the mismatch columns

I need to compare two files and find the mismatch columns in it for csv and fixed width file. Eg: file1 c1,c2,c3,c4<----columnname 1,a,4,d 2,b,5,e 3,c,6,f file2 c1,c2,c3,c4<----columnname 3,x,7,f 2,y,8,e 1,z,9,d output c2,c3<---- mismatch columname a,4 x,7 b,5 or y,8 Ok with... (3 Replies)
Discussion started by: sabzR
3 Replies

5. Shell Programming and Scripting

Request: How to Parse dynamic SQL query to pad extra columns to match the fixed number of columns

Hello All, I have a requirement in which i will be given a sql query as input in a file with dynamic number of columns. For example some times i will get 5 columns, some times 8 columns etc up to 20 columns. So my requirement is to generate a output query which will have 20 columns all the... (7 Replies)
Discussion started by: vikas_trl
7 Replies

6. Shell Programming and Scripting

Evaluate 2 columns, add sum IF two columns match on two rows

Hi all, I know this sounds suspiciously like a homework course; but, it is not. My goal is to take a file, and match my "ID" column to the "Date" column, if those conditions are true, add the total number of minutes worked and place it in this file, while not printing the original rows that I... (6 Replies)
Discussion started by: mtucker6784
6 Replies

7. Shell Programming and Scripting

Adding columns with values dependent on existing columns

Hello I have a file as below chr1 start ref alt code1 code2 chr1 18884 C CAAAA 2 0 chr1 135419 TATACA T 2 0 chr1 332045 T TTG 0 2 chr1 453838 T TAC 2 0 chr1 567652 T TG 1 0 chr1 602541 ... (2 Replies)
Discussion started by: plumb_r
2 Replies

8. UNIX for Dummies Questions & Answers

Removing columns from a text file that do not have any values in second and third columns

I have a text file that has three columns. But at the end of the text file, there are trailing lines that have missing second and third columns: 4 0.04972604 KLHL28 4 0.0497332 CSTB 4 0.04979822 AIF1 4 0.04983331 DECR2 4 0.04990344 KATNB1 4 4 4 4 How can I remove the trailing... (3 Replies)
Discussion started by: evelibertine
3 Replies

9. Shell Programming and Scripting

Get values from different columns from file2 when match values of file1

Hi everyone, I have file1 and file2 comma separated both. file1 is: Header1,Header2,Header3,Header4,Header5,Header6,Header7,Header8,Header9,Header10 Code7,,,,,,,,, Code5,,,,,,,,, Code3,,,,,,,,, Code9,,,,,,,,, Code2,,,,,,,,,file2... (17 Replies)
Discussion started by: cgkmal
17 Replies

10. Shell Programming and Scripting

Compare selected columns of two files and print whole line with mismatch

hi! i researched about comparing two columns here and got an answer. but after examining my two files, i found out that the first columns of the two files are not unique with each other. all i want to compare is the 2nd and 3rd column. FILE 1: ABS 456 315 EBS 923 163 JYQ3 654 237 FILE 2:... (1 Reply)
Discussion started by: engr.jay
1 Replies
Login or Register to Ask a Question