Reporting where values match and mismatch across three columns

03-26-2015


Hello,
I have values in three columns. Some values occur in all three columns, other values are present in only one or two columns. I would like to be able to see where the matches and mismatches occur. Thanks in advance for any advice!

I have:

Code:

A     B     C
1     1     2
2     3     3
3     4     5
4            6
5

I would like:

Code:

A     B     C
1     1      X
2     X      2
3     3      3
4     4      X
5     X      5
X     X      6

Last edited by Scrutinizer; 03-26-2015 at 04:19 PM..

MDeBiasse


03-26-2015


Hello MDeBiasse,

If I understood your logic correctly, the last line should be X 6 X. The following may help. Please let me know if it does.

Code:

awk -vs1=`cat Input_file | wc -l` 'BEGIN{print "A" OFS "B" OFS "C"} FNR==NR && NR>1{A[$1];B[$2];C[$3];next} END{o=1;for(k=1;k<=s1;k++){S=(k in A)?o:"X";U=(k in B)?o:"X";V=(k in C)?o:"X";print S OFS U OFS V;S=U=V="";o++}}' Input_file Input_file

Where input_file is as follows:

Code:

Output will be as follows.

Code:

A B C
1 1 X
2 X 2
3 3 3
4 4 X
5 X 5
X 6 X

EDIT: It seems a moderator has since added code tags to the original post, thank you for that. With the tags in place, the user's input file is as follows.
Input_file:

Code:

cat testtest13
A     B     C
1     1     2
2     3     3
3     4     5
4           6
5

In that case the following code may help.

Code:

awk -vs1=`cat testtest13 | wc -l` -F"     " 'BEGIN{print "A" OFS "B" OFS "C"} FNR==NR && NR>1{sub(/[[:space:]]/,X,$1);sub(/[[:space:]]/,X,$2);sub(/[[:space:]]/,X,$3);A[$1];B[$2];C[$3];next} END{o=1;for(k=1;k<=s1;k++){S=(k in A)?o:"X";U=(k in B)?o:"X";V=(k in C)?o:"X";print S OFS U OFS V;S=U=V="";o++}}' testtest13 testtest13

Output will be as follows.

Code:

A B C
1 1 X
2 X 2
3 3 3
4 4 X
5 X 5
X X 6

EDIT: Here is the same solution in multi-line form.

Code:

awk -vs1=`cat testtest13 | wc -l` -F"     " '
BEGIN {
    print "A" OFS "B" OFS "C"
}
FNR==NR && NR>1 {
    sub(/[[:space:]]/,X,$1);
    sub(/[[:space:]]/,X,$2);
    sub(/[[:space:]]/,X,$3);
    A[$1];
    B[$2];
    C[$3];
    next
}
END {
    o=1;
    for(k=1;k<=s1;k++){
        S=(k in A)?o:"X";
        U=(k in B)?o:"X";
        V=(k in C)?o:"X";
        print S OFS U OFS V;
        S=U=V="";
        o++
    }
}
' testtest13 testtest13

Thanks,
R. Singh

Last edited by RavinderSingh13; 03-26-2015 at 04:43 PM.. Reason: Added a new solution seems after adding code tags by moderator input became a little different so added solution accordingly

RavinderSingh13

03-26-2015


Try (untested):

Code:

awk  '            {A[$1]=$1; B[$2]=$2; C[$3]=$3; for (i=1; i<=3; i++) D[$i]}
         END     {for (d in D) print A[d]+0, B[d]+0, C[d]+0}
        ' file

Replace the 0s with Xs and sort the output, if need be.

RudiC

03-26-2015


Hello MDeBiasse,

My solution in post #2 does not look for the highest number across the columns; it loops up to the number of lines in the input file. Now let's say our input file is as follows.
Input_file:

Code:

cat testtest13
A     B     C
1     1     12
2     3     3
3     14     5
4           6
5

Now the following code may help.

Code:

awk -F"     " 'BEGIN{print "A" OFS "B" OFS "C"} FNR==NR && NR>1{k=0;for(i=1;i<=NF;i++){k=k<$i?$i:k};q=q<k?k:q;sub(/[[:space:]]/,X,$1);sub(/[[:space:]]/,X,$2);sub(/[[:space:]]/,X,$3);A[$1];B[$2];C[$3];next} END{o=1;for(k=1;k<=q;k++){S=(k in A)?o:"X";U=(k in B)?o:"X";V=(k in C)?o:"X";print S OFS U OFS V;S=U=V="";o++}}' testtest13 testtest13

Output will be as follows.

Code:

A B C
1 1 X
2 X X
3 3 3
4 X X
5 X 5
X X 6
X X X
X X X
X X X
X X X
X X X
X X 12
X X X
X 14 X

Hope this helps. I will be glad if it works as you expect.

Thanks,
R. Singh

RavinderSingh13

03-26-2015


If the delimiter is a Tab character, then:

Code:

$
$ cat -t data1.txt
A^IB^IC
1^I1^I2
2^I3^I3
3^I4^I5
4^I^I6
5
$
$ perl -lne '@tokens = split/\t/;
             if ($.==1) { @cols = @tokens; next }
             foreach $i (0..$#tokens) {
                 next if $tokens[$i] eq "";
                 if (not defined $occurs{$tokens[$i]}) {
                     $occurs{$tokens[$i]} = [ "X", "X", "X" ]
                 }
                 $occurs{$tokens[$i]}->[$i] = $tokens[$i]
             }
             END {
                 print join " ", @cols;
                 foreach $k (sort keys %occurs) {
                     print join " ", @{$occurs{$k}}
                 }
             }
            ' data1.txt
A B C
1 1 X
2 X 2
3 3 3
4 4 X
5 X 5
X X 6
$
$

Otherwise, if the delimiter is one-or-more-blank-spaces and the file is fixed-format (i.e. the "6" in the penultimate line is in the same column as "C" in the first line), then:

Code:

$
$ cat -t data2.txt
A     B       C
1     1       2
2     3       3
3     4       5
4             6
5
$
$
$ perl -lne 'if ($.==1) {
                 @cols = /^(\S+)\s+(\S+)\s+(\S+)/;
                 $template = sprintf("A%d A%d A*", $-[2]-$-[1], $-[3]-$-[2]);
                 next;
             }
             @tokens = unpack($template, $_);
             foreach $i (0..$#tokens) {
                 next if $tokens[$i] eq "";
                 if (not defined $occurs{$tokens[$i]}) {
                     $occurs{$tokens[$i]} = [ "X", "X", "X" ]
                 }
                 $occurs{$tokens[$i]}->[$i] = $tokens[$i]
             }
             END {
                 print join " ", @cols;
                 foreach $k (sort keys %occurs) {
                     print join " ", @{$occurs{$k}}
                 }
             }
            ' data2.txt
A B C
1 1 X
2 X 2
3 3 3
4 4 X
5 X 5
X X 6
$
$
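
For readers who prefer awk for the fixed-format case, the same idea can be sketched by slicing each line at the offsets where the header names start. This is a sketch under assumptions rather than a translation of the Perl above: the function name `report_fixed` is hypothetical, exactly three columns padded with spaces are assumed, and each value must begin at the same offset as its header.

```shell
# Hypothetical helper: reads a fixed-width, three-column table on stdin.
report_fixed() {
    awk '
    NR == 1 {
        # Find the 1-based offset at which each header name starts.
        rest = $0; pos = 0
        for (c = 1; c <= 3; c++) {
            match(rest, /[^ ]+/)
            start[c] = pos + RSTART
            hdr[c] = substr(rest, RSTART, RLENGTH)
            pos += RSTART + RLENGTH - 1
            rest = substr(rest, RSTART + RLENGTH)
        }
        print hdr[1], hdr[2], hdr[3]
        next
    }
    {
        # Slice the line at the header offsets; short lines yield "".
        cell[1] = substr($0, start[1], start[2] - start[1])
        cell[2] = substr($0, start[2], start[3] - start[2])
        cell[3] = substr($0, start[3])
        for (c = 1; c <= 3; c++) {
            v = cell[c]
            gsub(/^ +| +$/, "", v)          # trim the padding
            if (v != "") { seen[c, v]; vals[v] }
        }
    }
    END {
        # Prefix each row with its value so the rows can be sorted numerically.
        for (v in vals)
            print v, ((1, v) in seen ? v : "X"),
                     ((2, v) in seen ? v : "X"),
                     ((3, v) in seen ? v : "X")
    }' | {
        IFS= read -r header && printf '%s\n' "$header"
        sort -n | cut -d ' ' -f 2-          # sort on the key, then drop it
    }
}

# Example call, using the file name from the post above:
# report_fixed < data2.txt
```

The leading sort key is there because `for (v in vals)` visits the values in no guaranteed order, and sorting the finished rows directly would misplace any row that starts with X.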

durden_tyler

03-27-2015


Hello RudiC. I found your script to work well. However, when I change the 0s to Xs, the output still shows 0s. Do you have any suggestions on how to fix this? Thank you!

MDeBiasse


03-27-2015


Try

Code:

awk     'NR==1          {print; next}
                        {A[$1]=$1; B[$2]=$2; C[$3]=$3; for (i=1; i<=3; i++) if ($i) D[$i]}
         END            {for (d in D)   {TMP=sprintf ("%s\t%s\t%s", A[d]+0, B[d]+0, C[d]+0)
                                         gsub (/0/, "X", TMP)
                                         print TMP
                                        }
                        }
        ' FS="\t" file
A    B    C
1    1    X
2    X    2
3    3    3
4    4    X
5    X    5
X    X    6
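
A note on why swapping the 0s for Xs in the earlier script has no effect, assuming the edit was something like `A[d]+"X"`: awk converts the non-numeric string "X" to 0 in arithmetic, so a missing entry still yields 0. The `gsub` approach works for the sample data, but it replaces every 0 character, so a value such as 10 would print as 1X. Below is a minimal sketch that avoids both issues by testing membership with `in` instead of doing arithmetic. The function name `report_columns` is made up for illustration, and tab-delimited input with three columns is assumed.

```shell
# Hypothetical helper: reads tab-delimited columns on stdin and prints one
# row per distinct value, with X where a column lacks that value.
report_columns() {
    awk -F'\t' '
    NR == 1 { print; next }                 # pass the header line through
    {
        for (i = 1; i <= 3; i++)
            if ($i != "") {                 # skip empty cells, keep a literal 0
                seen[i, $i]                 # value $i occurs in column i
                vals[$i]                    # set of all distinct values
            }
    }
    END {
        # Prefix each row with its value so the rows can be sorted numerically.
        for (v in vals)
            printf "%s\t%s\t%s\t%s\n", v,
                ((1, v) in seen ? v : "X"),
                ((2, v) in seen ? v : "X"),
                ((3, v) in seen ? v : "X")
    }' | {
        IFS= read -r header && printf '%s\n' "$header"
        sort -n | cut -f 2-                 # sort on the key, then drop it
    }
}

# Example call with the sample data from the first post:
printf 'A\tB\tC\n1\t1\t2\n2\t3\t3\n3\t4\t5\n4\t\t6\n5\n' | report_columns
```

Because the missing cells are filled by a membership test and not by a text substitution, values containing the digit 0 come through unchanged.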

RudiC

