To count distinct fields in a row

09-16-2010

I have a .dat file that contains data in a specific format:
0 3 892 921 342
1 3 921 342 543
2 4 817 562 718 765
3 3 819 562 717 761

I need to compare each field in a row with the field in the same column of the next row and count the differences between the two rows.
For example: 892!=921, 921!=342 and 342!=543, hence the count of differences between row 0 and row 1 is 3.
Similarly, I need to count the differences between row 1 and row 2, row 2 and row 3, and so on (every adjacent pair, not just 0-1, 2-3, 4-5...).

Can anybody please help me with an awk script?
I tried using NR, but it cannot point back to a row that has already been read.

Abhik

09-16-2010

Code:

 awk 'NR==1{split($0,a);next}{for (i in a){if (a[i]!=$i && i>1)b++};print a[1]"-"$1": "b;split($0,a);b=0}' file
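
For readers new to awk, the one-liner can be spread over several lines. The following is a sketch with the same comparison logic, not the poster's own code: the function name `adjacent_diffs` and the variable names are made up for illustration, and the `for (i in a)` loop is written as a numeric loop.

```shell
#!/bin/sh
# Expanded form of the one-liner; same comparison logic, one step per line.
# The function name adjacent_diffs is hypothetical, for illustration only.
adjacent_diffs() {
  awk '
    NR == 1 {                  # first row: nothing to compare with yet
      n = split($0, prev)      # copy its fields into prev[1..n]
      next
    }
    {
      diffs = 0
      # Start at field 2, as the one-liner does (it skips only field 1,
      # the row number), and stop at the length of the previous row.
      for (i = 2; i <= n; i++)
        if (prev[i] != $i)
          diffs++
      print prev[1] "-" $1 ": " diffs
      n = split($0, prev)      # current row becomes the previous row
    }
  ' "$@"
}
```

Run as `adjacent_diffs file` on the four sample rows, this prints `0-1: 3`, `1-2: 4` and `2-3: 4`. Because field 2 is compared as well and the loop stops at the length of the previous row, the counts can be off when adjacent rows have different lengths.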

bartus11

09-16-2010

Thanks a ton

Thanks for the solution. Using your script, I get a wrong difference count when comparing rows with different numbers of columns.
Could you please explain the script above, as I am new to awk?

Abhik

09-16-2010

So what should the result of comparing these two lines be?

Code:

1 3 921 342 543
2 4 817 562 718 765

bartus11

09-16-2010

Since 921!=817, 342!=562, 543!=718, and 765 has no counterpart in the shorter row,
the difference is 4.
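
The rule implied by this answer, that a field with no counterpart in the other row counts as a difference, can be sketched in awk as follows. This is one possible reading, not a definitive solution: the function name `count_row_diffs` is hypothetical, and treating fields 1 and 2 as non-data columns (row number and value count) is an assumption drawn from the sample data.

```shell
#!/bin/sh
# Sketch only: count_row_diffs is a hypothetical name, and treating
# fields 1 and 2 as non-data columns is an assumption from the sample.
count_row_diffs() {
  awk '
    NR > 1 {
      diffs = 0
      last = (NF > n) ? NF : n          # longer of the two rows
      for (i = 3; i <= last; i++) {
        if (i > NF || i > n)            # field missing in one row
          diffs++
        else if (prev[i] != $i)         # both present but different
          diffs++
      }
      print prev[1] "-" $1 ": " diffs
    }
    { n = split($0, prev) }             # remember this row for the next one
  ' "$@"
}
```

Called as `count_row_diffs file` on the four sample rows, it prints `0-1: 3`, `1-2: 4` and `2-3: 3`. Checking explicitly for a missing field, rather than comparing against an empty value, avoids treating a data value of 0 as equal to an absent field.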

Abhik

09-16-2010

Perl may help here:

Code:

my $tmp;
sub _comp(@@){
  my $cnt;
  my @a = @{$_[0]};
  my @b = @{$_[1]};
  for (my $i=2;$i<=$#a;$i++){
   $cnt++ if $a[$i] != $b[$i];
  }
  return $cnt;
}
while(<DATA>){
  if($.==1){
    $tmp=$_;
  }
  else{
    my @arr1=split /\s+/, $tmp;
    my @arr2=split /\s+/, $_;
    my $diff = _comp(\@arr1,\@arr2);
    my ($a,$b)=($.-2,$.-1);
    print "Diff between line [$a] and line [$b] is $diff\n";
    $tmp=$_;
  }
}
__DATA__
0 3 892 921 342
1 3 921 342 543
2 4 817 562 718 765
3 3 819 562 717 761
3 3 829

summer_cherry

09-17-2010

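The Perl script above gives a wrong result when the rows have different lengths; it reports 3 for lines [1] and [2], where 4 is expected: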

Code:

Diff between line [0] and line [1] is 3
Diff between line [1] and line [2] is 3
Diff between line [2] and line [3] is 3
Diff between line [3] and line [4] is 4

Here is mine:

Code:

cat infile

0 3 892 921 342
1 3 921 342 543
2 4 817 562 718 765
3 3 819 562 717 761
3 3 829


awk '
NR==1{split($0,a);c=NF;next}   # keep the first row and its field count
{ s=(c>NF)?c-NF:0 }            # fields that exist only in the previous row
# compare data fields; extra fields in the current row differ from the empty a[i]
{ for (i=3;i<=NF;i++) if (a[i]!=$i) b++ }
{print "Differences between Row", NR-1, "and Row", NR, "=", b+s; split($0,a); b=0; c=NF}
' infile

Differences between Row 1 and Row 2 = 3
Differences between Row 2 and Row 3 = 4
Differences between Row 3 and Row 4 = 3
Differences between Row 4 and Row 5 = 4

rdcwayx
