To count distinct fields in a row


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting To count distinct fields in a row
# 1  
Old 09-16-2010
To count distinct fields in a row

I have . dat file which contains data in a specific format:
0 3 892 921 342
1 3 921 342 543
2 4 817 562 718 765
3 3 819 562 717 761

i need to compare each field in a row with another field of the same column but different row and cont the diffferences between the rows.
For ex: 892!=921 and 921!=342 and 342!=543 hence the (count of the differences between row 0 and row 1) = 3
Similarly i need to count the differences between the fields of row 1 and row 2
row 2 and row 3.. and soo on(not 0-1 , 2-3, 4-5...)

please can anyone body help me with an awk script?
I used NR.. but it is not pointing back to an already visited row Smilie
# 2  
Old 09-16-2010
Code:
 awk 'NR==1{split($0,a);next}{for (i in a){if (a[i]!=$i && i>1)b++};print a[1]"-"$1": "b;split($0,a);b=0}' file

This User Gave Thanks to bartus11 For This Post:
# 3  
Old 09-16-2010
Thanks a ton

Thanks for the solution.. Using ur script, i am getting a wrong difference result when comparing with different length columns...
Can u pls explain the above script as i am new to awk
# 4  
Old 09-16-2010
So what should be result of comparing those two lines?
Code:
1 3 921 342 543
2 4 817 562 718 765

# 5  
Old 09-16-2010
as 921!=817 and 342!=562..
difference is 4
# 6  
Old 09-16-2010
perl may help you some

Code:
my $tmp;
sub _comp(@@){
  my $cnt;
  my @a = @{$_[0]};
  my @b = @{$_[1]};
  for (my $i=2;$i<=$#a;$i++){
   $cnt++ if $a[$i] != $b[$i];
  }
  return $cnt;
}
while(<DATA>){
  if($.==1){
    $tmp=$_;
  }
  else{
    my @arr1=split /\s+/, $tmp;
    my @arr2=split /\s+/, $_;
    my $diff = _comp(\@arr1,\@arr2);
    my ($a,$b)=($.-2,$.-1);
    print "Diff between line [$a] and line [$b] is $diff\n";
    $tmp=$_;
  }
}
__DATA__
0 3 892 921 342
1 3 921 342 543
2 4 817 562 718 765
3 3 819 562 717 761
3 3 829

This User Gave Thanks to summer_cherry For This Post:
# 7  
Old 09-17-2010
above perl script's result has issue:

Code:
Diff between line [0] and line [1] is 3
Diff between line [1] and line [2] is 3
Diff between line [2] and line [3] is 3
Diff between line [3] and line [4] is 4

Here is mine:

Code:
cat infile

0 3 892 921 342
1 3 921 342 543
2 4 817 562 718 765
3 3 819 562 717 761
3 3 829


awk '
NR==1{split($0,a);c=NF;next}
{ s=(c>NF)?c-NF:"0";}
{ for (i=3;i<=NF;i++) if (a[i]!=$i) b++ }
{print "Differences between Row", NR-1, "and Row",NR,")=",b+s;split($0,a);b=0;c=NF}
' infile

Differences between Row 1 and Row 2 )= 3
Differences between Row 2 and Row 3 )= 4
Differences between Row 3 and Row 4 )= 3
Differences between Row 4 and Row 5 )= 4

This User Gave Thanks to rdcwayx For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Analyzing last 2 fields of 1 row and 3rd field of next row

I have the following script that will average the last two fields of each row, but im not sure how to include the 3rd field of the following row. An example of the analysis that I need to perform from the input - (66.61+58.01+54.16)/3 awk '{sum=cnt=0; for (i=13;i<=NF;i++) { sum+=$i; cnt++... (1 Reply)
Discussion started by: ncwxpanther
1 Replies

2. Shell Programming and Scripting

Help with Getting distinct record count from a .dat file using UNIX command

Hi, I have a .dat file with contents like the below: Input file ============SEQ NO-1: COLUMN1========== 9835619 7152815 ============SEQ NO-2: COLUMN2 ========== 7615348 7015548 9373086 ============SEQ NO-3: COLUMN3=========== 9373086 Expected Output: (I just... (1 Reply)
Discussion started by: MS06
1 Replies

3. Shell Programming and Scripting

How to find DISTINCT rows and combine in one row?

Hi , i need to display only one of duplicated values and merged them in one record only when tag started with 3100.2.128.8 3100.2.97.1=192.168.0.12 3100.2.128.8=418/66/03e9/0044801 3100.2.128.8=418/66/03ea/0044601 3100.2.128.8=418/66/03e9/0044801 3100.2.128.8=418/66/03ea/0044601... (5 Replies)
Discussion started by: OTNA
5 Replies

4. UNIX for Dummies Questions & Answers

count number of distinct values in each column with awk

Hi ! input: A|B|C|D A|F|C|E A|B|I|C A|T|I|B As the title of the thread says, I would need to get: 1|3|2|4 I tried different variants of this command, but I don't manage to obtain what I need: gawk 'BEGIN{FS=OFS="|"}{for(i=1; i<=NF; i++) a++} END {for (b in a) print b}' input ... (2 Replies)
Discussion started by: beca123456
2 Replies

5. Shell Programming and Scripting

distinct values of all the fields

I am a beginner to scripting, please help me in this regard. How do I create a script that provides a count of distinct values of all the fields in the pipe delimited file ? I have 20 different files with multiple columns in each file. I needed to write a generic script where I give the number... (2 Replies)
Discussion started by: vukkusila
2 Replies

6. UNIX for Dummies Questions & Answers

distinct values of all the fields

I am a beginner to scripting, please help me in this regard. How do I create a script that provides a count of distinct values of all the fields in the pipe delimited file ? I have 20 different files with multiple columns in each file. I needed to write a generic script where I give the number... (1 Reply)
Discussion started by: vukkusila
1 Replies

7. UNIX for Dummies Questions & Answers

Select Distinct on multiple fields

How do I create a script that provides a count of distinct values of a particular field in a file utilizing commonly available UNIX commands (sh or awk)? Field1|Field2|Field3|Field4 AAA|BBB|CCC|DDD 111|222|333|777 AAA|EEE|ZZZ|EEE 111|555|333|444 AAA|EEE|CCC|DDD 111|222|555|444 For... (2 Replies)
Discussion started by: Refresher
2 Replies

8. Shell Programming and Scripting

Getting Sum, Count and Distinct Count of a file

Hi all this is a UNIX question. I have a large flat file with millions of records. col1|col2|col3 1|a|b 2|c|d 3|e|f 3|g|h footer**** I am supposed to calculate the sum of col1 1+2+3+3=9, count of col1 1,2,3,3=4, and distinct count of col1 1,2,3=c3 I would like it if you avoid... (4 Replies)
Discussion started by: singhabhijit
4 Replies

9. UNIX for Advanced & Expert Users

Count the distinct list of ids

Hello guys, I have a file in the following format(each line seperated by TAB): ========= Filename id Filename id1 Filename id Filename1 id7 Filename1 id7 Filename2 id1 Filename2 id1 Filename2 id3 Filename3 id2 Filename3 id4 Filename3 id4 Filename3 id6 ========= I would like to... (2 Replies)
Discussion started by: jingi1234
2 Replies

10. UNIX for Dummies Questions & Answers

select distinct row from a file

Hi, buddies out there. I have a text file ( only one column ) which I created using vi editor. The file contains duplicate rows and I would like to select distinct rows, how to go on it using unix command: file content = apple apple orange watermelon apple orange Can it be done... (7 Replies)
Discussion started by: merry susana
7 Replies
Login or Register to Ask a Question