awk- comparing fields from the same column, finding discontinuities.


 
# 8  
Old 04-15-2011
Quote:
Originally Posted by mirni
How about this...
Code:
awk  '
   ch!=$1{ch=$1;seq=$2}  #initialize with new channel
   ch==$1{   #channel same as stored
             if((seq++%16)!=$2) #increment and cycle the counter; compare
                 cntr[$1]++
   }END{
     for(i in cntr) {
       print "Channel " i " has " cntr[i] " discontinuities" 
   }
 }' input

Note that the output comes out in no particular order, because awk's for (i in cntr) loop does not guarantee one.
To sort by channel, pipe the awk output to sort:
Code:
awk '{...}' input | sort -n -k2

To sort by number of discontinuities, sort by fourth field:
Code:
| sort -n -k4
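
As a small sketch of both sorts, here they are applied to two report lines in the format the script prints (the sample values are the ones quoted later in this thread). Restricting each key to a single field with -k2,2n and -k4,4nr is a variation on the commands above, not what was originally posted; a bare -k2 uses everything from field 2 to the end of the line as the key.

```shell
# Sample report lines in the format printed by the awk script.
report='Channel 162 has 4 discontinuities
Channel 160 has 13 discontinuities'

# Sort numerically on the channel number (field 2 only).
by_channel=$(printf '%s\n' "$report" | sort -k2,2n)

# Sort numerically on the discontinuity count (field 4 only), largest first.
by_count=$(printf '%s\n' "$report" | sort -k4,4nr)

printf '%s\n\n%s\n' "$by_channel" "$by_count"
```

Ending the key at the same field it starts on keeps the trailing words of the line out of the comparison.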



I tried the new code with this input:

160 1
160 2
160 3
160 4
160 6 <-- **
160 7
160 8
160 9
160 10
160 10
160 11
160 12
160 13
160 14
160 15
160 0
160 15 <-- **
160 0
162 1
162 2
162 4 <-- **
162 6 <-- **
162 7
162 8

and I got this output:

Channel 160 has 13 discontinuities
Channel 162 has 4 discontinuities


There should be 2 discontinuities in channel 160 and 2 in channel 162. Is an if statement missing, one that checks whether the channel is the same as the stored one?

I re-checked the other code (the one provided by |UVI|) with this smaller input file and it doesn't work properly either, so I still have the same problem.
# 9  
Old 04-15-2011
Quote:
two consecutive fields having the same value shouldn't be seen as a discontinuity.
<--- That was missing. Test this out:
Code:
awk  '
   ch!=$1{ch=$1;seq=$2}  #initialize with new channel
   ch==$1{   #channel same as stored
             if(seq==$2) next;  #if same, skip to next line
             else if((++seq%16)!=$2) { #increment and cycle the counter; compare
                 cntr[$1]++
                 seq=$2     #reset seq
                 #print "Disc. " $0   #debug; uncomment to check what was grabbed
             }
   }END{
     for(i in cntr) {
       print "Channel " i " has " cntr[i] " discontinuities" 
   }
 }' input

# 10  
Old 04-19-2011
Quote:
Originally Posted by mirni
<--- That was missing. Test this out:



Hi mirni,

Thanks for the reply. It works well overall, but it seems to flag every second repeated number as a discontinuity as well. Say this is the input (with the sequence running 0-3 instead of 0-15):

400 0
400 1 xx
400 1 xx
400 2
400 3
400 0 //
400 0 //
400 1
400 2
400 3 xx
400 3 xx
400 0
400 1 //
400 1 //

The places marked with // are reported as discontinuities. I think the problem is in how "seq" is stored right after a repeated number is detected, but I haven't been able to fix it.

I've attached the input file I'm using to test it, and the result of the discontinuities is:

Channel: 400
Discontinuities: 2

Discontinuities
===========
Line number: 25
400 6

Line number: 52
400 15

Thanks a lot for your time and help.
# 11  
Old 04-19-2011
Code:
cat input.txt | awk 'BEGIN{ 
  while (getline  > 0 && NF > 0){
      data_array[$1]+=0
      if ( NR > 1 ){
        C1 = $1 
        C2 = $2

        if ( P1 == C1 && P2 != C2){
            if ( (P2+1)%16 != C2 ){
              data_array[$1]++
            }
        }
      }
      P1 = $1
      P2 = $2
  }
}

END{
  for (var in data_array)
    if ( data_array[var] == 1)
      print "Channel " var " : "  data_array[var] " discontinuity"
    else
      print "Channel " var " : "  data_array[var] " discontinuities"
}'

There was an error in an assignment.
It now works correctly.
# 12  
Old 04-19-2011
I don't see what it is doing wrong. Data:
Code:
$ cat d2
400 0
400 1 xx
400 1 xxxx
400 2
400 3
400 0 //
400 0 ////
400 1
400 2
400 3 xx
400 3 xxxx
400 0
400 1 //
400 1 ////

Script:
Code:
$ cat test.sh
#!/bin/sh

awk  '
   ch!=$1{ch=$1;seq=$2}  #initialize with new channel
   ch==$1{   #channel same as stored
             if(seq==$2) next;  #if same, skip to next line
             else if((++seq%16)!=$2) { #increment and cycle the counter; compare
                 cntr[$1]++
                 seq=$2     #reset seq
                 print "Disc. " $0 " seq: " seq   #debug output; comment out once verified
             }
   }END{
     for(i in cntr) {
       print "Channel " i " has " cntr[i] " discontinuities" 
   }
 }' < $1

Run:
Code:
$ ./test.sh d2 
Disc. 400 0 // seq: 0
Disc. 400 0 seq: 0
Channel 400 has 2 discontinuities

Isn't that the desired output?
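
For the d2 data checked against %16, those two lines are genuine discontinuities, since 3 is followed by 0 rather than 4. The // lines reported in post #10 appear when the modulus matches the data (or once real data wraps past 15). The cause is that ++seq keeps growing (16, 17, ...) and only the comparison is reduced with %, so after the first wrap seq==$2 no longer recognizes a repeated value. Below is a minimal sketch of one way around that. It assumes the lines are grouped by channel as in the samples above and that the modulus is passed in as a variable; count_disc, mod and started are names made up for this illustration and do not come from the posts above.

```shell
# count_disc MOD [FILE...]  (hypothetical helper; the name and "mod" are assumptions)
count_disc() {
    mod=$1; shift
    awk -v mod="$mod" '
        NF < 2 { next }                          # ignore blank or short lines
        !started || $1 != ch {                   # first line of a channel: remember it
            started = 1; ch = $1; seq = $2; next
        }
        $2 == seq { next }                       # repeated value: not a discontinuity
        {
            if ($2 != (seq + 1) % mod)           # expected previous + 1, wrapped at mod
                cntr[$1]++
            seq = $2                             # store the value seen, always in 0..mod-1
        }
        END {
            for (c in cntr)
                print "Channel " c " has " cntr[c] " discontinuities"
        }
    ' "$@"
}
```

Because seq always holds the last value read, the repeated-value test keeps working after a wrap. Run as count_disc 16 input on the sample from post #8, it reports 2 discontinuities for channel 160 and 2 for channel 162. Run as count_disc 4 on the 0-3 sample from post #10, it prints nothing, because channels without discontinuities never get an entry in cntr.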
# 13  
Old 04-21-2011
Try the Perl below:

Code:
use strict;
use warnings;

my %hash;
while(<DATA>){
	my @tmp = split;
	push @{$hash{$tmp[0]}}, $tmp[1];   # group the counter values by channel
}
foreach my $key (sort { $a <=> $b } keys %hash){
	my $cnt = 0;
	my @arr = @{$hash{$key}};
	for(my $i=1;$i<=$#arr;$i++){
		next if $arr[$i] == $arr[$i-1];               # a repeated value is not a discontinuity
		$cnt++ if $arr[$i] != ($arr[$i-1] + 1) % 16;  # expect previous + 1, wrapping 15 -> 0
	}
	print $key," has ", $cnt, " discontinuities\n" if $cnt;
}
__DATA__
160 13
160 14
160 15
160 0
160 1
160 4
160 2
409 2
409 3
409 5

# 14  
Old 04-21-2011
It's working now, thank you so much for your help and time mirni, |UVI| and summer_cherry.