awk: comparing fields from the same column, finding discontinuities

04-15-2011

Quote:

Originally Posted by mirni

How about this...

Code:

awk  '
   ch!=$1{ch=$1;seq=$2}  #initialize with new channel
   ch==$1{   #channel same as stored
             if((seq++%16)!=$2) #increment and cycle the counter; compare
                 cntr[$1]++
   }END{
     for(i in cntr) {
       print "Channel " i " has " cntr[i] " discontinuities" 
   }
 }' input

Note that the output order is unspecified, because awk's for (i in array) loop does not guarantee any particular order.
To sort by channel, pipe the awk output to sort:

Code:

awk '{...}' input | sort -n -k2

To sort by number of discontinuities, sort by fourth field:

Code:

| sort -n -k4
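
As a small illustration of the two sort commands above, the sketch below runs the two report lines that appear later in this thread through sort. The function names are made up for the example, and -k2,2n and -k4,4n are only a stricter spelling of -n -k2 and -n -k4 that limits the key to a single field.

```shell
# Two report lines in the format printed by the awk script, deliberately unsorted.
report() {
    printf '%s\n' \
        'Channel 162 has 4 discontinuities' \
        'Channel 160 has 13 discontinuities'
}

# Hypothetical helper names, used only for this example.
by_channel() { report | sort -k2,2n; }   # field 2 is the channel number
by_count()   { report | sort -k4,4n; }   # field 4 is the discontinuity count

by_channel
by_count
```

Sorting by channel puts 160 first, while sorting by count puts channel 162 (4 discontinuities) first.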

I tried the new code with this input:

160 1
160 2
160 3
160 4
160 6 <-- **
160 7
160 8
160 9
160 10
160 10
160 11
160 12
160 13
160 14
160 15
160 0
160 15 <-- **
160 0
162 1
162 2
162 4 <-- **
162 6 <-- **
162 7
162 8

and I got this output:

Channel 160 has 13 discontinuities
Channel 162 has 4 discontinuities

Normally there should be 2 discontinuities in channel 160 and 2 in channel 162. Is an if statement missing, one that checks whether the channel is the same as the stored one?

I re-checked the other code (the one provided by |UVI|) with this smaller input file, and it doesn't work properly either, so I still have the same problem.

acsg

04-15-2011

Quote:

two consecutive fields having the same value shouldn't be seen as a discontinuity.

That check was missing. Try this:

Code:

awk  '
   ch!=$1{ch=$1;seq=$2}  #initialize with new channel
   ch==$1{   #channel same as stored
             if(seq==$2) next;  #if same, skip to next line
             else if((++seq%16)!=$2) { #increment and cycle the counter; compare
                 cntr[$1]++
                 seq=$2     #reset seq
                 #print "Disc. " $0   #debug; uncomment to check what was grabbed
             }
   }END{
     for(i in cntr) {
       print "Channel " i " has " cntr[i] " discontinuities" 
   }
 }' input

mirni

04-19-2011

Hi mirni,

Thanks for the reply. It works well overall, but it also seems to flag every second repeated number as a discontinuity. For example, with this input (assuming the sequence runs 0-3 instead of 0-15):

400 0
400 1 xx
400 1 xx
400 2
400 3
400 0 //
400 0 //
400 1
400 2
400 3 xx
400 3 xx
400 0
400 1 //
400 1 //

The lines marked with // are reported as discontinuities. I think the problem is in how "seq" is stored right after a repeated number is detected, but I haven't been able to fix it.

I've attached the input file I'm using for testing; the discontinuity report for it is:

Channel: 400
Discontinuities: 2

Discontinuities
===========
Line number: 25
400 6

Line number: 52
400 15

Thanks a lot for your time and help.

Discont_test.txt (478 Bytes)
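
A report in the layout quoted in the post above (channel, count, then the line number and record of each discontinuity) could be produced along the following lines. This is only a sketch under stated assumptions: the function name disc_report and the variable mod (the cycle length, defaulting to 16) are made up here, input is read from standard input, and each channel's lines are assumed to be contiguous.

```shell
# disc_report [MOD]: hypothetical helper; reads "channel counter" lines on stdin.
disc_report() {
    awk -v mod="${1:-16}" '
        NR == 1 || $1 != ch { ch = $1; seq = $2 + 0; next }  # new channel: resync only
        $2 + 0 == seq { next }                               # repeated value: ignore
        {
            if ((seq + 1) % mod != $2 + 0) {                 # not the expected successor
                n[$1]++
                where[$1, n[$1]] = "Line number: " NR "\n" $1 " " $2
            }
            seq = $2 + 0                                     # always keep the value just read
        }
        END {
            for (c in n) {                                   # channels with at least one hit
                print "Channel: " c
                print "Discontinuities: " n[c]
                print ""
                print "Discontinuities"
                print "==========="
                for (i = 1; i <= n[c]; i++)
                    print where[c, i] "\n"
            }
        }
    '
}
```

It would be run as disc_report < Discont_test.txt, or as disc_report 4 < file for a counter that cycles 0-3.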

acsg

04-19-2011

Code:

awk '
NF == 0 { exit }                  # stop at the first blank line (END still runs)
{
    data_array[$1] += 0           # register the channel so zero counts are reported too

    # Same channel as the previous line, value changed,
    # and not the expected successor (mod 16)
    if (NR > 1 && P1 == $1 && P2 != $2 && (P2 + 1) % 16 != $2)
        data_array[$1]++

    P1 = $1                       # remember the previous channel and counter value
    P2 = $2
}

END {
    for (var in data_array)
        if (data_array[var] == 1)
            print "Channel " var " : " data_array[var] " discontinuity"
        else
            print "Channel " var " : " data_array[var] " discontinuities"
}' input.txt

There was an error in an assignment.
It works correctly now.

|UVI|

04-19-2011

I don't see what it is doing wrong. Data:

Code:

$ cat d2
400 0
400 1 xx
400 1 xxxx
400 2
400 3
400 0 //
400 0 ////
400 1
400 2
400 3 xx
400 3 xxxx
400 0
400 1 //
400 1 ////

Script:

Code:

$ cat test.sh
#!/bin/sh

awk  '
   ch!=$1{ch=$1;seq=$2}  #initialize with new channel
   ch==$1{   #channel same as stored
             if(seq==$2) next;  #if same, skip to next line
             else if((++seq%16)!=$2) { #increment and cycle the counter; compare
                 cntr[$1]++
                 seq=$2     #reset seq
                 print "Disc. " $0 " seq: " seq   #debug: show which record was flagged
             }
   }END{
     for(i in cntr) {
       print "Channel " i " has " cntr[i] " discontinuities" 
   }
 }' < "$1"

Run:

Code:

$ ./test.sh d2 
Disc. 400 0 // seq: 0
Disc. 400 0 seq: 0
Channel 400 has 2 discontinuities

Isn't that the desired output?
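
One possible reading of the two flagged lines: the test data wraps after 3 while the script still assumes a 0-15 cycle, so each 3 -> 0 step looks like a jump. Separately, seq in the script above is incremented but never reduced modulo the cycle length, so after a genuine wrap it holds 16 rather than 0, and a repeated value right after the wrap no longer matches it. The sketch below is one way to address both points. It is not the script from this thread: the function name count_disc and the variable mod (the cycle length) are assumptions made for the example, and input is read from standard input.

```shell
# count_disc MOD: hypothetical helper; counts discontinuities per channel
# for a counter that cycles 0..MOD-1. Reads "channel counter" lines on stdin.
count_disc() {
    awk -v mod="$1" '
        NR == 1 || $1 != ch { ch = $1; seq = $2 + 0; next }  # new channel: resync only
        $2 + 0 == seq { next }                               # repeated value: not a discontinuity
        {
            if ((seq + 1) % mod != $2 + 0)                   # expected successor, wrapped
                cntr[$1]++
            seq = $2 + 0                                     # store the value just read, always in range
        }
        END {
            for (c in cntr)
                print "Channel " c " has " cntr[c] " discontinuities"
        }
    '
}
```

With the d2 data above, count_disc 16 < d2 reports 2 discontinuities for channel 400, and count_disc 4 < d2 prints nothing, because channels without discontinuities are not listed.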

mirni

04-21-2011

Try the Perl below:

Code:

use strict;
use warnings;

my %hash;	# channel => list of counter values, in input order
while(<DATA>){
	my @tmp = split;
	next unless @tmp >= 2;	# skip blank or short lines
	push @{$hash{$tmp[0]}}, $tmp[1];
}
foreach my $key(sort { $a <=> $b } keys %hash){
	my $cnt = 0;
	my @arr = @{$hash{$key}};
	for(my $i=1;$i<=$#arr;$i++){
		next if $arr[$i]==$arr[$i-1];	# a repeated value is not a discontinuity
		$cnt++ if $arr[$i]!=($arr[$i-1]+1)%16;	# expected successor, wrapping 15 -> 0
	}
	print $key," has ", $cnt, " discontinuities\n" if $cnt;
}
__DATA__
160 13
160 14
160 15
160 0
160 1
160 4
160 2
409 2
409 3
409 5
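
The Perl above collects the values per channel in a hash, so it does not depend on each channel's lines being contiguous, whereas the awk scripts earlier in the thread keep a single stored channel and counter. A rough awk counterpart that keeps the last value per channel might look like this. It is a sketch only: the names count_interleaved, last, and mod are made up for the example, input is read from standard input, and mod defaults to 16.

```shell
# count_interleaved [MOD]: hypothetical helper; tolerates interleaved channels.
count_interleaved() {
    awk -v mod="${1:-16}" '
        NF < 2 { next }                      # skip blank or short lines
        {
            v = $2 + 0                       # force a numeric counter value
            # Count only if this channel was seen before, the value changed,
            # and it is not the expected successor (wrapping at mod).
            if (($1 in last) && v != last[$1] && (last[$1] + 1) % mod != v)
                cntr[$1]++
            last[$1] = v                     # remember the latest value per channel
        }
        END {
            for (c in cntr)
                print c " has " cntr[c] (cntr[c] == 1 ? " discontinuity" : " discontinuities")
        }
    '
}
```

The output order is whatever awk's for-in loop yields, so pipe it through sort -n if a fixed order matters.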

summer_cherry

04-21-2011

It's working now. Thank you so much for your help and time, mirni, |UVI| and summer_cherry.

acsg
