awk- comparing fields from the same column, finding discontinuities.

04-14-2011

Hello,

I have a file with two fields. The first field repeats itself for quite a while but the second field changes. What I want to do is to go through the first column until its value changes (and while it doesn't, verify that the second field is in a sequence from 0-15).

Example input:

160 13
160 14
160 15
160 0
160 1
160 4 <-- **
160 2 <-- **
409 2
409 3
409 5 <-- **
....

For the output I would like to have a report like:

Channel 160: 2 discontinuities
Channel 409: 1 discontinuity

I only have a quite tangled pseudo-code so far since I don't know how to refer to "the previous field in the same column":

Code:

{channel[$1];

   while ($1=i){

      if($2 < $(previous field, same column) && ($2==0 && $(prev.field,same column) !=15 && $(prevfield,same column) != 0) || $2 > $(prev.field, same column)+1)
discont[i]++;
   }
}

Thanks!

acsg

04-14-2011

How about this...

Code:

awk  '
   ch!=$1{ch=$1;seq=$2}  #new channel: start the expected counter at its first value
   {                     #runs for every line; ch equals $1 here
             if((seq%16)!=$2) #compare with the expected (cyclic) value
                 cntr[$1]++
             seq=$2+1         #resynchronize so a single jump is counted once
   }END{
     for(i in cntr) {
       print "Channel " i " has " cntr[i] " discontinuities"
     }
   }' input

Note that the output will be in no particular order.
To sort it by channel, pipe the awk output to sort:

Code:

awk '{...}' input | sort -n -k2

To sort by number of discontinuities, sort by fourth field:

Code:

| sort -n -k4
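Put together, the counting and the sorting might look like the sketch below. The function name report_sorted and the sample data fed to it are assumptions for illustration only; the awk body is a compact per-line counter in the spirit of the script above, written so that it resynchronizes on each value it reads.

```shell
# Hypothetical helper: reads "channel counter" pairs from standard input,
# counts discontinuities per channel, then sorts the report by channel.
report_sorted() {
  awk '
    $1 != ch { ch = $1; expected = $2 }      # new channel: accept its first value
    {
      if (expected % 16 != $2) cntr[$1]++    # value is not the expected successor
      expected = $2 + 1                      # resynchronize on the value just read
    }
    END {
      for (i in cntr)
        print "Channel " i " has " cntr[i] " discontinuities"
    }' | sort -n -k2                         # field 2 is the channel number
}

# Sample data from the first post.
printf '%s\n' '160 13' '160 14' '160 15' '160 0' '160 1' '160 4' '160 2' \
              '409 2' '409 3' '409 5' | report_sorted
```

Replacing -k2 with -k4 in the last stage orders the report by the number of discontinuities instead.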

Last edited by mirni; 04-14-2011 at 02:12 PM. Reason: extra brace

mirni

04-14-2011

another way

Code:

awk 'BEGIN{ 
  while (getline  > 0){
    if ( NR > 1 ){
      C1 = $1
      C2 = $2

      if ( P1 == C1 ){
        if ( (P2+1)%16 != C2)
          data_array[$1]++
      }
    }
    P1 = $1
    P2 = $2
  }
}

END{
  for (var in data_array)
    if ( data_array[var] > 1)
      print "Channel " var ": " data_array[var] " discontinuities"
    else
      print "Channel " var ": " data_array[var] " discontinuity"
}'
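The script above reads its input with an explicit getline loop inside BEGIN, so it expects the data on standard input. As a rough sketch of the same comparison using awk's implicit record loop, something like the following could be used. The function name count_jumps and the variable names prev_ch, prev_val and jumps are made up for this example.

```shell
# Hypothetical wrapper; reads "channel counter" pairs from standard input.
count_jumps() {
  awk '
    # Same channel as the previous line, but not the next value modulo 16.
    NR > 1 && $1 == prev_ch && (prev_val + 1) % 16 != $2 { jumps[$1]++ }
    { prev_ch = $1; prev_val = $2 }          # remember this line for the next one
    END {
      for (c in jumps)
        print "Channel " c ": " jumps[c] (jumps[c] > 1 ? " discontinuities" : " discontinuity")
    }'
}

# Usage (file name is an assumption): count_jumps < input.txt
```

Keeping the previous line's two fields in ordinary variables is what stands in for "the previous field in the same column".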

|UVI|

04-14-2011

Quote:

Originally Posted by mirni

How about this...

Code:

awk  '{
         ch!=$1{ch=$1;seq=$2}  #initialize with new channel
         ch==$1{   #channel same as stored
             if((seq++%16)!=$2) #increment and cycle the counter; compare
                 cntr[$1]++
}END{
   for(i in cntr) {
      print "Channel " i " has " cntr[i] " discontinuities" 
   }
 }' input

It doesn't seem to work, it doesn't print anything... are the first couple of instructions supposed to be wrapped in a BEGIN statement?

---------- Post updated at 11:42 AM ---------- Previous update was at 11:34 AM ----------

Quote:

Originally Posted by |UVI|

another way

Thank you!!! It works :)

.....but there's a tiny problem. I wanted to specify that two consecutive fields having the same value shouldn't be seen as a discontinuity.

So for example

160 13
160 14
160 15
160 0
160 1
160 1
160 1
160 4 <-- **
160 2 <-- **
409 2
409 3
409 5 <-- **

Lines 5, 6, 7 shouldn't be seen as a discontinuity since I have a lot of those in the input file

---------- Post updated at 11:54 AM ---------- Previous update was at 11:42 AM ----------

Quote:

Originally Posted by |UVI|

another way

Actually, I think it's only counting the number of times a channel is present in the first field... because I have another script that does that and returns the stats, and they're both giving the same results now...
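One way to check a suspicion like this might be to print every transition that gets counted, rather than only the totals. The sketch below is a debugging aid, not part of any script posted in this thread; the function name show_jumps and its output format are assumptions.

```shell
# Hypothetical debugging helper: list each transition that would be counted
# as a discontinuity, with its input line number.
show_jumps() {
  awk '
    NR > 1 && $1 == prev_ch && (prev_val + 1) % 16 != $2 {
      printf "line %d: channel %s went from %s to %s\n", NR, $1, prev_val, $2
    }
    { prev_ch = $1; prev_val = $2 }          # remember this line for the next one
  '
}

# Usage (file name is an assumption): show_jumps < input.txt
```

If repeated values are being counted, they show up here as transitions from a value to itself.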

acsg

04-14-2011

Quote:

Originally Posted by acsg

Lines 5, 6, 7 shouldn't be seen as a discontinuity since I have a lot of those in the input file

Code:

cat input.txt | awk 'BEGIN{ 
  while (getline  > 0){
    if ( NR > 1 ){
      C1 = $1 
      C2 = $2

      if ( P1 == C1 && P2 != C2){
        if ( (P2+1)%16 != C2 )
          data_array[$1]++
      }
    }
    P1 = $1
    P2 = $2
  }
}

END{
  for (var in data_array)
    if ( data_array[var] > 1)
      print "Channel " var " : "  data_array[var] " discontinuities"
    else
      print "Channel " var " : "  data_array[var] " discontinuity"
}'

Now it should work.

---------- Post updated at 04:20 AM ---------- Previous update was at 03:58 AM ----------

With this version, the program also prints channels with 0 discontinuities:

Code:

cat input.txt | awk 'BEGIN{ 
  while (getline  > 0 && NF > 0){
    data_array[$1]+=0
    if ( NR > 1 ){
      C1 = $1 
      C2 = $2

      if ( P1 == C1 && P2 != C2){
        if ( (P2+1)%16 != C2 )
          data_array[$1]++
      }
    }
    P1 = $1
    P2 = $2
  }
}

END{
  for (var in data_array)
    if ( data_array[var] == 1)
      print "Channel " var " : "  data_array[var] " discontinuity"
    else
      print "Channel " var " : "  data_array[var] " discontinuities"
}'
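For comparison, the same three rules (ignore repeated values, list channels with no discontinuities, use the singular only for a count of 1) could be sketched with pattern-action rules instead of a getline loop. The function name channel_report and the variable names are assumptions. Unlike the loop above, which stops at the first empty line, this sketch skips blank or short lines and carries on.

```shell
# Hypothetical wrapper; reads "channel counter" pairs from standard input.
channel_report() {
  awk '
    NF < 2 { next }                          # skip blank or short lines
    { count[$1] += 0 }                       # make sure every channel is listed
    # Same channel, value changed, and not the next value modulo 16.
    seen && $1 == prev_ch && $2 != prev_val && (prev_val + 1) % 16 != $2 { count[$1]++ }
    { prev_ch = $1; prev_val = $2; seen = 1 }
    END {
      for (c in count)
        print "Channel " c " : " count[c] (count[c] == 1 ? " discontinuity" : " discontinuities")
    }' | sort -n -k2                         # order the report by channel
}

# Usage (file name is an assumption): channel_report < input.txt
```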

|UVI|

04-14-2011

Quote:

Originally Posted by |UVI|

Thank you so much!! You were extremely helpful.

acsg

04-14-2011

Sorry, I had an extra brace there at the beginning... I fixed the original reply.

mirni
