awk- comparing fields from the same column, finding discontinuities.


 
# 1  
Old 04-14-2011

Hello,

I have a file with two fields. The first field repeats itself for quite a while but the second field changes. What I want to do is to go through the first column until its value changes (and while it doesn't, verify that the second field is in a sequence from 0-15).

Example input:

160 13
160 14
160 15
160 0
160 1
160 4 <-- **
160 2 <-- **
409 2
409 3
409 5 <-- **
....

For the output I would like to have a report like:

Channel 160: 2 discontinuities
Channel 409: 1 discontinuity

I only have a quite tangled pseudo-code so far since I don't know how to refer to "the previous field in the same column":

Code:
{channel[$1];

   while ($1=i){

      if($2 < $(previous field, same column) && ($2==0 && $(prev.field,same column) !=15 && $(prevfield,same column) != 0) || $2 > $(prev.field, same column)+1)
discont[i]++;
   }
}

Thanks!
# 2  
Old 04-14-2011
How about this...
Code:
awk '
   ch != $1 { ch = $1; seq = $2 }   # new channel: restart the expected counter
   {
      if ((seq % 16) != $2)          # expected value differs from the actual one
         cntr[$1]++
      seq = $2 + 1                   # resynchronize on the value just seen
   }
   END {
      for (i in cntr)
         print "Channel " i " has " cntr[i] " discontinuities"
   }' input

Note that the output comes out in no particular order, since awk does not define the traversal order of a for-in loop.
To sort by channel, pipe the awk output to sort:
Code:
awk '{...}' input | sort -n -k2

To sort by number of discontinuities, sort by fourth field:
Code:
| sort -n -k4
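
A small sketch of the sort step, assuming the report lines have the shape printed above ("Channel N has M discontinuities"); the two sample lines are made up for illustration. Writing the key as -k2,2n limits the numeric comparison to the channel field alone, instead of letting the key run to the end of the line:

```shell
# Two hypothetical report lines, deliberately out of order.
sorted=$(printf '%s\n' \
    'Channel 409 has 1 discontinuities' \
    'Channel 160 has 2 discontinuities' |
    sort -k2,2n)        # numeric sort on field 2 (the channel) only
echo "$sorted"
```

The same idea with -k4,4n would order the lines by the number of discontinuities.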


Last edited by mirni; 04-14-2011 at 02:12 PM.. Reason: extra brace
# 3  
Old 04-14-2011
another way

Code:
awk 'BEGIN{ 
  while (getline  > 0){
    if ( NR > 1 ){
      C1 = $1
      C2 = $2

      if ( P1 == C1 ){
        if ( (P2+1)%16 != C2)
          data_array[$1]++
      }
    }
    P1 = $1
    P2 = $2
  }
}

END{
  for (var in data_array)
    if ( data_array[var] > 1)
      print "Channel " var ": " data_array[var] " discontinuities"
    else
      print "Channel " var ": " data_array[var] " discontinuity"
}'
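
The check above rests on one piece of arithmetic: the value expected after a previous value is (previous + 1) % 16, so 15 wraps around to 0. A minimal sketch of just that step, using hypothetical names prev, cur and expected and a made-up list of "previous current" pairs:

```shell
# Each input line is a hypothetical "previous current" pair from one channel.
wrap_check=$(printf '%s\n' '14 15' '15 0' '1 4' '4 2' |
awk '{
    prev = $1; cur = $2
    expected = (prev + 1) % 16      # successor of prev in the 0-15 cycle
    status = (cur == expected) ? "ok" : "discontinuity"
    print prev " -> " cur ": " status
}')
echo "$wrap_check"
```

Only the last two pairs are flagged; 15 followed by 0 passes because of the modulo.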

# 4  
Old 04-14-2011
Quote:
Originally Posted by mirni
How about this...
Code:
awk  '{
         ch!=$1{ch=$1;seq=$2}  #initialize with new channel
         ch==$1{   #channel same as stored
             if((seq++%16)!=$2) #increment and cycle the counter; compare
                 cntr[$1]++
}END{
   for(i in cntr) {
      print "Channel " i " has " cntr[i] " discontinuities" 
   }
 }' input

Note that the output is gonna be in random order.
To sort them by channel pipe this awk code to sort
Code:
awk '{...}' input | sort -n -k2

To sort by number of discontinuities, sort by fourth field:
Code:
| sort -n -k4



It doesn't seem to work, it doesn't print anything... are the first couple of instructions supposed to be wrapped in a BEGIN statement?


---------- Post updated at 11:42 AM ---------- Previous update was at 11:34 AM ----------

Quote:
Originally Posted by |UVI|
another way

Code:
awk 'BEGIN{
  while (getline  > 0){
    if ( NR > 1 ){
      C1 = $1
      C2 = $2

      if ( P1 == C1 ){
        if ( (P2+1)%16 != C2)
          data_array[$1]++
      }
    }
    P1 = $1
    P2 = $2
  }
}

END{
  for (var in data_array)
    if ( data_array[var] > 1)
      print "Channel " var ": " data_array[var] " discontinuities"
    else
      print "Channel " var ": " data_array[var] " discontinuity"
}'



Thank you!!! It works... but there's a tiny problem. I should have specified that two consecutive fields having the same value shouldn't be counted as a discontinuity.

So for example

160 13
160 14
160 15
160 0
160 1
160 1
160 1
160 4 <-- **
160 2 <-- **
409 2
409 3
409 5 <-- **


Lines 5, 6 and 7 shouldn't be counted as discontinuities; I have a lot of those repeats in the input file.

---------- Post updated at 11:54 AM ---------- Previous update was at 11:42 AM ----------

Actually, I think it's only counting the number of times a channel is present in the first field... because I have another script that does that and returns the stats, and they're both giving the same results now...
# 5  
Old 04-14-2011
Quote:
Originally Posted by acsg
Lines 5, 6, 7 shouldn't be seen as a discontinuity since I have a lot of those in the input file

Code:
cat input.txt | awk 'BEGIN{ 
  while (getline  > 0){
    if ( NR > 1 ){
      C1 = $1 
      C2 = $2

      if ( P1 == C1 && P2 != C2){
        if ( (P2+1)%16 != C2 )
          data_array[$1]++
      }
    }
    P1 = $1
    P2 = $2
  }
}

END{
  for (var in data_array)
    if ( data_array[var] > 1)
      print "Channel " var " : "  data_array[var] " discontinuities"
    else
      print "Channel " var " : "  data_array[var] " discontinuity"
}'


Now it should work.

---------- Post updated at 04:20 AM ---------- Previous update was at 03:58 AM ----------

With this version the program also prints channels with 0 discontinuities.

Code:
cat input.txt | awk 'BEGIN{ 
  while (getline  > 0 && NF > 0){
    data_array[$1]+=0
    if ( NR > 1 ){
      C1 = $1 
      C2 = $2

      if ( P1 == C1 && P2 != C2){
        if ( (P2+1)%16 != C2 )
          data_array[$1]++
      }
    }
    P1 = $1
    P2 = $2
  }
}

END{
  for (var in data_array)
    if ( data_array[var] == 1)
      print "Channel " var " : "  data_array[var] " discontinuity"
    else
      print "Channel " var " : "  data_array[var] " discontinuities"
}'
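
For comparison, a sketch of the same idea in the more conventional awk layout, where the main rule reads each record instead of a getline loop inside BEGIN. This is only one possible arrangement, not the code posted above: the names count, prev_ch, prev_val, seen and word are hypothetical, the channel 500 rows are made up to show a zero count, and piping to sort is just one way to get a stable order.

```shell
report=$(printf '%s\n' '160 13' '160 14' '160 15' '160 0' '160 1' '160 1' \
    '160 1' '160 4' '160 2' '409 2' '409 3' '409 5' '500 7' '500 8' |
awk '
NF >= 2 {
    count[$1] += 0          # register the channel even if it stays at 0
    # Count only when the channel is unchanged, the value is not a repeat,
    # and it is not the successor of the previous value in the 0-15 cycle.
    if (seen && $1 == prev_ch && $2 != prev_val && $2 != (prev_val + 1) % 16)
        count[$1]++
    prev_ch = $1; prev_val = $2; seen = 1
}
END {
    for (ch in count) {
        word = (count[ch] == 1) ? "discontinuity" : "discontinuities"
        print "Channel " ch ": " count[ch] " " word
    }
}' | sort -k2,2n)
echo "$report"
```

Like the versions above, this compares each line only with the line directly before it, so a channel that disappears and comes back later starts a fresh sequence.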

# 6  
Old 04-14-2011
Thank you so much!! You were extremely helpful.
# 7  
Old 04-14-2011
Sorry, I had an extra brace there at the beginning... I fixed the original reply.