match range of different numbers by AWK


 
# 43  
Old 08-11-2009
# 44  
Old 08-13-2009
hey

Hey rado, there is only one change left. Could you help me with this?
input1
Code:
A1    1 2    +
A1      2 4    +
A2    120 130    +
A2      136 140    +
A3    2 1    -
A3      4 2    -
A4    130 120    -
A4      140 136    -
A5    15 20    +
A5      50 60    +
A5      94 98    +
A6    20 15    -
A6      60 50    -
A6      98 94    -
A7    1 2    +
A7      2 4    +
A7      33 36    +
A8    88 84    -
A8      130 120    -
A8      140 136    -
A9    1 2    +
A9      2 4    +
A10    120 130    +
A10       136 140    +
A11    2 1    -
A11       4 2    -
A12    130 120    -
A12       140 136    -
A13    15 20    +
A13       50 60    +
A13       94 98    +
A14    20 15    -
A14       60 50    -
A14       98 94    -
A15    1 2    +
A15       2 4    +
A15       33 36    +
A16    88 84    -
A16       130 120    -
A16       140 136    -
A17    88 84    -
A17       130 120    -
A17       140 136    -

input2
Code:
A1    5 10    +
      30 40
      80 90
      100 108
A2    5 10    +
      30 40
      80 90
      100 108
A3    5 10    +
      30 40
      80 90
      100 108
A4    5 10    +
      30 40
      80 90
      100 108
A5    5 10    +
      30 40
      80 90
      100 108
A6    5 10    +
      30 40
      80 90
      100 108
A7    5 10    +
      30 40
      80 90
      100 108
A8    5 10    +
      30 40
      80 90
      100 108
A9    5 10    -
      30 40
      80 90
      100 108
A10    5 10    -
       30 40
       80 90
       100 108
A11    5 10    -
       30 40
       80 90
       100 108
A12    5 10    -
       30 40
       80 90
       100 108
A13    5 10    -
       30 40
       80 90
       100 108
A14    5 10    -
       30 40
       80 90
       100 108
A15    5 10    -
       30 40
       80 90
       100 108
A16    5 10    -
       30 40
       80 90
       100 108

output
Code:
A1    1 2    +    ARANGE
A1      2 4    +    ARANGE
A2    120 130    +    BRANGE
A2      136 140    +    BRANGE
A3    2 1    -    CRANGE
A3      4 2    -    CRANGE
A4    130 120    -    DRANGE
A4      140 136    -    DRANGE
A5    15 20    +    ERANGE
A5      50 60    +    ERANGE
A5      94 98    +    ERANGE
A6    20 15    -    FRANGE
A6      60 50    -    FRANGE
A6      98 94    -    FRANGE
A7    1 2    +    ARANGE
A7      2 4    +    ARANGE
A7      33 36    +    GRANGE
A8    88 84    -    HRANGE
A8      130 120    -    DRANGE
A8      140 136    -    DRANGE
A9    1 2    +    ARANGE
A9      2 4    +    ARANGE
A10    120 130    +    BRANGE
A10       136 140    +    BRANGE
A11    2 1    -    CRANGE
A11       4 2    -    CRANGE
A12    130 120    -    DRANGE
A12       140 136    -    DRANGE
A13    15 20    +    ERANGE
A13       50 60    +    ERANGE
A13       94 98    +    ERANGE
A14    20 15    -    FRANGE
A14       60 50    -    FRANGE
A14       98 94    -    FRANGE
A15    1 2    +    ARANGE
A15       2 4    +    ARANGE
A15       33 36    +    GRANGE
A16    88 84    -    HRANGE
A16       130 120    -    DRANGE
A16       140 136    -    DRANGE
A17    88 84    -    UNKNOWN
A17       130 120    -    UNKNOWN
A17       140 136    -    UNKNOWN

needed output

Code:
A1    1 2    +    ARANGE
A1      2 4    +    ARANGE
A2    120 130    +    BRANGE
A2      136 140    +    BRANGE
A3    2 1    -    CRANGE
A3      4 2    -    CRANGE
A4    130 120    -    DRANGE
A4      140 136    -    DRANGE
A5    15 20    +    ERANGE
A5      50 60    +    ERANGE
A5      94 98    +    ERANGE
A6    20 15    -    FRANGE
A6      60 50    -    FRANGE
A6      98 94    -    FRANGE
A7    1 2    +    ARANGE
A7      2 4    +    ARANGE
A7      33 36    +    GRANGE
A8    88 84    -    HRANGE
A8      130 120    -    DRANGE
A8      140 136    -    DRANGE
A9    1 2    +    BRANGE
A9      2 4    +    BRANGE
A10    120 130    +    ARANGE
A10       136 140    +    ARANGE
A11    2 1    -    DRANGE
A11       4 2    -    DRANGE
A12    130 120    -    CRANGE
A12       140 136    -    CRANGE
A13    15 20    +    FRANGE
A13       50 60    +    FRANGE
A13       94 98    +    FRANGE
A14    20 15    -    ERANGE
A14       60 50    -    ERANGE
A14       98 94    -    ERANGE
A15    1 2    +    BRANGE
A15       2 4    +    BRANGE
A15       33 36    +    HRANGE
A16    88 84    -    GRANGE
A16       130 120    -    ERANGE
A16       140 136    -    ERANGE
A17    88 84    -    UNKNOWN
A17       130 120    -    UNKNOWN
A17       140 136    -    UNKNOWN

The modification: if input2 has a - symbol in the 4th column, the result is swapped in the output: ARANGE -> BRANGE and B -> A, C -> D, D -> C, E -> F, F -> E, G -> H and H -> G, as shown in the needed output.

script
Code:
#!/usr/bin/awk -f

BEGIN {
OFS="\t"; ORS="\n" 
  def["ascoutlower"]    = "ARANGE"   
  def["ascoutupper"]    = "BRANGE"
  def["descoutlower"]   = "CRANGE"
  def["descoutupper"]   = "DRANGE"
  def["ascinnotexact"]  = "ERANGE"
  def["descinnotexact"] = "FRANGE"
  def["ascinexact"]     = "GRANGE"
  def["descinexact"]    = "HRANGE"
  }

NR == FNR && NF {
  NF > 2 && k = $1
  in2[k] = in2[k] ? in2[k] RS $1 FS $2 : $2 FS $3
  next
  }
$1 in in2 {
  n = split(in2[$1], tmp, RS) 
  split(tmp[1], Tmp); min = Tmp[1]
  m = split(tmp[n], Tmp); max = Tmp[m]
  # asc - desc
  Def = $2 > $3 ? "desc" : "asc"
  # inrange - outofrange
  if (Def == "asc")
    Def = Def ($2 >= min && $3 <= max ? "in" : "out") 
  else
    Def = Def ($3 >= min && $2 <= max ? "in" : "out")
  # lower - upper
  if ((Def ~ /ascout/ ? $3 : $2) <= min) {
    Def = Def "lower"
    print $0 "\t" def[Def]
    next
    }
  if ((Def ~ /ascout/ ? $3 : $2) >= max) {
    Def = Def "upper"
    print $0 "\t" def[Def]
    next
    }    
  # exact - not exact
  for (i=1; i<=n; i++) {
    split(tmp[i], range)
    if (Def ~ /asc/) { k1 = $2; k2 = $3 }      
    else { k1 = $3; k2 = $2 }
    if (k1 >= range[1] && k2 <= range[2]) {
      Def = Def "exact"
      print $0 "\t" def[Def]
      next
      }
    }
      Def = Def "notexact"
    print $0 "\t" def[Def]
    next    
}!/^[ \t]/ { print $0 "\tUNKNOWN" }

Thanks.
Please get back to me if you find it difficult to read.

---------- Post updated 08-13-09 at 01:05 AM ---------- Previous update was 08-12-09 at 10:39 PM ----------

I roughly wrote some code.
If column 4 in input2 has +, the result is the red bold letters, or else (-) the green bold letters.
The problem I have is: how do I define the symbols?

Code:
#!/usr/bin/awk -f

BEGIN {
OFS="\t"; ORS="\n" 
  if (flag ~ /+/)
  def["ascoutlower"]    = "ARANGE"   
  def["ascoutupper"]    = "BRANGE"
  def["descoutlower"]   = "CRANGE"
  def["descoutupper"]   = "DRANGE"
  def["ascinnotexact"]  = "ERANGE"
  def["descinnotexact"] = "FRANGE"
  def["ascinexact"]     = "GRANGE"
  def["descinexact"]    = "HRANGE"
  else
  def["ascoutlower"]    = "BRANGE"   
  def["ascoutupper"]    = "ARANGE"
  def["descoutlower"]   = "DRANGE"
  def["descoutupper"]   = "CRANGE"
  def["ascinnotexact"]  = "FRANGE"
  def["descinnotexact"] = "ERANGE"
  def["ascinexact"]     = "HRANGE"
  def["descinexact"]    = "GRANGE"
  }

NR == FNR && NF {
  NF > 2 && k = $1
  in2[k] = in2[k] ? in2[k] RS $1 FS $2 : $2 FS $3
  next
  }
$1 in in2 {
  n = split(in2[$1], tmp, RS) 
  split(tmp[1], Tmp); min = Tmp[1]
  m = split(tmp[n], Tmp); max = Tmp[m]
  # asc - desc
  Def = $2 > $3 ? "desc" : "asc"
  # inrange - outofrange
  if (Def == "asc")
    Def = Def ($2 >= min && $3 <= max ? "in" : "out") 
  else
    Def = Def ($3 >= min && $2 <= max ? "in" : "out")
  # lower - upper
  if ((Def ~ /ascout/ ? $3 : $2) <= min) {
    Def = Def "lower"
    print $0 "\t" def[Def]
    next
    }
  if ((Def ~ /ascout/ ? $3 : $2) >= max) {
    Def = Def "upper"
    print $0 "\t" def[Def]
    next
    }    
  # exact - not exact
  for (i=1; i<=n; i++) {
    split(tmp[i], range)
    if (Def ~ /asc/) { k1 = $2; k2 = $3 }      
    else { k1 = $3; k2 = $2 }
    if (k1 >= range[1] && k2 <= range[2]) {
      Def = Def "exact"
      print $0 "\t" def[Def]
      next
      }
    }
      Def = Def "notexact"
    print $0 "\t" def[Def]
    next    
}!/^[ \t]/ { print $0 "\tUNKNOWN" }


Last edited by repinementer; 08-13-2009 at 04:00 AM..
# 45  
Old 08-27-2009
You cannot use flag for this purpose in the BEGIN block (the block is evaluated only once, before any input is read, so flag is not yet initialized).
Use the first version of the associative array and just swap outupper with outlower and ascin with descin at run time (if + ..., else ...).
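
A minimal sketch of why this fails, assuming a single made-up input line and reusing the name flag from the post above: awk runs BEGIN before it reads the first record, so a variable that is only assigned from the input is still empty there.

```shell
# Hypothetical one-line input; "flag" is set only in the main block.
out=$(printf 'A1 5 10 -\n' | awk '
BEGIN { print "BEGIN sees: [" flag "]" }   # runs before any record is read
{ flag = $4; print "main sees: [" flag "]" }
')
printf '%s\n' "$out"
# BEGIN sees: []
# main sees: [-]
```

Any decision that depends on the sign column therefore has to be made in a main block, after the record has been read.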
# 46  
Old 08-31-2009
I don't understand. Anyway, thanks for the advice.
# 47  
Old 09-01-2009
Leave the associative array in the BEGIN block like this:

Code:
BEGIN {
  def["ascoutlower"]    = "ARANGE"   
  def["ascoutupper"]    = "BRANGE"
  def["descoutlower"]   = "CRANGE"
  def["descoutupper"]   = "DRANGE"
  def["ascinnotexact"]  = "ERANGE"
  def["descinnotexact"] = "FRANGE"
  def["ascinexact"]     = "GRANGE"
  def["descinexact"]    = "HRANGE"
  ...
}

Then, before printing the record, insert this code:

Code:
  if (/ - /) {
    sub(/outupper/, "outlower", Def)
    sub(/ascin/, "descin", Def)
  }
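
The snippet above rewrites the key in one direction only, while the needed output swaps each pair both ways (A <-> B, C <-> D, E <-> F, G <-> H). A possible two-way version is sketched below. The helper name swapkey is hypothetical and not part of the original script; the sketch assumes the key names from the def array and simply prints the mapping for all eight keys.

```shell
out=$(awk '
# Hypothetical helper: exchange lower/upper for "out" keys
# and asc/desc for "in" keys.
function swapkey(key) {
  if (key ~ /out/) {
    if (key ~ /lower$/) sub(/lower$/, "upper", key)
    else                sub(/upper$/, "lower", key)
  } else if (key ~ /^desc/) sub(/^desc/, "asc", key)
  else                      sub(/^asc/, "desc", key)
  return key
}
BEGIN {
  n = split("ascoutlower ascoutupper descoutlower descoutupper " \
            "ascinnotexact descinnotexact ascinexact descinexact", keys, " ")
  for (i = 1; i <= n; i++) print keys[i], "->", swapkey(keys[i])
}')
printf '%s\n' "$out"
```

To use it in the script, one option (again an assumption, not tested against the full data) is to record the sign while reading input2, for example `if (NF > 2) sign[$1] = $4` inside the first block, and then apply `if (sign[$1] == "-") Def = swapkey(Def)` just before each print. That way the decision follows the sign in input2 rather than the sign on the current input1 line.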

# 48  
Old 09-02-2009
Can I do something like this? I modified it according to my requirements.
Code:
#!/usr/bin/awk -f

BEGIN {
OFS="\t"; ORS="\n" 
  def["ascoutlower"]    = "ARANGE"   
  def["ascoutupper"]    = "BRANGE"
  def["descoutlower"]   = "CRANGE"
  def["descoutupper"]   = "DRANGE"
  def["ascinnotexact"]  = "ERANGE"
  def["descinnotexact"] = "FRANGE"
  def["ascinexact"]     = "GRANGE"
  def["descinexact"]    = "HRANGE"
  
  }
  
NR == FNR && NF {
  NF > 2 && k = $1
  in2[k] = in2[k] ? in2[k] RS $1 FS $2 : $2 FS $3
  next
  }
$1 in in2 {
  n = split(in2[$1], tmp, RS) 
  split(tmp[1], Tmp); min = Tmp[1]
  m = split(tmp[n], Tmp); max = Tmp[m]
  # asc - desc
  Def = $2 > $3 ? "desc" : "asc"
  # inrange - outofrange
  if (Def == "asc")
    Def = Def ($2 >= min && $3 <= max ? "in" : "out") 
  else
    Def = Def ($3 >= min && $2 <= max ? "in" : "out")
  # lower - upper
  if ((Def ~ /ascout/ ? $3 : $2) <= min) {
    Def = Def "lower"
    print $0 "\t" def[Def]
    next
    }
  if ((Def ~ /ascout/ ? $3 : $2) >= max) {
    Def = Def "upper"
    print $0 "\t" def[Def]
    next
    }    
  # exact - not exact
  for (i=1; i<=n; i++) {
    split(tmp[i], range)
    if (Def ~ /asc/) { k1 = $2; k2 = $3 }      
    else { k1 = $3; k2 = $2 }
    if (k1 >= range[1] && k2 <= range[2]) {
      Def = Def "exact"
      print $0 "\t" def[Def]
      next
      }
    }
      Def = Def "notexact"
    print $0 "\t" def[Def]
    next
    #Seperte plus and minus
    if (/ + /) {
	    sub(/ascoutlower/, "ascoutupper", Def)
     	sub(/ascoutupper/, "descoutupper", Def)
     	sub(/descoutupper/, "descoutlower", Def)
     	sub(/descoutlower/, "descoutupper", Def)
     	sub(/ascinnotexac/, "descinnotexac", Def)
     	sub(/descinnotexac/, "ascinnotexac", Def)
     	sub(/ascinexact/, "descinexact", Def)
     	sub(/descinexact/, "ascinexact", Def)
 	
	elseif(/ - /) 
	    sub(/ascoutlower/, "descoutupper", Def)
     	sub(/descoutupper/, "ascoutlower", Def)
     	sub(/ascoutupper/, "descoutlower", Def)
     	sub(/descoutlower/, "ascoutupper", Def)
     	sub(/ascinnotexact/, "descinnotexact", Def)
     	sub(/descinnotexact/, "ascinnotexact", Def)
     	sub(/ascinexact/, "descinexact", Def)
     	sub(/descinexact/, "ascinexact", Def)
  }  
 }!/^[ \t]/ { print $0 "\t","UNKNOWN" }



---------- Post updated 09-02-09 at 12:34 AM ---------- Previous update was 09-01-09 at 06:04 AM ----------

Hey, sorry for bothering you.
I wrote another shell script to do the above job, so forget the above question.
I found a bug in your script:
it does not produce any ranges for a few lines. Do you know why?
For example:


input1
Code:
A	239861347 239858777	-
B	233849110 233849388	+
C	202864284 202864396	+
D	187984662 187982263	-

input2
Code:
A	239858789 239865855	-
B	233849110 233849388	+
C	202864284 202864396	+
D	187984054 187984122	+
 	187984914 187984960
 	187985046 187985179
 	187985444 187985584
 	187986365 187986534
 	187986646 187986756
 	187986984 187987128
 	187987609 187987747
 	187987977 187988067
 	187988285 187988365
 	187989607 187990379



---------- Post updated at 12:36 AM ---------- Previous update was at 12:34 AM ----------

Just in case, here is the script I'm using:
Code:
awk -f script.awk input2 input1

Code:
#!/usr/bin/awk -f

BEGIN {
OFS="\t"; ORS="\n" 
  def["ascoutlower"]    = "ARANGE"   
  def["ascoutupper"]    = "BRANGE"
  def["descoutlower"]   = "CRANGE"
  def["descoutupper"]   = "DRANGE"
  def["ascinnotexact"]  = "ERANGE"
  def["descinnotexact"] = "FRANGE"
  def["ascinexact"]     = "GRANGE"
  def["descinexact"]    = "HRANGE"
  
  }

NR == FNR && NF {
  NF > 2 && k = $1
  in2[k] = in2[k] ? in2[k] RS $1 FS $2 : $2 FS $3
  next
  }
$1 in in2 {
  n = split(in2[$1], tmp, RS) 
  split(tmp[1], Tmp); min = Tmp[1]
  m = split(tmp[n], Tmp); max = Tmp[m]
  # asc - desc
  Def = $2 > $3 ? "desc" : "asc"
  # inrange - outofrange
  if (Def == "asc")
    Def = Def ($2 >= min && $3 <= max ? "in" : "out") 
  else
    Def = Def ($3 >= min && $2 <= max ? "in" : "out")
  # lower - upper
  if ((Def ~ /ascout/ ? $3 : $2) <= min) {
    Def = Def "lower"
    print $0 "\t" def[Def]
    next
    }
  if ((Def ~ /ascout/ ? $3 : $2) >= max) {
    Def = Def "upper"
    print $0 "\t" def[Def]
    next
    }    
  # exact - not exact
  for (i=1; i<=n; i++) {
    split(tmp[i], range)
    if (Def ~ /asc/) { k1 = $2; k2 = $3 }      
    else { k1 = $3; k2 = $2 }
    if (k1 >= range[1] && k2 <= range[2]) {
      Def = Def "exact"
      print $0 "\t" def[Def]
      next
      }
    }
      Def = Def "notexact"
    print $0 "\t" def[Def]
    next    
}!/^[ \t]/ { print $0 "\tUNKNOWN" }


Last edited by repinementer; 09-01-2009 at 11:19 AM..
# 49  
Old 09-04-2009
Yes, because the definitions are missing.
Just try to print the keys:

Code:
zsh-4.3.10[t]% cat s
#!/usr/bin/awk -f

BEGIN {
OFS="\t"; ORS="\n" 
  def["ascoutlower"]    = "ARANGE"   
  def["ascoutupper"]    = "BRANGE"
  def["descoutlower"]   = "CRANGE"
  def["descoutupper"]   = "DRANGE"
  def["ascinnotexact"]  = "ERANGE"
  def["descinnotexact"] = "FRANGE"
  def["ascinexact"]     = "GRANGE"
  def["descinexact"]    = "HRANGE"
  
  }

NR == FNR && NF {
  NF > 2 && k = $1
  in2[k] = in2[k] ? in2[k] RS $1 FS $2 : $2 FS $3
  next
  }
$1 in in2 {
  n = split(in2[$1], tmp, RS) 
  split(tmp[1], Tmp); min = Tmp[1]
  m = split(tmp[n], Tmp); max = Tmp[m]
  # asc - desc
  Def = $2 > $3 ? "desc" : "asc"
  # inrange - outofrange
  if (Def == "asc")
    Def = Def ($2 >= min && $3 <= max ? "in" : "out") 
  else
    Def = Def ($3 >= min && $2 <= max ? "in" : "out")
  # lower - upper
  if ((Def ~ /ascout/ ? $3 : $2) <= min) {
    Def = Def "lower"
    print $0 "\t" def[Def], Def
    next
    }
  if ((Def ~ /ascout/ ? $3 : $2) >= max) {
    Def = Def "upper"
    print $0 "\t" def[Def] "\t" Def
    next
    }    
  # exact - not exact
  for (i=1; i<=n; i++) {
    split(tmp[i], range)
    if (Def ~ /asc/) { k1 = $2; k2 = $3 }      
    else { k1 = $3; k2 = $2 }
    if (k1 >= range[1] && k2 <= range[2]) {
      Def = Def "exact"
      print $0 "\t" def[Def] "\t" Def
      next
      }
    }
      Def = Def "notexact"
    print $0 "\t" def[Def] "\t" Def
    next    
}!/^[ \t]/ { print $0 "\tUNKNOWN" }
zsh-4.3.10[t]% ./s input2 input1
A       239861347 239858777     -               descoutnotexact
B       233849110 233849388     +               ascinlower
C       202864284 202864396     +               ascinlower
D       187984662 187982263     -               descoutnotexact
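
The keys printed above (descoutnotexact, ascinlower) have no entry in def, so the label column comes out empty. One way to make such cases visible is sketched below. The function name label and the UNDEFINED(...) marker are assumptions for illustration, not part of the original script, and only two of the eight definitions are repeated to keep the sketch short.

```shell
out=$(awk '
# Hypothetical helper: return the label for a key, or a visible
# marker when the key has no definition. The "in" test does not
# create the array element as a side effect.
function label(key) {
  return (key in def) ? def[key] : "UNDEFINED(" key ")"
}
BEGIN {
  def["ascoutlower"] = "ARANGE"
  def["descinexact"] = "HRANGE"
  print label("ascoutlower")
  print label("ascinlower")
  print label("descoutnotexact")
}')
printf '%s\n' "$out"
# ARANGE
# UNDEFINED(ascinlower)
# UNDEFINED(descoutnotexact)
```

Printing label(Def) instead of def[Def] would flag the missing combinations; whether they should get their own labels or be mapped onto existing ones is a decision for the data, not something the sketch settles.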
