match range of different numbers by AWK


 
# 1  
Old 07-21-2009
match range of different numbers by AWK

If columns 1 and 2 hold the same key in both files (for example "a" and "a1"), compare the first range of each key in input2 (for example 1-4 or 65-69, but not the continuation lines 70-100 or 44-40) with all the ranges listed for that key in input1.
If that first range in input2 falls outside the ranges in input1, label it outofrange; otherwise label it inrange.
Some ranges in input2 are written in descending order and some in ascending order. The label also has to reflect this order, as shown below.

If this seems too complicated or time-consuming, please give me a basic idea or approach for comparing the ranges in two files.

Any help would be appreciated.

Code:
input1

a	a1	5-10
		30-40
		45-60
		80-90
		100-120

input2

a	a1	1-4
a	a1	4-1
a	a1	120-140
a	a1	140-120
a	a1	65-69
		70-100
a	a1	70-65
		44-40
a	a1	30-33
		37-57
		63-83
a	a1	85-81
		30-25
b	b1	100-200
c	c2	1-200
d	d3	2-333

output

a	a1	1-4		outofrange1
a	a1	4-1		outofrange2
a	a1	120-140     outofrange3
a	a1	140-120     outofrange4
a	a1	65-69	        inrange1
		70-100
a	a1	70-65  	inrange2
		44-40	
a	a1	30-33 	inrange3
		37-57
		63-83 
a	a1	85-81 	inrange4
		30-25


Last edited by repinementer; 07-21-2009 at 05:38 AM..
# 2  
Old 07-22-2009
Quote:
[...]
some of the key values in input2 are in descending order and some are in ascending order. based on these order we have to name them accordingly as I shown below
Could you please elaborate further?
# 3  
Old 07-22-2009
Elaborative

As I mentioned, there are two input files, input1 and input2.
Input1 has three columns: the first holds the key, the second the subkey, and the third numeric ranges (such as 5-10 or 30-40).
Code:
input1

a	a1	5-10
		30-40
		45-60
		80-90
		100-120
x       a2    10-20
                50-60

Input2 also has three columns, laid out exactly like input1: the first holds the key, the second the subkey, and the third ranges such as 1-4, 4-1, 120-140 and 140-120.
Code:
input2

a	a1	1-4
a	a1	4-1
a	a1	120-140
a	a1	140-120
a	a1	65-69
		70-100
a	a1	70-65
		44-40
a	a1	6-7
		37-57
		63-83
a	a1	7-8

Now I need to label the ranges in input2 according to the ranges given in input1, as in the following output:
Code:
output

a	a1	1-4		outofrange1
a	a1	4-1		outofrange2
a	a1	120-140     outofrange3
a	a1	140-120     outofrange4
a	a1	65-69	        inrange1
		70-100
a	a1	70-65  	inrange2
		44-40	
a	a1	6-7            inrange3
a	a1	7-8            inrange4

As you can see, the values from 1 to 4 in input2 are absent from input1, so 1-4 gets the name outofrange1.

The second one is a little bit tricky: the range runs from high to low, like 4-1 (the values from 4 to 1 in input2 are absent from input1). It looks the same as the first case, but because it runs from the high value to the low value it gets the name outofrange2.

outofrange3 is the same as outofrange1.
outofrange4 is the same as outofrange2.

The fifth one, 65-69, is inrange1: the values from 65 to 69 in input2 lie between the ranges in input1, but not within any single range.

45-60 65-69 80-90

inrange2 is the trickier (high-to-low) version of inrange1, as I mentioned for outofrange2.

The seventh one, 6-7, is inrange3: the values from 6 to 7 in input2 lie entirely within one of the ranges in input1.

5 6-7 10

inrange4 is the trickier version of inrange2.

Most importantly, we compare only the first range of each key in input2 with all the ranges for that key in input1. For example, a, a1, 65-69 is compared with all the input1 values for a, a1: 5-10, 30-40, 45-60, 80-90 and 100-120.
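
The three cases described above (inside one input1 range, between the input1 ranges, outside all of them), together with the ascending or descending direction, can be sketched as a small shell function around awk. This is only a rough sketch, not a tested solution to the whole problem: the function name classify is made up for illustration, the reference ranges are assumed to be written in ascending order, and a range that only partly overlaps an input1 range is assumed to count as "between ranges" as long as it stays within the overall minimum and maximum.

```shell
#!/bin/sh
# classify RANGE REF...   (hypothetical helper, not from the thread)
# RANGE is "low-high" or "high-low"; each REF is an ascending "low-high"
# range taken from input1 for the same key and subkey.
classify() {
    printf '%s\n' "$@" | awk -F- '
        NR == 1 {                        # first line: the range to test
            lo = $1 + 0; hi = $2 + 0
            dir = "ascending"
            if (lo > hi) { t = lo; lo = hi; hi = t; dir = "descending" }
            next
        }
        {                                # remaining lines: reference ranges
            a = $1 + 0; b = $2 + 0
            if (NR == 2 || a < min) min = a
            if (NR == 2 || b > max) max = b
            if (lo >= a && hi <= b) inside = 1
        }
        END {
            if (inside)                      where = "inside a range"
            else if (lo >= min && hi <= max) where = "between ranges"
            else                             where = "out of range"
            print where ", " dir
        }'
}

classify 65-69 5-10 30-40 45-60 80-90 100-120   # between ranges, ascending
```

Swapping the two ends of a descending range first means the containment tests only ever have to deal with low <= high, while the direction is kept separately for naming.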



I hope this makes everything clear.

Last edited by stateperl; 07-23-2009 at 12:02 AM..
# 4  
Old 07-23-2009
With the unclear requirements and the mistakes in the expected output, it was difficult to write a program. No program was provided either, so I had to write one from scratch.

Try:
Code:
# Normalise a range file into "<file>x": one "key subkey low high" line per
# range, with key/subkey repeated on continuation lines and low <= high.
re_arrange_file ()
{
    infile=$1
    out_file=${infile}x
    > "$out_file"
    while read line
    do
        set -- $line
        if [ $# -eq 3 ]; then
            # Full line: key, subkey and range
            key=$1
            subkey=$2
            range=$3
        else
            # Continuation line: range only, key and subkey carry over
            range=$1
        fi
        min_range=$(echo "$range" | cut -d"-" -f1)
        max_range=$(echo "$range" | cut -d"-" -f2)
        if [[ $min_range -gt $max_range ]]; then
            # Descending range such as 4-1: swap the two ends
            swap=$min_range
            min_range=$max_range
            max_range=$swap
        fi
        echo $key $subkey $min_range $max_range >> "$out_file"
    done < "$infile"
}

re_arrange_file input1
re_arrange_file input2

in_range_count=0
out_range_count=0
file2_lin_no=0
while read key subkey min_range max_range
do
    found=0
    ((file2_lin_no = file2_lin_no + 1))
    # Original text of the current input2 line, used for printing
    file2_lin=$(head -n "$file2_lin_no" input2 | tail -1)
    # Normalised input1 ranges for the same key and subkey
    grep "$key" input1x | grep "$subkey" > tmp
    while read _key _subkey _min _max
    do
        # In range only if one input1 range contains the whole input2 range
        if [[ $_min -le $min_range && $_max -ge $max_range ]]; then
            ((in_range_count = in_range_count + 1))
            echo $file2_lin "inrange"$in_range_count
            found=1
            break
        fi
    done < tmp
    if [[ $found -eq 0 ]]; then
        ((out_range_count = out_range_count + 1))
        echo $file2_lin outofrange"$out_range_count"
    fi
done < input2x




Output:
Code:
a a1 1-4 outofrange1
a a1 4-1 outofrange2
a a1 120-140 outofrange3
a a1 140-120 outofrange4
a a1 65-69 outofrange5
70-100 outofrange6
a a1 70-65 outofrange7
44-40 outofrange8
a a1 6-7 inrange1
37-57 outofrange9
63-83 outofrange10
a a1 7-8 inrange2

This is not exactly what the question asks for. But I think the question itself is wrong, as many lines in the expected output carry neither an inrange nor an outofrange label.
a a1 65-69 should be outofrange, but the expected output says otherwise.
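
The normalisation step of the script above (repeat the key and subkey on continuation lines, put the lower end of each range first) could also be written in awk, which the thread title asks for. This is just a sketch under assumptions: the function name normalise is hypothetical, fields are assumed to be separated by blanks or tabs, and blank lines are simply skipped.

```shell
#!/bin/sh
# normalise [FILE...]   (hypothetical helper; reads stdin when no file is given)
# Prints one "key subkey low high" line per range.
normalise() {
    awk '
        NF == 3 { key = $1; subkey = $2 }    # full line: remember key and subkey
        NF == 1 || NF == 3 {
            split($NF, r, "-")
            lo = r[1] + 0; hi = r[2] + 0
            if (lo > hi) { t = lo; lo = hi; hi = t }   # descending range: swap
            print key, subkey, lo, hi
        }' "$@"
}
```

Running `normalise input2` should give the same kind of lines as the input2x file written by the script above, without the temporary files and the per-line calls to cut.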
# 5  
Old 07-23-2009
Hey, my apologies for the inconvenience.
Thank you very much for the script and the time you have spent on this problem.

I think your code still misses the following:


Quote:
a a1 65-69 outofrange5
70-100 outofrange6
a a1 70-65 outofrange7
44-40 outofrange8


Only the first range of each group needs to be compared (the others are not needed):

a a1 65-69 outofrange5 (need to compare)
70-100 outofrange6 (no need to compare)
a a1 70-65 outofrange7 (need to compare)
44-40 outofrange8 (no need to compare)



Quote:
This is not exactly what is provided in question. But I think in question it is wrong, as many lines do not have either inrange or outofrange itself.
a a1 65-69 should be outofrange but was provided otherwise.


65-69 is outside every individual range, but it lies between the ranges I have given in input1.


This is really, really important:

Ranges that fall between the input1 ranges need to be counted even though they do not match exactly, especially in the case of 65-69. This range lies between 45-60 and 80-90:
45-60 65-69 80-90

The same applies to the 70-65 case.

Last edited by repinementer; 07-23-2009 at 03:31 AM..
# 6  
Old 07-23-2009
I really don't understand ...
Could you post bigger samples of the input files and the expected output? Are the in/outofrange numbers always progressive, or are they specific to the combination?
You could start with something like this (use gawk, nawk or /usr/xpg4/bin/awk on Solaris):

Code:
awk 'NR == FNR {
  NF != 1 && k = $1
  in1[k] = in1[k] ? in1[k] FS $NF : $NF
  next
  }
$1 in in1 {
  n = split(in1[$1], t, "-")
  min = t[1]; max = t[n]; split($NF, tt, "-")
  if (tt[1] > tt[2]) { k1 = 2; k2 = 1 } else { k1 = 1; k2 = 2 }  # k1: index of the lower end, k2: index of the higher end
  range = tt[k1] >= min && tt[k2] <= max ? "inrange" : "outofrange"
  $0 = $0 "\t\t" range (++r[range]) 
    }1' input*

This is what I get:

Code:
zsh-4.3.10[t]% head -20 in*
==> input1 <==
a       a1      5-10
                30-40
                45-60
                80-90
                100-120
x       a2    10-20
                50-60

==> input2 <==
a       a1      1-4
a       a1      4-1
a       a1      120-140
a       a1      140-120
a       a1      65-69
                70-100
a       a1      70-65
                44-40
a       a1      6-7
                37-57
                63-83
a       a1      7-8

zsh-4.3.10[t]% awk 'NR == FNR {
  NF != 1 && k = $1
  in1[k] = in1[k] ? in1[k] FS $NF : $NF
  next
  }
$1 in in1 {
  n = split(in1[$1], t, "-")
  min = t[1]; max = t[n]; split($NF, tt, "-")
  if (tt[1] > tt[2]) { k1 = 2; k2 = 1 } else { k1 = 1; k2 = 2 }  # k1: index of the lower end, k2: index of the higher end
  range = tt[k1] >= min && tt[k2] <= max ? "inrange" : "outofrange"
  $0 = $0 "\t\t" range (++r[range])
    }1' input*
a       a1      1-4             outofrange1
a       a1      4-1             outofrange2
a       a1      120-140         outofrange3
a       a1      140-120         outofrange4
a       a1      65-69           inrange1
                70-100
a       a1      70-65           inrange2
                44-40
a       a1      6-7             inrange3
                37-57
                63-83
a       a1      7-8             inrange4

# 7  
Old 07-23-2009
Quote:
Are the in/outofrange n always progressing or they are specific to the combination?
That is the exact output I'm looking for, but the names are specific, not progressive.

Except for this error, everything seems to be right.

I will post sample input files ASAP. Thanks a lot for the advice and the script.
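
If the names are meant to be fixed per case rather than counted up, one possible reading of the earlier explanation is: outofrange1/outofrange2 for ascending/descending ranges outside input1, inrange1/inrange2 for ranges between the input1 ranges, and inrange3/inrange4 for ranges inside a single input1 range. The sketch below follows that reading. It is an assumption, not something confirmed in the thread: the function name label_ranges is hypothetical, outofrange3 and outofrange4 are folded into outofrange1 and outofrange2, and keys that do not occur in input1 are printed without a label.

```shell
#!/bin/sh
# label_ranges INPUT1 INPUT2   (hypothetical helper, not from the thread)
label_ranges() {
    awk '
        function parse(s,    p) {            # sets lo, hi and desc from "a-b"
            split(s, p, "-")
            lo = p[1] + 0; hi = p[2] + 0; desc = 0
            if (lo > hi) { desc = 1; lo = p[2] + 0; hi = p[1] + 0 }
        }
        NR == FNR {                          # first file: collect reference ranges
            if (NF == 0) next
            if (NF == 3) k = $1 SUBSEP $2
            parse($NF)
            n[k]++
            rlo[k, n[k]] = lo; rhi[k, n[k]] = hi
            next
        }
        NF == 3 && (($1, $2) in n) {         # second file: first line of a group
            k = $1 SUBSEP $2
            parse($3)
            inside = 0; min = rlo[k, 1]; max = rhi[k, 1]
            for (i = 1; i <= n[k]; i++) {
                if (rlo[k, i] < min) min = rlo[k, i]
                if (rhi[k, i] > max) max = rhi[k, i]
                if (lo >= rlo[k, i] && hi <= rhi[k, i]) inside = 1
            }
            if (inside)                      name = "inrange" (3 + desc)
            else if (lo >= min && hi <= max) name = "inrange" (1 + desc)
            else                             name = "outofrange" (1 + desc)
            $0 = $0 "\t" name
        }
        { print }                            # continuation lines pass through unchanged
    ' "$1" "$2"
}
```

It would be called as `label_ranges input1 input2`. Only lines that carry a key and subkey are labelled, so the continuation lines of input2 are never compared, which matches the "first range only" requirement stated earlier in the thread.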