match range of different numbers by AWK


 
# 36  
Old 08-03-2009
Doubt

Could you please explain this one?

Code:
  NR == FNR && NF {
  NF > 2 && k = $1
  in2[k] = in2[k] ? in2[k] RS $1 FS $2 : $2 FS $3
  next
  }

# 37  
Old 08-03-2009
I was just looking at the last xls ...
Now, first of all, which ranges from input1 do you want to compare with the ranges in input2 (column 1 or column 2)?
Is the range separator constant or variable (comma, dot, or what)?

---------- Post updated at 10:52 AM ---------- Previous update was at 10:29 AM ----------

Quote:
Originally Posted by repinementer
Could you please explain this one?

Code:
  NR == FNR && NF {
  NF > 2 && k = $1
  in2[k] = in2[k] ? in2[k] RS $1 FS $2 : $2 FS $3
  next
  }

Code:
NR == FNR

While reading the first non-empty input file ...

Code:
&& NF

and the current record contains at least one field ...

Code:
NF > 2 && k = $1

if the number of fields is greater than 2 set the variable k to the value of the first field.

Code:
in2[k] = in2[k] ? in2[k] RS $1 FS $2 : $2 FS $3

Build an (associative) array in2 keyed by the current value of k. For a key line (more than two fields) the stored value is fields 2 and 3; for each continuation line under the same key, fields 1 and 2 are appended
(simulating a more complex data structure, using the current RS as an element separator).

Code:
next

You need the next statement here because you don't want the remaining actions (those intended for the next input file(s)) to be performed on records from the first file.
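The idiom explained above can be tried on a small scale. This is a minimal sketch, not the script from this thread: the helper name count_ranges and the files ref.txt and query.txt are made up for illustration, and the sample data only assumes a key line followed by continuation lines that start with a tab.

```shell
# Hypothetical helper: store the ranges per key from the first file,
# then report how many ranges exist for every key in the second file.
count_ranges() {
  awk '
  NR == FNR && NF {          # first file only, non-empty records
    if (NF > 2) k = $1       # a key line starts a new group
    # key line: store fields 2 and 3; continuation line: append fields 1 and 2
    in2[k] = in2[k] ? in2[k] RS $1 FS $2 : $2 FS $3
    next                     # do not run the rule below on first-file records
  }
  {                          # second file (assuming no blank lines in the first)
    n = ($1 in in2) ? split(in2[$1], r, RS) : 0
    print $1, n
  }' "$1" "$2"
}

# Made-up sample data, written to a temporary directory.
tmp=$(mktemp -d)
printf 'X1\t5\t10\n\t30\t40\nX2\t10\t20\n' > "$tmp/ref.txt"
printf 'X1\nX2\nX9\n' > "$tmp/query.txt"
count_ranges "$tmp/ref.txt" "$tmp/query.txt"
```

With this sample data the call prints "X1 2", "X2 1" and "X9 0". Without the next statement, the second rule would also run on every record of ref.txt.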
# 38  
Old 08-03-2009
Quote:
Now, first of all, which ranges from input1 you want to compare with the ranges in input2 (column 1 or column 2)?

It has to be column 2 or 3, because column 1 holds the keys (X1, X2, ...). So the question should read:

Quote:
Now, first of all, which ranges from input1 you want to compare with the ranges in input2 (column 2 or column 3)?

Column 2.
Actually, there should be a tab instead of a comma in column 2; I forgot to convert the commas to tabs.
Once that is done, column 2 becomes columns 2 and 3.
The same goes for input2: its commas have to be converted to tabs.
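The comma-to-tab conversion described above could be done in awk itself. This is a rough sketch under assumptions: the original xls layout is not shown in the thread, so the sample line is made up, and commas_to_tabs is a hypothetical helper name. Only the commas in column 2 are touched; commas in later columns stay as they are.

```shell
# Hypothetical helper: replace commas with tabs in the second tab-separated column.
commas_to_tabs() {
  awk 'BEGIN { FS = OFS = "\t" }
       { gsub(/,/, "\t", $2) }   # changing $2 rebuilds the record with OFS
       1                         # print every (possibly modified) record
  '
}

# Made-up sample: a key line and a continuation line starting with a tab.
printf 'X1\t1,2\t1\t4\n\t2,4\n' | commas_to_tabs
```

The first line becomes X1, 1, 2, 1, 4 separated by tabs, and the continuation line becomes an empty first field followed by 2 and 4.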

Now, coming to the main point: I would like to compare col2 and col3 of input1 with col2 and col3 of input2.

Input1 has to look like this:

Code:
X1	1	2	1	4	+	1,2	0,1
	2	4					
X1	120	130	120	140	+	10,4	0,16
	136	140					
X1	4	3	1	4	-	1,2	0,1
	2	1					
X1	140	130	120	140	-	10,4	0,16
	125	120					
X1	15	20	15	98	+	5,10,4	0,35,79
	50	60					
	94	98					
X1	98	96	15	98	-	5,10,4	0,35,79
	75	45					
	25	15					
X1	1	2	1	36	+	1,2,3	0,1,32
	2	4					
	33	36					
X1	88	84	84	140	-	4,10,4	0,36,52
	130	120					
	140	136					
X1	15	20	15	110	+	5,10,5	0,35,90
	50	60					
	105	110					
X1	98	94	50	98	-	10,4,4	0,34,44
	88	84					
	60	50					
X2	15	20	15	98	+	5,10,4	0,35,79
	50	60					
	94	98					
X2	98	96	15	98	-	5,10,4	0,35,79
	75	45					
	25	15					
X3	1	2	1	4	+	1,2	0,1
	2	4					
X3	120	130	120	140	+	10,4	0,16
	136	140					
X3	4	3	1	4	-	1,2	0,1
	2	1					
X3	140	130	120	140	-	10,4	0,16
	125	120					
X3	15	20	15	98	+	5,10,4	0,35,79
	50	60					
	94	98					
X3	98	96	15	98	-	5,10,4	0,35,79
	75	45					
	25	15					
X3	1	2	1	36	+	1,2,3	0,1,32
	2	4					
	33	36					
X3	88	84	84	140	-	4,10,4	0,36,52
	130	120					
	140	136

Input2 has to look like this:

Code:
X1	5	10	5	118	+	5,10,10,18	0,25,75,95
	30	40					
	80	90					
	100	118					
X2	10	20	10	100	+	10,20,20	0,30,70
	40	60					
	80	100					
X3	118	100	5	118	-	5,10,10,18	0,25,75,95
	90	80					
	40	30					
	10	5					
X4	5	10	5	118	+	5,10,10,18	0,25,75,95
	30	40					
	80	90					
	100	118

I just converted the commas into tabs. Everything else is the same as in the last XLS file.

---------- Post updated at 01:05 AM ---------- Previous update was at 12:59 AM ----------

If you want to use these numbers, copy and paste them into Notepad. Don't paste them into an Excel file.

Thanks
# 39  
Old 08-03-2009
I don't understand the range definitions in your example output in the last xls file ...
You defined the X1 -> 1,2 as GRANGE, why? Why not ARANGE?
X1 -> 15,20 is defined once as ERANGE, once as GRANGE, why?

---------- Post updated at 11:20 AM ---------- Previous update was at 11:14 AM ----------

I believe you should really try to do it yourself. I just don't have time to analyze these continuously changing specifics ...
Try to write the code yourself.
Feel free to ask when you have a problem with a specific micro-task, but you will learn only if you practice.
# 40  
Old 08-04-2009
Quote:
I don't understand the range definitions in your example output in the last xls file ...
You defined the X1 -> 1,2 as GRANGE, why? Why not ARANGE?
X1 -> 15,20 is defined once as ERANGE, once as GRANGE, why?

Because in the ARANGE case, 1-2 and 2-4 of X1 are out of area when compared to the ranges
5-10, 30-40, 80-90 and 100-118.

Code:
 .....5-10,30-40,80-90 and 100-118    ARANGE
1-2,2-4

In the second case (GRANGE), the ranges 1-2, 2-4 and 33-36 of X1 are not all out of area when compared to the reference ranges, because 33-36 falls within 30-40.

Code:
..5-10,30-40,80-90 and 100-118    GRANGE
1-2,2-4,33-36

It means we have to consider every range. If any one of them is within a reference range, like 33-36 in 30-40, we have to give it the name GRANGE or HRANGE, based on upper or lower.
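The "consider every range" rule can be sketched as a small check. This is only an illustration: the thread does not fully define ARANGE, GRANGE or HRANGE, so the sketch merely reports which query ranges lie completely inside a reference range. The helper name any_inside and the "lo-hi,lo-hi" argument format are assumptions, and it assumes lo <= hi (the reversed ranges of "-" records would have to be swapped first).

```shell
# Hypothetical helper.
# $1: query ranges, $2: reference ranges, both written as "lo-hi,lo-hi,..."
any_inside() {
  awk -v q="$1" -v ref="$2" 'BEGIN {
    nq = split(q, Q, ","); nr = split(ref, R, ",")
    for (i = 1; i <= nq; i++) {
      split(Q[i], a, "-")
      for (j = 1; j <= nr; j++) {
        split(R[j], b, "-")
        # inside means: both ends lie within the reference range
        # (adding 0 forces a numeric comparison)
        if (a[1] + 0 >= b[1] + 0 && a[2] + 0 <= b[2] + 0) {
          print Q[i] " in " R[j]
          found = 1
        }
      }
    }
    if (!found) print "none"
  }'
}

any_inside "1-2,2-4" "5-10,30-40,80-90,100-118"
any_inside "1-2,2-4,33-36" "5-10,30-40,80-90,100-118"
```

The first call prints "none" (the ARANGE situation above); the second prints "33-36 in 30-40", the single hit that changes the classification.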

---------- Post updated at 01:30 AM ---------- Previous update was at 01:24 AM ----------

I agree with you, rado.
I should do that. I will start with a micro-task, as you suggested.
Just reply if you find free time; otherwise just ignore it.
I'm very happy that you have helped me a lot so far, even though I didn't finish the script.

---------- Post updated 08-04-09 at 01:24 AM ---------- Previous update was 08-03-09 at 01:30 AM ----------

Code:
awk 'NF {
  sec = $2; fifth = split($5, _fifth, ","); sixth = split($6, _sixth, ",")
  counter = 0; key = $1; flag = $4; sub(/[^ \t*]*/, "")
  dummy = sprintf("%*s", length(key),x)
  for (i=1; i<=sixth; i++) {
    second_third = sec + _sixth[i] FS _fifth[i] + sec + _sixth[i]
    third_second = _fifth[i] + sec + _sixth[i] FS sec + _sixth[i] 
    if (flag == "+") 
      rec = rec ? rec RS dummy OFS second_third : key OFS second_third OFS $0
    else  
      rec_rev = rec_rev ? \
        (++counter == sixth - 1 ? key OFS third_second OFS $0 : dummy OFS third_second ) RS rec_rev : \
        dummy OFS third_second
    }
  print (flag == "+" ? rec : rec_rev)    
 }' OFS='\t' ORS='\n\n' r1.txt

input

Code:
X1    100    200    +    10,20,30,30    10,20,30,40

X2    100    200    +    10,20,30,30    10,20,30,40

output

Code:
X1    110 120        100    200    +    10,20,30,30    10,20,30,40
      120 140
      130 160
      140 170

X1    110 120        100    200    +    10,20,30,30    10,20,30,40
      120 140
      130 160
      140 170
      110 120
      120 140
      130 160
      140 170

The correct output needed is:

Code:
X1    110 120        100    200    +    10,20,30,30    10,20,30,40
      120 140
      130 160
      140 170
X2   110 120        100    200    +    10,20,30,30    10,20,30,40
      120 140
      130 160
      140 170

Why am I getting extra values? Could you please explain? The X2 results are not showing up in the output.
# 41  
Old 08-04-2009
Because I forgot to reset/empty the rec variable.

Try this instead:

Code:
awk 'NF {
  sec = $2; fifth = split($5, _fifth, ","); sixth = split($6, _sixth, ",")
  counter = rec = ""; key = $1; flag = $4; sub(/[^ \t*]*/, "")
  dummy = sprintf("%*s", length(key),x)
  for (i=1; i<=sixth; i++) {
    second_third = sec + _sixth[i] FS _fifth[i] + sec + _sixth[i]
    third_second = _fifth[i] + sec + _sixth[i] FS sec + _sixth[i] 
    if (flag == "+") 
      rec = rec ? rec RS dummy OFS second_third : key OFS second_third OFS $0
    else  
      rec_rev = rec_rev ? \
        (++counter == sixth - 1 ? key OFS third_second OFS $0 : dummy OFS third_second ) RS rec_rev : \
        dummy OFS third_second
    }
  print (flag == "+" ? rec : rec_rev)    
 }' OFS='\t' ORS='\n\n' r1.txt
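The effect of a missing reset can be reproduced in a few lines, because awk variables are global and keep their value from one record to the next. This is a reduced sketch with made-up input, not the script above; the helper name expand and its keep/reset argument are hypothetical.

```shell
# Hypothetical helper: $1 = "reset" clears the accumulator for every record,
# any other value leaves it untouched between records.
expand() {
  printf 'X1 10,20\nX2 30,40\n' |
  awk -v mode="$1" '{
    if (mode == "reset") rec = ""     # start each record with an empty rec
    n = split($2, part, ",")
    for (i = 1; i <= n; i++)
      rec = rec ? rec " " part[i] : $1 ":" part[i]   # the key is added only when rec is empty
    print rec
  }'
}

expand keep
expand reset
```

With keep, the second line of output is "X1:10 20 30 40": the X2 values are appended to the old record and the key X2 never appears, which is the symptom reported above. With reset, the output is "X1:10 20" followed by "X2:30 40".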

# 42  
Old 08-05-2009
Hi Rado,

Yes, now it's working great.
Yesterday and today I spent all my time reading an AWK book.
I have covered everything up to control statements. Your script still looks complex to me, though I am able to understand some parts of it.
But you are right about one thing: I'm learning a lot, even though it is taking a lot of time.

Thanks for the help and suggestions.

---------- Post updated at 09:23 PM ---------- Previous update was at 04:32 AM ----------


Input
Code:
X1	84	140	-	4,10,4	0,36,52
X1	20	110	+	5,10,5	0,35,90
X1	84	140	-	4,10,4	0,36,52

output
Code:
X1	136 140		84	140	-	4,10,4	0,36,52
  	120 130
  	84 88

X1	20 25		20	110	+	5,10,5	0,35,90
  	55 65
  	110 115

  	136 140
X1	120 130		84	140	-	4,10,4	0,36,52
  	84 88
X1	136 140		84	140	-	4,10,4	0,36,52
  	120 130
  	84 88

Needed and correct output

Code:
X1	136 140		84	140	-	4,10,4	0,36,52
  	120 130
  	84 88

X1	20 25		20	110	+	5,10,5	0,35,90
  	55 65
  	110 115
X1	136 140		84	140	-	4,10,4	0,36,52
  	120 130
  	84 88

Somewhere the code is printing duplicate values for "-".
The lines marked in bold create the mess: as you can see, X1 is one row down, and there is an unnecessary duplicate.

---------- Post updated 08-05-09 at 01:21 AM ---------- Previous update was 08-04-09 at 09:23 PM ----------

I solved it. Please don't post any answer to this question.

Code:
$ awk 'NF {
  sec = $2; fifth = split($5, _fifth, ","); sixth = split($6, _sixth, ",")
  counter = rec = ""; key = $1; flag = $4; sub(/[^ \t*]*/, "")
  dummy = sprintf("%*s", length(key),x)
  for (i=1; i<=sixth; i++) {
    second_third = sec + _sixth[i] FS _fifth[i] + sec + _sixth[i]
    third_second = sec + _sixth[i] FS _fifth[i] + sec + _sixth[i]
    if (flag == "+")
      rec = rec ? rec RS dummy OFS second_third : key OFS second_third OFS $0
    else if (flag == "-")
      rec = rec ? rec RS dummy OFS third_second : key OFS third_second OFS $0
  }
  print rec   # the original ternary printed rec in both branches
  }' OFS='\t' ORS='\n\n' input.txt


Last edited by repinementer; 08-05-2009 at 04:29 AM..