Loop and array problem


 
# 1  
03-24-2013

Hi, I have the following problem that is beyond what I can currently do with bash scripting.

In file 1, I have ~ 2500000 values. Note this file is not sorted.

Code:
3 19 LABEL_A
3 37 LABEL_B
2 12 LABEL_C
1 15 LABEL_D

I have a list of values in "file 2" ~ 25000 unique lines:
Note - LABEL_7 AND LABEL_8 overlap slightly in their column 2 and 3 values
Code:
1 11 20 LABEL_1
1 18 30 LABEL_2
1 31 40 LABEL_3
2 11 20 LABEL_4
2 21 30 LABEL_5
2 31 40 LABEL_6
3 11 20 LABEL_7
3 15 30 LABEL_8
3 31 40 LABEL_9
4 11 20 LABEL_10

ETC

To run through what I would like to do, as an example:

LABEL_A (FILE 1) has a 3 in column 1, and a value of 19 in column 2.
I want to compare this to every line in FILE 2.

So, if there is a 3 in column 1 of FILE2, and 19 is between the values of columns 2 and 3 of FILE2, see what label this corresponds to in FILE2.

In this example, 19 is between the values in column 2 and 3 (FILE2) for LABEL_7 and LABEL_8.

Desired output: (Note the value of 2 in column 4 below means there are 2 labels that contain the value 19).
Code:
LABEL_A LABEL_7 LABEL_8 2

Full output:

Code:
LABEL_A LABEL_7 LABEL_8 2
LABEL_B LABEL_9 1
LABEL_C LABEL_4 1
LABEL_D LABEL 1 1

I think the code for this will involve while loops and arrays, but I have no idea where to start. Any bash solutions would be great (as this is what I am currently learning), but any assistance at all would be very much appreciated.
# 2  
03-24-2013
You explicitly state that file 1 is not sorted, but you don't say whether or not file 2 is sorted. Is file 2 sorted by increasing numeric values in fields 1, 2, and 3 as shown in your example, or is file 2 also unsorted?

What is the range of the values in fields 1, 2, and 3?

Are all of the values in fields 1, 2, and 3 integers or are floating point values also included?
# 3  
03-24-2013
Hi Don,

Sorry if I wasn't clear.

Field 1 (column 1) ranges from 1 to 24 in both File 1 and File 2.
File 1 is unsorted.

File 2 is sorted by column 1, from 1 to 24. Within each column 1 value it is then sorted by column 2, lowest first (real values range from 1 to about 30,000,000).
So this means that if column 1 = 1, values could be
Code:
1 1 100
1 400 2050
1 9000 19200

or

Code:
2 1234 9999
2 25000 10000
2 14000 192000

There is no fixed step by which column 2 increases, nor is there a fixed step by which column 3 increases. The lowest possible number in column 2 and the highest possible number in column 3 change depending on whether column 1 contains 1, 2, 3, 4, etc.

The numbers in all columns are integers.

I hope that makes things clearer...Thanks!
# 4  
03-24-2013
I understand that file 1 is not sorted and that doesn't matter to me. Lines from file 1 can be read and processed without having to store file 1 values.

File 2 values, however, need to be stored so they can be examined 2.5 million times (once for each line in file 1), so I want to minimize the effort needed to determine if a line in file 1 matches values saved from file 2.

You said above that file 2 is sorted by numeric value on fields 1 and 2, with field 1 being the primary sort key. But the second line of your second example above (2 25000 10000) shows that file 2 is not sorted as you described. It is also strange that this entry has a high end of the range that is lower than the low end of the range. Did you perhaps intend for the second set of values to be:
Code:
2 1234 9999
2 2500 10000
2 14000 192000

instead of:
Code:
2 1234 9999
2 25000 10000
2 14000 192000

or do we have to reverse the order of values in fields 2 and 3 if field 2's value is greater than field 3's value?
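
A check along these lines could be run against the real file 2 before doing any matching. It is only a sketch: check_ranges is a made-up helper name, and it assumes the layout described in this thread (field 1 is the key, field 2 the low end, field 3 the high end, all integers, sorted by field 1 and then field 2). It reports lines whose low end is greater than the high end and lines that break the sort order.

```shell
# check_ranges is a hypothetical helper, not part of the original thread.
# Usage: check_ranges RANGE_FILE
check_ranges() {
    awk '
    # Low end of the range is greater than the high end.
    $2 > $3 {
        print "line " NR ": low end " $2 " is greater than high end " $3
    }
    # Line sorts before the previous line on (field 1, field 2).
    NR > 1 && ($1 < p1 || ($1 == p1 && $2 < p2)) {
        print "line " NR ": out of order after line " (NR - 1)
    }
    # Remember this line for the comparison with the next one.
    { p1 = $1; p2 = $2 }
    ' "$1"
}
```

If the function prints nothing, the file satisfies both assumptions. On the three-line example containing 2 25000 10000 it should flag line 2 for the reversed range and line 3 for being out of order.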
# 5  
03-24-2013
Sorry, that was a typo on my part. The second set of values should be:

Code:
2 1234 9999
2 2500 10000
2 14000 192000

Every value in column 2 is lower than the corresponding value in column 3. Many thanks!
# 6  
03-24-2013
I don't have a set of files of the size you want to use, but the following works for the samples you provided:
Code:
awk '
FNR == NR {
        h[$1] = NR      # set "h"igh end of input lines for this field 1 value
        m[NR] = $2      # save "m"inimum range value for this line
        M[NR] = $3      # save "M"aximum range value for this line
        l[NR] = $4      # save "l"abel from this line
        next
}
{       e = h[$1]       # set high line number for field 1 value on this line
        o = $NF         # set initial output line to label on this line
        oc = 0          # set number of matched lines
        # Loop through the ranges associated with field 1 from this input line
        for(i = h[$1 - 1] + 1; i <= e; i++) {
                if($2 > M[i]) continue  # range is too low; keep looking
                if(m[i] > $2) break     # range is too high; we are done
                if($2 <= M[i]) {        # we have a matching range
                        o = o " " l[i]  # add corresponding label to output line
                        oc++            # increment match count
                }
        }
        print o, oc     # print the matched labels and the match count
}' "file 2" "file 1"

As always, if you are using a Solaris/SunOS system, use /usr/xpg4/bin/awk or nawk instead of awk.
With your sample input, this script produces:
Code:
LABEL_A LABEL_7 LABEL_8 2
LABEL_B LABEL_9 1
LABEL_C LABEL_4 1
LABEL_D LABEL_1 1

which matches the Full output you said you wanted, except that your last output line showed a space instead of the underscore in LABEL_1.
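
One assumption in the script above is worth flagging: the loop starts at h[$1 - 1] + 1, which relies on the field 1 values in file 2 being consecutive integers starting at 1 (true for the 1 to 24 values described in this thread). If a key were missing from file 2, h[$1 - 1] would be empty and the loop would start at line 1, so ranges belonging to other keys could be examined. The following is a hedged sketch of a variant that records the first line for each key explicitly. The function name match_ranges and the array s are my own additions, not part of the script above.

```shell
# match_ranges is a hypothetical wrapper; "s" is an added array holding the
# first range line for each field 1 value.
# Usage: match_ranges RANGE_FILE VALUE_FILE
match_ranges() {
    awk '
    FNR == NR {
        if(!($1 in s)) s[$1] = NR   # "s"tart line for this field 1 value
        h[$1] = NR                  # "h"igh (last) line for this field 1 value
        m[NR] = $2                  # "m"inimum range value for this line
        M[NR] = $3                  # "M"aximum range value for this line
        l[NR] = $4                  # "l"abel from this line
        next
    }
    {
        o = $NF                     # output starts with the label on this line
        oc = 0                      # number of matched ranges
        if($1 in s)                 # skip keys that never appear in the range file
            for(i = s[$1]; i <= h[$1]; i++) {
                if($2 > M[i]) continue  # range ends below the value; keep looking
                if(m[i] > $2) break     # range starts above the value; we are done
                o = o " " l[i]          # add the matching label
                oc++
            }
        print o, oc
    }' "$1" "$2"
}
```

Like the script above, this still depends on the ranges for each key being sorted by their low end, because the early break assumes every later range starts even higher. A line from the value file whose key does not appear in the range file is printed with a count of 0.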
# 7  
03-25-2013
Don, that works perfectly. It does exactly what I wanted. Also, thanks for annotating your lines so clearly; I can tell exactly what you've done, and that's really helpful for a relative newbie like me. Thanks so much!