I think the code for this will involve while loops and arrays, but I have no idea where to start. Any bash solutions would be great (as this is what I am currently learning), but any assistance at all would be very much appreciated.
You explicitly state that file 1 is not sorted, but you don't say whether or not file 2 is sorted. Is file 2 sorted by increasing numeric values in fields 1, 2, and 3 as shown in your example, or is file 2 also unsorted?
What is the range of the values in fields 1, 2, and 3?
Are all of the values in fields 1, 2, and 3 integers or are floating point values also included?
Field 1 (column 1) goes up to 24 in both File 1 and File 2.
File 1 is unsorted.
File 2 is sorted by column 1, from 1 to 24, and then by the lowest number in column 2 (real values range from 1 to about 30,000,000).
So if column 1 = 1, the values could be:
Code:
1 1 100
1 400 2050
1 9000 19200
or
Code:
2 1234 9999
2 25000 10000
2 14000 192000
Column 2 does not increase by a fixed amount, and neither does column 3. The lowest possible number in column 2 and the highest possible number in column 3 depend on the value in column 1 (1, 2, 3, 4, etc.).
The numbers in all columns are integers.
I hope that makes things clearer...Thanks!
I understand that file 1 is not sorted and that doesn't matter to me. Lines from file 1 can be read and processed without having to store file 1 values.
File 2 values, however, need to be stored so they can be examined 2.5 million times (once for each line in file 1), so I want to minimize the effort needed to determine if a line in file 1 matches values saved from file 2.
You said above that file 2 is sorted by numeric value on fields 1 and 2, with field 1 being the primary sort key. But the value marked in red in your example above (the 25000 in the line "2 25000 10000") shows that file 2 is not sorted as you described. It is also strange that the high end of that entry's range is lower than its low end. Did you perhaps intend for the second set of values to be:
Code:
2 1234 9999
2 2500 10000
2 14000 192000
instead of:
Code:
2 1234 9999
2 25000 10000
2 14000 192000
or do we have to reverse the order of values in fields 2 and 3 if field 2's value is greater than field 3's value?
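One way to settle that question from the data itself would be a quick check over file 2. The following is only a sketch: check_ranges is a made-up helper name, and it assumes whitespace-separated fields with integers in fields 1 to 3. It reports lines whose low end is above the high end, and lines that break the sort order on fields 1 and 2.

```shell
# check_ranges FILE  (hypothetical helper, not part of the original thread)
# Prints one message per problem found; prints nothing if FILE looks consistent.
check_ranges() {
    awk '
    # Reversed range: field 2 (low end) is greater than field 3 (high end).
    $2 + 0 > $3 + 0 {
        printf "line %d: low end %s is above high end %s\n", FNR, $2, $3
    }
    # Sort order broken: field 1 went down, or field 2 went down within
    # the same field 1 value.
    FNR > 1 && ($1 + 0 < p1 || ($1 + 0 == p1 && $2 + 0 < p2)) {
        printf "line %d: out of order after line %d\n", FNR, FNR - 1
    }
    # Remember this line (forced to numbers) for the next comparison.
    { p1 = $1 + 0; p2 = $2 + 0 }
    ' "$1"
}
```

Running check_ranges "file 2" on the three-line example containing "2 25000 10000" would flag line 2 as a reversed range and line 3 as out of order; with 2500 in place of 25000 it prints nothing.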
I don't have a set of files of the size you want to use, but the following works for the samples you provided:
Code:
awk '
FNR == NR {
    h[$1] = NR  # set "h"igh end of input lines for this field 1 value
    m[NR] = $2  # save "m"inimum range value for this line
    M[NR] = $3  # save "M"aximum range value for this line
    l[NR] = $4  # save "l"abel from this line
    next
}
{   e = h[$1]   # set high line number for field 1 value on this line
    o = $NF     # set initial output line to label on this line
    oc = 0      # set number of matched lines
    # Loop through the ranges associated with field 1 from this input line
    # (h[$1 - 1] assumes field 1 values in file 2 are consecutive integers)
    for(i = h[$1 - 1] + 1; i <= e; i++) {
        if($2 > M[i]) continue  # range is too low; keep looking
        if(m[i] > $2) break     # range is too high; we are done
        if($2 <= M[i]) {        # we have a matching range
            o = o " " l[i]      # add corresponding label to output line
            oc++                # increment match count
        }
    }
    print o, oc # print the matched labels and the match count
}' "file 2" "file 1"
As always, if you are using a Solaris/SunOS system, use /usr/xpg4/bin/awk or nawk instead of awk.
With your sample input, this script produces:
Don, that works perfectly. It does exactly what I wanted. Also, thanks for annotating your lines so clearly; I can tell exactly what you've done, and that's really helpful for a relative newbie like me. Thanks so much!
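For anyone who wants to try the approach on a small scale, here is a hedged sketch with made-up data. The function name match_ranges, the file names ranges.txt and points.txt, and every value and label in them are invented for illustration; they are not the poster's sample files. The sketch assumes file 2 has four fields (group, low end, high end, label), is sorted as described in the thread, and uses consecutive group numbers starting at 1. It skips a range when the value lies above its high end and stops at the first range whose low end is above the value.

```shell
# match_ranges RANGES POINTS  (hypothetical wrapper around the two-file awk approach)
match_ranges() {
    awk '
    FNR == NR {              # first file: remember every range
        h[$1] = NR           # last line number seen for this group
        m[NR] = $2 + 0       # low end of the range
        M[NR] = $3 + 0       # high end of the range
        l[NR] = $4           # label of the range
        next
    }
    {                        # second file: test each value as it is read
        v = $2 + 0
        o = $NF              # output starts with this line'"'"'s own label
        oc = 0               # number of matching ranges
        for (i = h[$1 - 1] + 1; i <= h[$1]; i++) {
            if (v > M[i]) continue   # range ends below the value
            if (v < m[i]) break      # later ranges start even higher
            o = o " " l[i]
            oc++
        }
        print o, oc
    }' "$1" "$2"
}

dir=$(mktemp -d)
printf '%s\n' '1 1 100 A' '1 50 300 B' '1 400 2050 C' \
              '2 1234 9999 D' '2 2500 10000 E' > "$dir/ranges.txt"
printf '%s\n' '2 3000 p1' '1 75 p2' '1 350 p3' '1 2050 p4' > "$dir/points.txt"
match_ranges "$dir/ranges.txt" "$dir/points.txt"
rm -rf "$dir"
```

With this invented data the output should be "p1 D E 2", "p2 A B 2", "p3 0" and "p4 C 1", one per line: 75 falls in both overlapping ranges A and B, 350 falls in the gap between B and C, and 2050 sits exactly on the high end of C.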