Hi, I have the following problem that is beyond what I can currently do with bash scripting.
In file 1, I have ~ 2500000 values. Note this file is not sorted.
I have a list of values in "file 2" ~ 25000 unique lines:
Note - LABEL_7 AND LABEL_8 overlap slightly in their column 2 and 3 values
ETC
To run through what I would like to do, as an example:
LABEL_A (FILE 1) has a 3 in column 1, and a value of 19 in column 2.
I want to compare this to every line in FILE 2.
So, if there is a 3 in column 1 of FILE2, and 19 is between the values of columns 2 and 3 of FILE2, see what label this corresponds to in FILE2.
In this example, 19 is between the values in column 2 and 3 (FILE2) for LABEL_7 and LABEL_8.
Desired output: (Note the value of 2 in column 4 below means there are 2 labels that contain the value 19).
Full output:
I think the code for this will involve while loops and arrays, but I have no idea where to start. Any bash solutions would be great (as this is what I am currently learning), but any assistance at all would be very much appreciated.
You explicitly state that file 1 is not sorted, but you don't say whether or not file 2 is sorted. Is file 2 sorted by increasing numeric values in fields 1, 2, and 3 as shown in your example, or is file 2 also unsorted?
What is the range of the values in fields 1, 2, and 3?
Are all of the values in fields 1, 2, and 3 integers or are floating point values also included?
Field 1 (column 1) goes up to 24 in both File 1 and File 2.
File 1 is unsorted.
File 2 is sorted from 1 to 24 in column 1. Within each column 1 value, it is then sorted by the lowest number in column 2 (real values range from 1 to about 30,000,000).
So this means that if column 1 = 1, values could be from
or
There is no fixed amount by which column 2 increases, nor is there a fixed amount by which column 3 increases. The lowest possible number in column 2 and the highest possible number in column 3 change depending on whether column 1 contains 1, 2, 3, 4, etc.
The numbers in all columns are integers.
I hope that makes things clearer...Thanks!
I understand that file 1 is not sorted and that doesn't matter to me. Lines from file 1 can be read and processed without having to store file 1 values.
File 2 values, however, need to be stored so they can be examined 2.5 million times (once for each line in file 1), so I want to minimize the effort needed to determine if a line in file 1 matches values saved from file 2.
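The store-file-2, stream-file-1 idea described above can be sketched in awk roughly as follows. This is only an illustration under stated assumptions, not a tested solution for the full-size files: it assumes file 2 lines look like `group low high label`, file 1 lines look like `group value label`, and that "between" includes both ends of the range. The function name `match_ranges` and the sample data are made up for the example.

```shell
#!/bin/sh
# Sketch only. Assumed layouts (not confirmed by the thread):
#   file 2: group  low  high  label
#   file 1: group  value  label
# match_ranges is a hypothetical helper name.
match_ranges() {
    # $1 = file 2 (ranges, held in memory), $2 = file 1 (streamed line by line)
    awk '
    NR == FNR {                      # first file: store every range under its group
        n = ++cnt[$1]
        lo[$1, n]  = $2 + 0          # "+ 0" forces numeric comparison later
        hi[$1, n]  = $3 + 0
        lab[$1, n] = $4
        next
    }
    {                                # second file: test one value per line
        v = $2 + 0; hits = 0; names = ""
        for (i = 1; i <= cnt[$1]; i++)
            if (lo[$1, i] <= v && v <= hi[$1, i]) {
                hits++
                names = names " " lab[$1, i]
            }
        # original line, then the match count, then the matching labels
        print $0, hits names
    }' "$1" "$2"
}

# Made-up sample data for demonstration.
tmp=$(mktemp -d)
printf '%s\n' '1 1 10 LABEL_1' '3 10 20 LABEL_7' '3 18 30 LABEL_8' > "$tmp/file2"
printf '%s\n' '3 19 LABEL_A' '1 50 LABEL_B' > "$tmp/file1"
match_ranges "$tmp/file2" "$tmp/file1"
rm -rf "$tmp"
```

Only the ranges that share the line's column 1 value are scanned, so each file 1 line is checked against a fraction of file 2 rather than all of it. The sketch does not rely on file 2 being sorted; if the sort order is guaranteed, the inner loop could stop as soon as a range's low end exceeds the value.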
You said above that file 2 is sorted by numeric value on fields 1 and 2 with field 1 being the primary sort key. But, the value marked in red in your example above shows that file 2 is not sorted as you described. It is also strange that that entry has the high end of the range with a value lower than the low end of the range. Did you perhaps intend for the second set of values to be:
instead of:
or do we have to reverse the order of values in fields 2 and 3 if field 2's value is greater than field 3's value?
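If reversing does turn out to be needed, a minimal sketch of that normalization step might look like the following. It assumes fields 2 and 3 of file 2 are the two ends of the range; `normalize_ranges` is a hypothetical name and the sample lines are made up.

```shell
#!/bin/sh
# Sketch only: swap fields 2 and 3 on any line where field 2 is the larger,
# so that every range reads low-then-high. Field positions are an assumption.
normalize_ranges() {
    # Assigning to $2/$3 makes awk rebuild the line with single-space separators.
    awk '($2 + 0) > ($3 + 0) { t = $2; $2 = $3; $3 = t } { print }' "$1"
}

# Made-up sample: the first line has its ends reversed.
tmp=$(mktemp -d)
printf '%s\n' '3 30 18 LABEL_8' '3 10 20 LABEL_7' > "$tmp/file2"
normalize_ranges "$tmp/file2"
rm -rf "$tmp"
```

Note that swapping can leave the file out of order on field 2, so any later step that depends on the sort order would need the output re-sorted.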
I don't have a set of files of the size you want to use, but the following works for the samples you provided:
As always, if you are using a Solaris/SunOS system, use /usr/xpg4/bin/awk or nawk instead of awk.
With your sample input, this script produces:
which matches the Full output you said you wanted except your output showed a space instead of the underline marked in red in the last output line.
Don, that works perfectly. It does exactly what I wanted. Also, thanks for annotating your lines so clearly; I can tell exactly what you've done, and that's really helpful for a relative newbie like me. Thanks so much.