Yes, this will always be the case if $4 is found as in f1.
f1 will always be tab-delimited except for whitespace after $3 and $4, but the output should be tab-delimited. I did set OFS="\t", but I think the whitespace is keeping that from working.
You are correct in that I meant to be looking for inclusive endpoints so the >=/<= is what I should have used.
I used the || operator to make sure the script works as expected, but it could be && since both coordinates should lie within the endpoints (I'm trying to think of a situation where that's not the case and not coming up with anything).
Thank you very much.
Thanks for the responses.
Unfortunately, upon looking closer at your example input files, there are no entries in f1 where both endpoints of any line in f2 fall within the ranges specified in f1. In the last line of f2, $2 falls inside the range specified in the first line in f1, but $3 does not.
And, despite what you said about the input files being tab delimited, the samples you provided don't contain any <tab> characters. And, since you said that the real files you're using do contain some <space>s between fields 3 and 4 and after field 4, the following code assumes that strings of one or more <space> and/or <tab> characters separate fields and that any <space> and <tab> characters after field 4 are to be ignored. (As written, the code shown below will not work if a line in either input file contains any leading <space> or <tab> characters.)
So, given the above and assuming that you just want there to be some overlap between the ranges specified in a line in f1 and in a line in f2, maybe the following will do what you want:
This uses <tab> as the output field separator, but on output lines that end in "intron", <space>s in the input will not be converted to <tab>s in the output.
If you run the above script with no operands (or with the operand f1 or with the operand f) from your sample input files, it will produce the output:
Note that the above code does not set SUBSEP since it was not used in your script and isn't needed in the code shown above. Note also that the field separator I'm using in the code above treats any sequence of one or more <space>s and/or <tab>s as a field separator and also treats every <underscore> as a field separator. That means that the subfields you were splitting into the array named array in your code will all be treated as separate fields in the code above. (That means I don't have to call split() to break that string into subfields.)
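To illustrate the field separator described above, here is a minimal sketch. It is not the script from this post, and the input line is made up (the name NM_000123_exon_5 and the coordinates are assumptions, not data from f1 or f2). It only shows how an FS of "[ \t]+|_" turns every underscore-separated piece into its own field:

```shell
# Hypothetical input line; the real f1/f2 layout is not shown in this thread.
sample=$(printf 'NM_000123_exon_5\t100 200')

# Every "_" and every run of spaces/tabs ends a field, so no split() call is needed.
fields=$(printf '%s\n' "$sample" |
  awk 'BEGIN { FS = "[ \t]+|_" } { for (i = 1; i <= NF; i++) printf "%d=%s\n", i, $i }')

printf '%s\n' "$fields"
```

With this made-up line, the four pieces of the first column become fields 1 through 4 and the two coordinates become fields 5 and 6.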
The sample files you provided don't test any of the "corner" cases where I might have missed something. I think it will work OK, but I haven't done extensive enough testing to give you any kind of guarantee.
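One way to probe such corner cases is to run the comparison on a few hand-made ranges. The sketch below is only an illustration, not the script from this post: the coordinates are invented, and it assumes inclusive endpoints (the >=/<= form) with each test line holding the start and end of one range followed by the start and end of another.

```shell
# Each made-up test line holds: start1 end1 start2 end2
results=$(printf '%s\n' \
  '100 200 150 250' \
  '100 200 200 300' \
  '100 200 201 300' \
  '100 200 120 180' \
  '100 200 50 99' |
  awk '{
    # Inclusive overlap: each range starts at or before the point where the other ends.
    overlap = ($1 <= $4 && $3 <= $2)
    # Containment: both endpoints of the second range lie inside the first.
    inside = ($3 >= $1 && $4 <= $2)
    printf "%s-%s vs %s-%s overlap=%d inside=%d\n", $1, $2, $3, $4, overlap, inside
  }')

printf '%s\n' "$results"
```

The second and third lines are the interesting ones: ranges that share a single endpoint (100-200 and 200-300) count as overlapping under the inclusive test, while 201-300 does not. Only 120-180 passes the stricter "both endpoints inside" test.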
Hope this helps,
Don