there will be no overlap in $2-$3 value ranges for any two lines in file2
There could potentially be overlap in the $2-$3 value ranges; that is why $4 (the gene id) is used, because the same $2-$3 values cannot exist in two different genes. To be extra sure, the combination of $1 and $4 can be used instead, so that the search looks only at the gene in $4 on the chromosome in $1. That might be better, as it gives a unique lookup key for the search.
Quote:
all of the lines in file2 that are associated with a $4 value in file1 are adjacent
Yes, after the search key (lookup value) is found in file2, all of its associated lines will be adjacent, one on top of the other...
Quote:
the strings in $4 in file1 and at the start of $4 in file2 are irrelevant to this problem (only the ranges specified by $2-$3 matter, other than copying the $4 value in file1 into the output)
Yes, this is true, though using the combination of $1 and $4 in file1 may be better, to ensure a unique match and so that values are found faster.
Quote:
if a $2 value in file1 is inside one of the $2-$3 ranges in file2, then a new 5th field added to file1 should be set to exon in the output (this comes from the examples, but conflicts with several statements in the English requirements)
if a $2 value in file1 is not inside any $2-$3 range in file2 and the difference $2 on some line in file2 minus $2 on a line in file1 is greater than zero and less than eleven, then a 5th field added to file1 should be set to splicing in the output (this also comes from the examples, but conflicts with the stated English requirements), and
otherwise, a 5th field added to file1 should be set to intron.
Yes, this is correct. The conflicts in the English requirements come from the nature of the human genome, which is ever-changing and still full of unknowns. The test being performed also factors into it and can add further complexity and conflicts.
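The three rules confirmed above (exon, then splicing, then intron), combined with the $1 + $4 lookup key suggested earlier, could be sketched in awk roughly as follows. This is a minimal sketch under several assumptions not stated in the thread: fields are whitespace-separated, file1 has exactly four fields so the class becomes the 5th, "inside a range" is taken as inclusive of both ends, and a file2 row belongs to a gene when its $4 simply starts with file1's $4 (a plain prefix test, so a gene id that is a prefix of another id would also match). `classify_positions` is a hypothetical helper name.

```shell
# classify_positions FILE1 FILE2   (hypothetical helper name)
# Appends exon / splicing / intron to each FILE1 line, looking only at
# FILE2 rows on the same chromosome ($1) whose $4 starts with FILE1 $4.
classify_positions() {
    awk '
        BEGIN { OFS = "\t" }
        # Pass 1 (FILE2): store every range, grouped by chromosome.
        NR == FNR {
            i = ++n[$1]
            lo[$1, i] = $2 + 0; hi[$1, i] = $3 + 0; id[$1, i] = $4
            next
        }
        # Pass 2 (FILE1): classify position $2 against this gene only.
        {
            p = $2 + 0; cls = "intron"
            for (i = 1; i <= n[$1]; i++) {
                if (index(id[$1, i], $4) != 1) continue   # other gene: skip
                if (p >= lo[$1, i] && p <= hi[$1, i]) { cls = "exon"; break }
                d = lo[$1, i] - p
                if (d > 0 && d < 11) cls = "splicing"     # 1-10 bases before a range start
            }
            print $0, cls
        }
    ' "$2" "$1"
}
```

Usage: `classify_positions file1 file2`. The sketch keeps all of file2 in memory, so it does not need the "adjacent lines" property; that property would only matter for a streaming version that stops reading once a gene's block has been passed.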
Thank you very much for all of your help.
awk
Last edited by cmccabe; 11-11-2018 at 11:11 AM.
Reason: added awk
I have a line like this:
I want to move HTTP/1.1 200 OK to the next line and put a blank line between the two lines, i.e.
How can I do this using awk?
Thanks in advance.
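The example line from this post did not survive, so the following is only a sketch that assumes the line contains the literal text `HTTP/1.1 200 OK` preceded by something else. It splits the line just before that text, drops the blanks in between, and prints an empty line between the two parts. `split_status` is a hypothetical helper name.

```shell
# split_status  (hypothetical helper name): reads stdin, writes stdout
split_status() {
    awk '{
        i = index($0, "HTTP/1.1 200 OK")
        if (i > 1) {
            head = substr($0, 1, i - 1)
            sub(/[ \t]+$/, "", head)   # drop trailing blanks before the status text
            print head
            print ""                   # the requested blank line
            print substr($0, i)        # status text and anything after it
        } else
            print                      # no status text (or already at start): unchanged
    }'
}
```

For example, `printf 'abc HTTP/1.1 200 OK\n' | split_status` prints `abc`, an empty line, and `HTTP/1.1 200 OK`.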
Hi all,
I was wondering if anyone knew how to dynamically change the FS in awk so that it accepts a variable containing a field separator. The current code is below and does not work when I introduce the dynamic FS change :-(
validate_source_file()
{
source_file=$1
...
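The original function body is cut off, so this does not fix that code; it only shows one common way to hand a separator stored in a shell variable to awk: `-v FS="$sep"` (equivalently `-F "$sep"`), which sets FS before the first record is read. Assigning FS inside a main action, by contrast, only affects the next record, which is one frequent reason a "dynamic FS" appears not to work. `count_fields` is a hypothetical helper that prints the field count of each line.

```shell
# count_fields FILE SEPARATOR   (hypothetical helper name)
count_fields() {
    source_file=$1
    sep=$2
    # -v processes escapes, so a SEPARATOR of '\t' becomes a real tab.
    # A single-character FS such as "|" is taken literally, not as a regex.
    awk -v FS="$sep" '{ print NF }' "$source_file"
}
```

For example, `count_fields data.txt '|'` on a file containing `a|b|c` prints `3`.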
Hi all,
I need to get the fields in a line that are separated by commas. Some of the fields are enclosed in double quotes, and they are supposed to be treated as a single field even if there are commas inside the quotes.
Sample input:
For this line, 5 fields are supposed to be extracted, they...
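The sample input is missing here, so this is a generic, portable sketch rather than an answer to the exact data: it walks each line character by character and splits only on commas that are outside double quotes, dropping the quote characters. It does not handle escaped quotes written as `""` inside a quoted field. If GNU awk is available, its `FPAT` variable is an alternative worth looking at. `csv_fields` is a hypothetical helper name; it prints one field per output line.

```shell
# csv_fields  (hypothetical helper name): reads stdin, one field per output line
csv_fields() {
    awk '
    # Split a line on commas that are not inside double quotes.
    # Quote characters are dropped; doubled quotes ("") are NOT handled.
    function parse_csv(line, out,    n, i, c, field, inq) {
        n = 0; field = ""; inq = 0
        for (i = 1; i <= length(line); i++) {
            c = substr(line, i, 1)
            if (c == "\"")
                inq = !inq                      # toggle in/out of a quoted field
            else if (c == "," && !inq) {
                out[++n] = field; field = ""    # field boundary
            } else
                field = field c
        }
        out[++n] = field                        # last field
        return n
    }
    {
        n = parse_csv($0, f)
        for (i = 1; i <= n; i++) print f[i]
    }'
}
```

For example, `printf 'a,"b,c",d\n' | csv_fields` yields three fields: `a`, `b,c` and `d`.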
First, thanks for the help in previous posts... I couldn't have gotten where I am now without it!
So here is what I have: I use awk to match $1 and $2 as one string in file1 to $1 and $2 as one string in file2. Now I'm wondering if I can extend this awk command to incorporate the following:
If $1...
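The extension requested in this post is cut off, but the part that is described (matching $1 and $2 together as one key across two files) is a standard two-pass awk idiom. A minimal sketch, assuming whitespace-separated fields: it prints the file1 lines whose ($1, $2) pair also appears in file2. Using the comma subscript (joined with awk's SUBSEP) instead of plain concatenation keeps, for example, `ab`+`c` and `a`+`bc` from colliding. `match_pairs` is a hypothetical helper name.

```shell
# match_pairs FILE1 FILE2   (hypothetical helper name)
match_pairs() {
    awk '
        NR == FNR { seen[$1, $2]; next }   # first pass: remember each ($1,$2) pair in FILE2
        ($1, $2) in seen                   # second pass: print FILE1 lines whose pair is known
    ' "$2" "$1"
}
```

One caveat of the `NR == FNR` test: if FILE2 is empty, every FILE1 line would be treated as part of the first pass, so nothing is printed.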
Hi. I'd appreciate it if I could get some direction on this issue to get me going.
Datafile1:
-About 4000 records; I have to update field #4 in selected records based on a match in the key field (field #1).
-Field #1 is the key field (servername). The number of fields may vary.
# comment
server1 bbb ccc...
Hi!
input:
111|222|333|aaa|bbb|ccc
999|888|777|nnn|kkk
444|666|555|eee|ttt|ooo|ppp
With awk, I am trying to change the FS "|" to "; " only from the 4th field until the end (the number of fields varies between records).
In order to get:
111|222|333|aaa; bbb; ccc
999|888|777|nnn; kkk...
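Because the number of fields varies, one straightforward approach is to rebuild each line in a loop, using `|` before fields 2 to 4 and `; ` before every later field. This is a sketch, not the thread's accepted answer; `join_tail` is a hypothetical helper name.

```shell
# join_tail  (hypothetical helper name): reads stdin, writes stdout
join_tail() {
    awk 'BEGIN { FS = "|" }
    {
        out = $1
        for (i = 2; i <= NF; i++)
            out = out (i <= 4 ? "|" : "; ") $i   # keep "|" up to field 4, then "; "
        print out
    }'
}
```

Applied to the three input lines above, it produces the two expected lines shown plus `444|666|555|eee; ttt; ooo; ppp` for the third.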
Hi experts,
I need to change the delimiter from tab to ",".
Sample test file:
cat test
A0000368 A29938511 072569352 5 Any 2 for £1.00 BUTCHERS|CAT FOOD|400G Sep 12 2012 12:00AM Jan 5 2014 11:59PM Sep 7 2012 12:00AM M 2.000 group 5
...
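Since fields in the sample such as `Any 2 for £1.00` contain spaces, the input separator must be the tab itself rather than awk's default whitespace splitting. A minimal sketch: set FS to a tab and OFS to a comma, then assign `$1 = $1` so awk rebuilds the record with the new separator. It assumes no field already contains a comma (those would need quoting). `tabs_to_commas` is a hypothetical helper name; `tr '\t' ','` would do the same plain substitution.

```shell
# tabs_to_commas  (hypothetical helper name): reads stdin, writes stdout
tabs_to_commas() {
    # $1 = $1 forces awk to rebuild $0 using OFS between the fields
    awk 'BEGIN { FS = "\t"; OFS = "," } { $1 = $1; print }'
}
```

For example, `tabs_to_commas < test` rewrites each tab-separated line with commas, leaving the spaces inside fields alone.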
In the awk below, in the first step I default Classification NF-1 to VUS. Next, I am trying to change the value of Classification (NF) to whatever CLINSIG (NF-1) is. If there is only one condition everything works great, but if there are two conditions it does not work. Is the syntax used...
In the awk below, I am trying to copy the entire contents of $6 (there may be multiple values separated by a ;) to $8 if $8 is . (lines 1 and 3 are examples). If $8 is not . (line 2 is an example), then that line is skipped and printed as is. The awk does execute but prints the output...
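The poster's own awk and data are not shown, so here is only a minimal sketch of the described rule, assuming tab-separated input (the separator is a guess). Only lines whose $8 is exactly `.` are modified; every other line is printed unchanged. The `;` inside $6 needs no special handling because the whole field is copied as one string. `fill_field8` is a hypothetical helper name.

```shell
# fill_field8  (hypothetical helper name): reads stdin, writes stdout
fill_field8() {
    # Tab-separated input is an assumption; OFS keeps the output tab-separated.
    awk 'BEGIN { FS = OFS = "\t" } $8 == "." { $8 = $6 } { print }'
}
```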