awk to adjust coordinates in field based on sequential numbers in another field
I am trying to output a tab-delimited result that uses the data from a tab-delimited file to combine and subtract specific lines.
If $4 matches in each line then the first matching sequential $6 value is added to $2, unless the value is 1, then the original $2 is used (like in the case of line 1). This is the new or adjusted $2 value.
The last matching sequential $6 value is added to $2 and this is the new or adjusted $3 value.
The new $2 and $3 vales are combined with $1 in the format $1:$2-$3 and the $5 value is printed on the line.
The awk command below works great as long as the $4 values are unique, but that is not always the case. I can not seem to add in a condition that checks $6 and if the numbers are not sequential (1 2 is, but then there is a break between 92 93 94), when there is a break a new line is created.
Maybe there is another way but hopefully this helps. Thank you
desired output
awk
current output
Last edited by cmccabe; 01-27-2017 at 12:58 PM..
Reason: fixed format
If I run your awk on your input then I do not get your output.
But maybe I have understood your description.
If your file is sorted by $4 and $6 (so $6 sequences are in adjacent lines),
then the following can do it:
A problem is the "late" end-of-sequence detection. This is solved with storing the previous values, and an END section, and a print function.
This User Gave Thanks to MadeInGermany For This Post:
Hi,
So awk is driving me crazy on this one. I have searched everywhere and read man, docs and every related post Google can find and still no luck. The actual files I need to run this on are sensitive in nature, but it is the same thing as if I needed to calculate weighted grades for multiple... (15 Replies)
In the file below I am trying to count the given repeats of A,T,C,G in each string of letters. Each sequence is below the > and it is possible for a string of repeats to wrap from the line above. For example, in the first line the last letter is a T and the next lines has 3 more. I think the below... (10 Replies)
The below awk executes as is and produces the current output. It isvery close but what Ican not seem to do is add the -exon..., the ... portion comes from $1 and the _exon is static and will never change. If there is + sign in $4 then the ... is in acending order or sequential. If there is a - in... (2 Replies)
In the tab-delimeted input file below I am trying to use awk to update the value in $2 if TYPE=ins in bold, by adding the value of
HRUN= in italics. In the below since in line 1 TYPE=ins the 117282541 value in $2 has 6 added because that is the value of HRUN=.
Hopefully the awk is a start but I... (2 Replies)
Trying to output a result that uses the data from file to combine and subtract specific lines. If $4 matches in each line then the last $6 value is added to $2 and that becomes the new$3. Each matching line in combined into one with $1 then the original $2 then the new$3 then $5. For the cases... (4 Replies)
Hi experts, I've been struggling to format a large genetic dataset. It's complicated to explain so I'll simply post example input/output
$cat input.txt
ID GENE pos start end
blah1 coolgene 1 3 5
blah2 coolgene 1 4 6
blah3 coolgene 1 4 ... (4 Replies)
First, thanks for the help in previous posts... couldn't have gotten where I am now without it!
So here is what I have, I use AWK to match $1 and $2 as 1 string in file1 to $1 and $2 as 1 string in file2. Now I'm wondering if I can extend this AWK command to incorporate the following:
If $1... (4 Replies)
Hi, all
I need to get fields in a line that are separated by commas, some of the fields are enclosed with double quotes, and they are supposed to be treated as a single field even if there are commas inside the quotes.
sample input:
for this line, 5 fields are supposed to be extracted, they... (8 Replies)
So, I need to do some summing. I have an Apache log file with the following as a typical line:
127.0.0.1 - frank "GET /apache_pb.gif HTTP/1.0" 200 2326
Now, what I'd like to do is a per-minute sum. So, I can have awk tell me the individual minutes, preserving the dates(since this is a... (7 Replies)
I want to find the top N entries for a certain field based on the values of another field.
For example if N=3, we want the 3 best values for each entry:
Entry1 ||| 100
Entry1 ||| 95
Entry1 ||| 30
Entry1 ||| 80
Entry1 ||| 50
Entry2 ||| 40
Entry2 ||| 20
Entry2 ||| 10
Entry2 ||| 50... (1 Reply)