awk to adjust coordinates in field based on sequential numbers in another field


# 1  
Old 01-27-2017

I am trying to produce tab-delimited output from a tab-delimited file by combining lines and adjusting their coordinates.

For lines with the same $4, the first $6 value of a sequential run is added to $2, unless that value is 1, in which case the original $2 is kept (as in line 1). This is the new, adjusted $2 value.

The last $6 value of the same sequential run is added to $2, and this is the new, adjusted $3 value.

The new $2 and $3 values are combined with $1 in the format $1:$2-$3, and the $5 value is printed on the same line.

The awk command below works as long as the $4 values are unique, but that is not always the case. I cannot work out how to add a condition that checks $6 and starts a new output line whenever the numbers stop being sequential (1 2 is sequential, but there is a break before 92 93 94).

Maybe there is another way, but hopefully this helps. Thank you.


Code:
chrX    110956442   110956535   chrX:110956442-110956535    ALG13   1   19
chrX    110956442   110956535   chrX:110956442-110956535    ALG13   2   19
chrX    110956442   110956535   chrX:110956442-110956535    ALG13   92  18
chrX    110956442   110956535   chrX:110956442-110956535    ALG13   93  18
chrX    110956442   110956535   chrX:110956442-110956535    ALG13   94  18
chrX    110961329   110961512   chrX:110961329-110961512    ALG13   2   1
chrX    110961329   110961512   chrX:110961329-110961512    ALG13   3   1
chr15    25031028    25031925    chrX:25031028-25031925  ARX 651 3

desired output
Code:
chrX:110956442-110956444    ALG13
chrX:110956534-110956536    ALG13
chrX:110961331-110961332    ALG13
chr15:25031679-25031679  ARX
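
For reference, the second and fourth desired lines follow from plain addition on the sample rows. The snippet below only redoes that arithmetic; the variable names (start, first, last, arx) are made up for illustration and are not part of any command in this thread.

```shell
result=$(awk 'BEGIN {
  start = 110956442        # $2 of the first ALG13 block
  first = 92; last = 94    # first and last $6 of the second run
  print "chrX:" (start + first) "-" (start + last)
  arx = 25031028           # $2 of the ARX row, whose only $6 value is 651
  print "chr15:" (arx + 651) "-" (arx + 651)
}')
printf '%s\n' "$result"
```

This prints chrX:110956534-110956536 and chr15:25031679-25031679, matching the lines above.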

awk
Code:
awk 'FNR==NR {S[$4]++;next} ($4 in S){if(S[$4]>1){print $1 OFS $2 OFS $2+S[$4] OFS $5;} 
else {if($6==1){print $1 OFS $2 OFS $2 OFS $5}
else {print $1 OFS $2+$6 OFS $2+$6 OFS $5}};delete S[$4]}' file file

current output
Code:
chrX 110956442 110956449 ALG13
chrX 110961329 110961334 ALG13
chr15 25031028 25031031 ARX


Last edited by cmccabe; 01-27-2017 at 11:58 AM.. Reason: fixed format
# 2  
Old 01-27-2017
When I run your awk command on your input, I do not get the output you posted.
But I think I have understood your description.
If your file is sorted by $4 and $6 (so that each $6 sequence sits on adjacent lines),
then the following can do it:
Code:
awk '
BEGIN { OFS="\t" }   # tab-delimited output, as requested
# print from stored values
function prt(){
  print p1 ":" (p6start==1 ? p2 : p2+p6start) "-" (p2+p6), p5
}
($4!=p4 || $6!=p6+1) {
# new sequence, print the previous sequence
  if (NR>1) prt()
  p6start=$6
}
{
# store the values that we need later
  p1=$1
  p2=$2
  p4=$4
  p5=$5
  p6=$6
}
END { if (NR) prt() }   # print the last sequence, skip on empty input
' file

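A problem is the "late" end-of-sequence detection: the end of a sequence is only known once the next line has been read. This is solved by storing the values of the previous line, using a print function, and adding an END section that prints the final sequence.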
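
The same "store and print late" pattern can be seen in isolation on a plain list of numbers. This is only a minimal sketch, not part of the script above: the function name emit and the variables first and prev are made up for illustration, and the input is assumed to be a single column of integers.

```shell
result=$(printf '%s\n' 1 2 92 93 94 651 | awk '
# print the run that was being collected
function emit() { print first "-" prev }
# a gap means the previous run is complete
NR > 1 && $1 != prev + 1 { emit(); first = $1 }
# the very first line opens the first run
NR == 1 { first = $1 }
# remember the current value for the next comparison
{ prev = $1 }
# the last run has no following line to trigger it
END { if (NR) emit() }
')
printf '%s\n' "$result"
```

With the numbers above this prints 1-2, 92-94 and 651-651, one per line. Without the END rule the final run (651-651) would never be printed.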
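
The script in post # 2 relies on rows with the same $4 being adjacent and $6 ascending within them. If a file is not already in that order, one possible pre-step is a two-key sort. The rows below are made up for illustration (coordinates 100 and 200 and the name G1 are hypothetical), and the sketch assumes real tab characters between fields.

```shell
tab=$(printf '\t')
# four hypothetical rows that differ only in $6, deliberately out of order
result=$(printf 'chrX\t100\t200\tchrX:100-200\tG1\t%s\t1\n' 93 2 92 1 |
  sort -t "$tab" -k4,4 -k6,6n |
  awk -F "$tab" '{ print $6 }')
printf '%s\n' "$result"
```

The $6 column comes out as 1, 2, 92, 93. Note that sorting on $4 can change the order of the groups relative to the original file, so the output lines may appear in a different order than the input blocks.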
# 3  
Old 01-28-2017
Are you sure the output should not be:
Code:
chrX:110956442-110956444    ALG13
chrX:110956532-110956535    ALG13
chrX:110961330-110961332    ALG13
chr15:25031678-25031678  ARX

That would make more sense to me, but maybe I'm wrong.

Last edited by Scrutinizer; 01-28-2017 at 03:32 AM..
# 4  
Old 01-30-2017
Thank you very much for your help and for catching the error in my expected output. This is why the computer does the math.