awk to adjust text and count based on value in field


 
# 1  
Old 04-17-2018

The awk below executes as is and produces the current output. It is very close, but what I cannot seem to do is add the _exon... suffix: the ... portion comes from $1, and the _exon part is static and will never change. If there is a + sign in $4, the ... is in ascending (sequential) order. If there is a - in $4, the order is descending (reversed). I think I need an if statement, but I am not sure how to increment or decrement the value correctly. Thank you.

example of ordering based on $4
Code:
+ = exon 1,2,3
- = exon 3,2,1
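
The numbering rule above can be sketched on its own, separate from the real input file. This is only an illustration: the function name exon_order and the variables strand and n are made up for the example and do not appear in the thread. For the i-th of n intervals, a + strand gives exon i and a - strand gives exon n - i + 1.

```shell
# Hypothetical helper: print the exon numbers for n intervals on one strand.
exon_order() {
  awk -v strand="$1" -v n="$2" 'BEGIN {
    for (i = 1; i <= n; i++)
      # "+" counts up from 1; "-" counts down from n
      printf "%s%d", (i > 1 ? "," : ""), (strand == "+" ? i : n - i + 1)
    print ""
  }'
}

exon_order + 3   # prints 1,2,3
exon_order - 3   # prints 3,2,1
```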

file tab-delimited
Code:
208	NR_120664.1	chr5	+	141704857	141843619	141843619	141843619	4	141704857,141724980,141732790,141843534,	141704935,141725050,141733148,141843619,	0	SPRY4-AS1	unk	unk	-1,-1,-1,-1,
1161	NM_021615.4	chr16	-	75507021	75528926	75512538	75513726	3	75507021,75515714,75528837,	75513742,75515789,75528926,	0	CHST6	cmpl	cmpl	0,-1,-1,
1799	NM_002036.3	chr1	+	159173802	159176290	159174749	159176240	2	159173802,159175250,	159174770,159176290,	0	ACKR1	cmpl	cmpl	0,0,

current output tab-delimited
Code:
4	+	SPRY4-AS1	NR_120664.1	chr5:141704857-141704935     chr5:141724980-141725050     chr5:141732790-141733148     chr5:141843534-141843619     
3	-	CHST6	NM_021615.4	chr16:75507021-75513742     chr16:75515714-75515789     chr16:75528837-75528926     
2	+	ACKR1	NM_002036.3	chr1:159173802-159174770     chr1:159175250-159176290

desired output tab-delimited
Code:
4	+	SPRY4-AS1	NR_120664.1	chr5:141704857-141704935_exon1,chr5:141724980-141725050_exon2,chr5:141732790-141733148_exon3,chr5:141843534-141843619_exon4
3	-	CHST6	NM_021615.4	chr16:75507021-75513742_exon3,chr16:75515714-75515789_exon2,chr16:75528837-75528926_exon1
2	+	ACKR1	NM_002036.3	chr1:159173802-159174770_exon1,chr1:159175250-159176290_exon2

awk
Code:
awk -F '\t' '{sf="";len1=split($10,s1,",");split($11,s2,","); for (i=1;i<len1;i++){sf=sf $3":"s1[i]"-"s2[i]"     "}print $9,$4,$13,$2,sf}' OFS='\t' file > out


Last edited by cmccabe; 04-17-2018 at 07:09 PM.. Reason: fixed format
# 2  
Old 04-17-2018
Code:
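BEGIN {
  FS=OFS="\t"      # tab-delimited input and output
  suf="_exon"      # static suffix added to every interval
}
{
   sf=""
   # $10 and $11 end with a comma, so split() returns one extra empty element
   len1=split($10,s1,",")
   split($11,s2,",")
   # "+" strand numbers exons 1..n, "-" strand numbers them n..1;
   # the comma goes in front of every interval except the first, so none trails
   for (i=1;i<len1;i++)
     sf=sf (i>1?",":"") $3 ":" s1[i] "-" s2[i] suf (($4=="+")?i:len1-i)
   print $9,$4,$13,$2,sf
}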
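
As a quick check, the same idea can be run against a single record. The sketch below wraps the logic in a shell function (label_exons is a made-up name, not something from the thread) and feeds it the CHST6 line from the sample input. It assumes the intervals should be comma-separated with no trailing comma, as in the desired output.

```shell
# Hypothetical wrapper: reads tab-delimited records on stdin, writes labeled intervals.
label_exons() {
  awk 'BEGIN { FS = OFS = "\t" }
  {
    n = split($10, s1, ",") - 1   # drop the empty element after the trailing comma
    split($11, s2, ",")
    sf = ""
    for (i = 1; i <= n; i++)
      sf = sf (i > 1 ? "," : "") $3 ":" s1[i] "-" s2[i] "_exon" ($4 == "+" ? i : n - i + 1)
    print $9, $4, $13, $2, sf
  }'
}

# CHST6 record from the sample input (minus strand, 3 exons)
printf '1161\tNM_021615.4\tchr16\t-\t75507021\t75528926\t75512538\t75513726\t3\t75507021,75515714,75528837,\t75513742,75515789,75528926,\t0\tCHST6\tcmpl\tcmpl\t0,-1,-1,\n' | label_exons
```

For this record the last field should come out as chr16:75507021-75513742_exon3,chr16:75515714-75515789_exon2,chr16:75528837-75528926_exon1. To process the whole file, the function can be called as label_exons < file > out.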

This User Gave Thanks to vgersh99 For This Post:
# 3  
Old 04-18-2018
Thank you very much.