Post 303025682 by cmccabe on Friday 9th of November 2018 08:43:35 AM
awk to change value in field according to another

I am trying to use awk to check whether each $2 in file1 falls between $2 and $3 of the file2 line whose $4 matches. If it does, $5 of the output should be exon; if it does not, intron. I think the awk below will do that, but I am struggling to add a calculation so that if the difference is less than 10, $5 is splicing. I have also added an example of how line 1 is evaluated.

The 5th line is an example of splicing, because the $2 value in file1 is 2 away from the $2 value in file2. Thank you.
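To pin the rule down, here is a minimal sketch of the three-way test on a single position and a single interval. It assumes the reading that the desired output suggests: a position inside the interval is exon, a position outside it but less than 10 from either edge is splicing, and anything else is intron. The `classify` helper, its argument order, and the default padding of 10 are illustrative names and choices, not part of the original script.

```shell
# classify POS START END [PAD] prints exon, splicing, or intron.
# The helper name and the PAD default of 10 are illustrative assumptions.
classify() {
  awk -v pos="$1" -v lo="$2" -v hi="$3" -v pad="${4:-10}" 'BEGIN {
    pos += 0; lo += 0; hi += 0; pad += 0        # force numeric comparison
    if (pos >= lo && pos <= hi)              print "exon"      # inside the interval
    else if (lo - pos > 0 && lo - pos < pad) print "splicing"  # just before the start
    else if (pos - hi > 0 && pos - hi < pad) print "splicing"  # just past the end
    else                                     print "intron"
  }'
}

classify 153295819 153295817 153296901   # file1 line 5 against the first MECP2 interval
classify 153295810 153295817 153296901   # file1 line 6 against the same interval
```

Under that reading the two calls print exon and splicing, which agrees with the last two lines of the desired output.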

file1
Code:
chr1	17345304	17345315 	SDHB	
chr1	17345516	17345524 	SDHB	
chr1	93306242	93306261 	RPL5	
chr1	93307262	93307291 	RPL5
chrx	153295819	153296875 	MECP2	
chrx	153295810	153296800 	MECP2

file2 tab-delimited
Code:
chr1	17345375	17345453	SDHB_cds_0_0_chr1_17345376_r	0	-
chr1	17349102	17349225	SDHB_cds_1_0_chr1_17349103_r	0	-
chr1	17350467	17350569	SDHB_cds_2_0_chr1_17350468_r	0	-
chr1	17354243	17354360	SDHB_cds_3_0_chr1_17354244_r	0	-
chr1	17355094	17355231	SDHB_cds_4_0_chr1_17355095_r	0	-
chr1	17359554	17359640	SDHB_cds_5_0_chr1_17359555_r	0	-
chr1	17371255	17371383	SDHB_cds_6_0_chr1_17371256_r	0	-
chr1	17380442	17380514	SDHB_cds_7_0_chr1_17380443_r	0	-
chr1	93297671	93297674	RPL5_cds_0_0_chr1_93297672_f	0	+
chr1	93298945	93299015	RPL5_cds_1_0_chr1_93298946_f	0	+
chr1	93299101	93299217	RPL5_cds_2_0_chr1_93299102_f	0	+
chr1	93300335	93300470	RPL5_cds_3_0_chr1_93300336_f	0	+
chr1	93301746	93301949	RPL5_cds_4_0_chr1_93301747_f	0	+
chr1	93303012	93303190	RPL5_cds_5_0_chr1_93303013_f	0	+
chr1	93306107	93306196	RPL5_cds_6_0_chr1_93306108_f	0	+
chr1	93307322	93307422	RPL5_cds_7_0_chr1_93307323_f	0	+
chrX	153295817	153296901	MECP2_cds_0_0_chrX_153295818_r	0	-
chrX	153297657	153298008	MECP2_cds_1_0_chrX_153297658_r	0	-
chrX	153357641	153357667	MECP2_cds_2_0_chrX_153357642_r	0	-

desired output tab-delimited
Code:
chr1	17345304	17345315 	SDHB	intron
chr1	17345516	17345524 	SDHB	intron	
chr1	93306242	93306261 	RPL5	intron	
chr1	93307262	93307291 	RPL5	intron
chrx	153295819	153296875	MECP2	exon
chrx	153295810	153296800	MECP2	splicing

awk
Code:
awk '
FNR==NR{
  a[$4];
  min[$4]=$2;
  max[$4]=$3;
  next
}
{
  split($4,array,"_");
  print $0,(array[1] in a) && ($2>=min[array[1]] && $2<=max[array[1]])?"exon":"intron"
}
' file1 OFS="\t" file2 > output
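One possible reworking, offered as a sketch under stated assumptions and not as a confirmed answer: read file2 first and keep every interval per gene (the text before the first `_` in $4), then classify each file1 line against all intervals of its gene and print the file1 line with the label appended. The `annotate` wrapper, the array names `lo`, `hi`, and `n`, and the padding of 10 are placeholders that do not appear in the original script. Splicing is assumed to mean outside an interval but less than 10 from one of its edges.

```shell
# usage: annotate file1 file2 > output
# annotate is a hypothetical wrapper; an optional third argument overrides the padding of 10.
annotate() {
  awk -v pad="${3:-10}" '
  BEGIN { OFS = "\t" }
  FNR == NR {                          # first file read: file2 (exon intervals)
    split($4, part, "_")               # gene name = text before the first "_"
    g = part[1]
    n[g]++                             # number of intervals stored for this gene
    lo[g, n[g]] = $2 + 0
    hi[g, n[g]] = $3 + 0
    next
  }
  {                                    # second file read: file1 (positions)
    pos = $2 + 0
    label = "intron"
    for (i = 1; i <= n[$4]; i++) {
      if (pos >= lo[$4, i] && pos <= hi[$4, i]) { label = "exon"; break }
      if ((lo[$4, i] - pos > 0 && lo[$4, i] - pos < pad) ||
          (pos - hi[$4, i] > 0 && pos - hi[$4, i] < pad))
        label = "splicing"             # near an edge; a later interval may still give exon
    }
    print $1, $2, $3, $4, label
  }' "$2" "$1"
}
```

On the sample files above this labels the six file1 lines intron, intron, intron, intron, exon, splicing, which matches the desired output. Keying on the gene name alone sidesteps the chrx/chrX case difference between the two files; if the chromosome should also be compared, that would need a case-insensitive match.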

example of line 1
Code:
a[$4] = SDHB
min[$4] = 17345304
max[$4] = 17345315

array[1] = SDHB, 17345304 >= 17345375 && array[1] = SDHB, 17345315 <= 17345453 ---- intron

 
