Perl to extract values and print at end of each line


# 1  

In the Perl below I am trying to extract and print the AF1= value, the GT value, and F[5] (QUAL) divided by 33, rounded to the nearest whole number. The GT value is at the end of the line, after GT:PL, so all the possibilities are read into a hash h, and depending on the value in the line the corresponding het or hom result is printed. The command does execute, but the file is not changed. I know many of the regex captures in the command are not used, but I need to keep the same format (this is a special-case scenario). Thank you.



file
Code:
##bcftools_callVersion=1.8+htslib-1.8
##bcftools_callCommand=call -c -; Date=Thu May 16 12:25:46 2019
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	xxxx
chr1	235972435	.	T	C	195.009	.	DP=744;VDB=0.27972;SGB=-0.693147;RPB=0.506757;MQB=0.997295;MQSB=0.616175;BQB=0.810181;MQ0F=0;AF1=0.5;AC1=1;DP4=218,151,192,179;MQ=58;FQ=188.502;PV4=0.0462442,1,0.32459,1	GT:PL	0/1:225,0,216
chr2	220022998	.	G	.	283.236	.	DP=456;SGB=-0.379885;RPB=1;MQB=1;MQSB=0.840495;BQB=1;MQ0F=0;AF1=0;AC1=0;DP4=245,209,1,0;MQ=59;FQ=-281.989;PV4=1,0.0010322,1,4.68871e-09	GT:PL	0/0:0
chr3	128205860	.	G	C	221.999	.	DP=353;VDB=0.00961268;SGB=-0.693147;RPB=1;MQB=1;MQSB=0.685507;BQB=1;MQ0F=0;AF1=1;AC1=2;DP4=0,1,147,176;MQ=57;FQ=-281.989;PV4=1,1,1,0.30647	GT:PL	1/1:255,255,0
chr5	35871190	.	G	A	170.009	.	DP=173;VDB=0.000529499;SGB=-0.693147;RPB=0.605358;MQB=0.998833;MQSB=0.99703;BQB=0.113906;MQ0F=0;AF1=0.5;AC1=1;DP4=41,52,35,39;MQ=55;FQ=172.502;PV4=0.754883,1,1,1	GT:PL	0/1:200,0,209
chr5	77412011	.	A	G	161.009	.	DP=293;VDB=0.0010866;SGB=-0.693147;RPB=0.990047;MQB=0.923303;MQSB=0.967232;BQB=0.000232632;MQ0F=0;AF1=0.5;AC1=1;DP4=83,74,71,50;MQ=57;FQ=164.015;PV4=0.394406,2.23136e-05,1,1	GT:PL	0/1:191,0,225
chr7	66459269	.	C	.	136.006	.	DP=316;MQSB=0.314206;MQ0F=0.00316456;AF1=0;AC1=0;DP4=122,176,0,0;MQ=12;FQ=-281.989	GT:PL	0/0:0

perl
Code:
perl -plae ' 
 BEGIN{ %h = qw(0/0 hom 0/1 het 1/1 hom 1/2 het 2/2 hom) } /^[^#].*AF1=(\d+\.?\d*);.*FDP=(\d+);.*FSAF=(\d+);.*FSAR=(\d+);.*HRUN=(\d+);.*STB=(\d+\.?\d*)[,;].*([0-2]\/[0-2])/ and $_ .= join "\t", ("",  sprintf("%1.3f",$9), $h{$7}, int($F[5]/33+0.5))' file

desired tab-delimited
Code:
##bcftools_callVersion=1.8+htslib-1.8
##bcftools_callCommand=call -c -; Date=Thu May 16 12:25:46 2019
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	xxxx
chr1	235972435	.	T	C	195.009	.	DP=744;VDB=0.27972;SGB=-0.693147;RPB=0.506757;MQB=0.997295;MQSB=0.616175;BQB=0.810181;MQ0F=0;AF1=0.5;AC1=1;DP4=218,151,192,179;MQ=58;FQ=188.502;PV4=0.0462442,1,0.32459,1	GT:PL	0/1:225,0,216	0.50	het	6
chr2	220022998	.	G	.	283.236	.	DP=456;SGB=-0.379885;RPB=1;MQB=1;MQSB=0.840495;BQB=1;MQ0F=0;AF1=0;AC1=0;DP4=245,209,1,0;MQ=59;FQ=-281.989;PV4=1,0.0010322,1,4.68871e-09	GT:PL	0/0:0	0.00	hom	9
chr3	128205860	.	G	C	221.999	.	DP=353;VDB=0.00961268;SGB=-0.693147;RPB=1;MQB=1;MQSB=0.685507;BQB=1;MQ0F=0;AF1=1;AC1=2;DP4=0,1,147,176;MQ=57;FQ=-281.989;PV4=1,1,1,0.30647	GT:PL	1/1:255,255,0	1.00	hom	7
chr5	35871190	.	G	A	170.009	.	DP=173;VDB=0.000529499;SGB=-0.693147;RPB=0.605358;MQB=0.998833;MQSB=0.99703;BQB=0.113906;MQ0F=0;AF1=0.5;AC1=1;DP4=41,52,35,39;MQ=55;FQ=172.502;PV4=0.754883,1,1,1	GT:PL	0/1:200,0,209	0.50	het	5
chr5	77412011	.	A	G	161.009	.	DP=293;VDB=0.0010866;SGB=-0.693147;RPB=0.990047;MQB=0.923303;MQSB=0.967232;BQB=0.000232632;MQ0F=0;AF1=0.5;AC1=1;DP4=83,74,71,50;MQ=57;FQ=164.015;PV4=0.394406,2.23136e-05,1,1	GT:PL	0/1:191,0,225	0.50	het	5
chr7	66459269	.	C	.	136.006	.	DP=316;MQSB=0.314206;MQ0F=0.00316456;AF1=0;AC1=0;DP4=122,176,0,0;MQ=12;FQ=-281.989	GT:PL	0/0:0	0.00	hom	4


Last edited by cmccabe; 6 Days Ago at 03:49 PM.. Reason: fixed format
# 2  
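The regex in your code has no reference to the GT:PL value, and it requires FDP=, FSAF=, FSAR=, HRUN= and STB= fields that are not present in your file, so it never matches and no line is modified. The regex part needs to be changed.
Try the code below; you can add more to the regex based on your needs.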

Code:
 perl -plane ' BEGIN{ %h = qw(0/0 hom 0/1 het 1/1 hom 1/2 het 2/2 hom) } /^[^#].*AF1=(\d+\.?\d*);.*GT:PL\s*([^:]+):/ and $_ .= join "\t", ("",  sprintf("%1.2f",$1), $h{$2}, int($F[5]/33+0.5))' file
