Perl to extract values and print at end of each line


 
# 1  
Old 05-16-2019

In the Perl one-liner below I am trying to extract and print, at the end of each line, the AF1= value, the GT value, and F[5] (QUAL) divided by 33 and rounded to the nearest whole number. The GT value is at the end of the line, after GT:PL, so all the possible genotypes are read into a hash h, and the value found in the line is mapped to the corresponding het or hom result. The command does execute, but the file is not changed. I know many of the regex groups in the command are not used, but I need to keep the same format (this is a special-case scenario). Thank you.



file
Code:
##bcftools_callVersion=1.8+htslib-1.8
##bcftools_callCommand=call -c -; Date=Thu May 16 12:25:46 2019
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	xxxx
chr1	235972435	.	T	C	195.009	.	DP=744;VDB=0.27972;SGB=-0.693147;RPB=0.506757;MQB=0.997295;MQSB=0.616175;BQB=0.810181;MQ0F=0;AF1=0.5;AC1=1;DP4=218,151,192,179;MQ=58;FQ=188.502;PV4=0.0462442,1,0.32459,1	GT:PL	0/1:225,0,216
chr2	220022998	.	G	.	283.236	.	DP=456;SGB=-0.379885;RPB=1;MQB=1;MQSB=0.840495;BQB=1;MQ0F=0;AF1=0;AC1=0;DP4=245,209,1,0;MQ=59;FQ=-281.989;PV4=1,0.0010322,1,4.68871e-09	GT:PL	0/0:0
chr3	128205860	.	G	C	221.999	.	DP=353;VDB=0.00961268;SGB=-0.693147;RPB=1;MQB=1;MQSB=0.685507;BQB=1;MQ0F=0;AF1=1;AC1=2;DP4=0,1,147,176;MQ=57;FQ=-281.989;PV4=1,1,1,0.30647	GT:PL	1/1:255,255,0
chr5	35871190	.	G	A	170.009	.	DP=173;VDB=0.000529499;SGB=-0.693147;RPB=0.605358;MQB=0.998833;MQSB=0.99703;BQB=0.113906;MQ0F=0;AF1=0.5;AC1=1;DP4=41,52,35,39;MQ=55;FQ=172.502;PV4=0.754883,1,1,1	GT:PL	0/1:200,0,209
chr5	77412011	.	A	G	161.009	.	DP=293;VDB=0.0010866;SGB=-0.693147;RPB=0.990047;MQB=0.923303;MQSB=0.967232;BQB=0.000232632;MQ0F=0;AF1=0.5;AC1=1;DP4=83,74,71,50;MQ=57;FQ=164.015;PV4=0.394406,2.23136e-05,1,1	GT:PL	0/1:191,0,225
chr7	66459269	.	C	.	136.006	.	DP=316;MQSB=0.314206;MQ0F=0.00316456;AF1=0;AC1=0;DP4=122,176,0,0;MQ=12;FQ=-281.989	GT:PL	0/0:0

perl
Code:
perl -plae ' 
 BEGIN{ %h = qw(0/0 hom 0/1 het 1/1 hom 1/2 het 2/2 hom) } /^[^#].*AF1=(\d+\.?\d*);.*FDP=(\d+);.*FSAF=(\d+);.*FSAR=(\d+);.*HRUN=(\d+);.*STB=(\d+\.?\d*)[,;].*([0-2]\/[0-2])/ and $_ .= join "\t", ("",  sprintf("%1.3f",$9), $h{$7}, int($F[5]/33+0.5))' file

desired tab-delimited
Code:
##bcftools_callVersion=1.8+htslib-1.8
##bcftools_callCommand=call -c -; Date=Thu May 16 12:25:46 2019
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	xxxx
chr1	235972435	.	T	C	195.009	.	DP=744;VDB=0.27972;SGB=-0.693147;RPB=0.506757;MQB=0.997295;MQSB=0.616175;BQB=0.810181;MQ0F=0;AF1=0.5;AC1=1;DP4=218,151,192,179;MQ=58;FQ=188.502;PV4=0.0462442,1,0.32459,1	GT:PL	0/1:225,0,216	0.50	het	6
chr2	220022998	.	G	.	283.236	.	DP=456;SGB=-0.379885;RPB=1;MQB=1;MQSB=0.840495;BQB=1;MQ0F=0;AF1=0;AC1=0;DP4=245,209,1,0;MQ=59;FQ=-281.989;PV4=1,0.0010322,1,4.68871e-09	GT:PL	0/0:0	0.00	hom	9
chr3	128205860	.	G	C	221.999	.	DP=353;VDB=0.00961268;SGB=-0.693147;RPB=1;MQB=1;MQSB=0.685507;BQB=1;MQ0F=0;AF1=1;AC1=2;DP4=0,1,147,176;MQ=57;FQ=-281.989;PV4=1,1,1,0.30647	GT:PL	1/1:255,255,0	1.00	hom	7
chr5	35871190	.	G	A	170.009	.	DP=173;VDB=0.000529499;SGB=-0.693147;RPB=0.605358;MQB=0.998833;MQSB=0.99703;BQB=0.113906;MQ0F=0;AF1=0.5;AC1=1;DP4=41,52,35,39;MQ=55;FQ=172.502;PV4=0.754883,1,1,1	GT:PL	0/1:200,0,209	0.50	het	5
chr5	77412011	.	A	G	161.009	.	DP=293;VDB=0.0010866;SGB=-0.693147;RPB=0.990047;MQB=0.923303;MQSB=0.967232;BQB=0.000232632;MQ0F=0;AF1=0.5;AC1=1;DP4=83,74,71,50;MQ=57;FQ=164.015;PV4=0.394406,2.23136e-05,1,1	GT:PL	0/1:191,0,225	0.50	het	5
chr7	66459269	.	C	.	136.006	.	DP=316;MQSB=0.314206;MQ0F=0.00316456;AF1=0;AC1=0;DP4=122,176,0,0;MQ=12;FQ=-281.989	GT:PL	0/0:0	0.00	hom	4


Last edited by cmccabe; 05-16-2019 at 04:49 PM.. Reason: fixed format
# 2  
Old 05-17-2019
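The regex in your code never matches these lines: it requires FDP=, FSAF=, FSAR=, HRUN= and STB= fields that are not present in your file, and it has no reference to the GT:PL value, so the regex needs to be modified.
Try the code below; you can add more to the regex based on your needs.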

Code:
 perl -plane ' BEGIN{ %h = qw(0/0 hom 0/1 het 1/1 hom 1/2 het 2/2 hom) } /^[^#].*AF1=(\d+\.?\d*);.*GT:PL\s*([^:]+):/ and $_ .= join "\t", ("",  sprintf("%1.2f",$1), $h{$2}, int($F[5]/33+0.5))' file
