Perl to extract whole number or decimal in regex


 
10-12-2017

In the Perl below I am trying to extract and print specific values from patterns using multiple regexes. One of the patterns, AF=, may be a whole number or a decimal, but I cannot seem
to capture both. I think the problem is the regex .*AF=(\d+\.\d+); as it expects a value like #.#### when the value may be only a #. I tried changing it to ^\d*\.?\d*$ but that returned all four lines as-is, with
the last 8 fields empty. Thank you.
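
That suspicion can be checked in isolation. The following is a minimal sketch, not a definitive fix: af_value is a hypothetical helper name, and \d+(?:\.\d+)? (digits, then an optional fractional part) is one possible replacement for \d+\.\d+. The \b is added on the assumption that AF= should not be found inside FSAF= or SAF=.

```shell
# af_value is a hypothetical helper: print the AF value found in an INFO string.
af_value() {
  # (?:\.\d+)? makes the fractional part optional, so both 1 and 0.56962 match.
  # \b stops AF= from matching inside FSAF= or SAF=.
  printf '%s\n' "$1" | perl -ne 'print "$1\n" if /\bAF=(\d+(?:\.\d+)?);/'
}

af_value 'AF=0.56962;AO=90;DP=158;FSAF=46'   # decimal value
af_value 'AF=1;AO=139;DP=159;FSAF=79'        # whole-number value
```

If ^\d*\.?\d*$ was dropped in place of \d+\.\d+, it probably failed for a different reason: ^ and $ anchor to the start and end of the line, so after AF= they cannot match in the middle of a record. The whole pattern then fails and -p prints every line unchanged.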

file
Code:
##INFO=<ID=OREF,Number=.,Type=String,Description="List of original reference bases">
##INFO=<ID=OALT,Number=.,Type=String,Description="List of original variant bases">
##INFO=<ID=OMAPALT,Number=.,Type=String,Description="Maps OID,OPOS,OREF,OALT entries to specific ALT alleles">
##deamination_metric=0.184377
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	xxxxx
chr1	10327407	.	A	T	447.753	PASS	AF=0.56962;AO=90;DP=158;FAO=90;FDP=158;FR=.;FRO=68;FSAF=46;FSAR=44;FSRF=29;FSRR=39;FWDB=0.0102227;FXX=0;HRUN=1;LEN=1;MLLD=75.1519;OALT=T;OID=.;OMAPALT=T;OPOS=10327407;OREF=A;PB=.;PBP=.;QD=11.3355;RBI=0.0341128;REFB=-0.00773916;REVB=0.032545;RO=67;SAF=46;SAR=44;SRF=28;SRR=39;SSEN=0;SSEP=0;SSSB=0.0730222;STB=0.536379;STBP=0.323;TYPE=snp;VARB=0.00446437	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	0/1:246:158:158:67:68:90:90:0.56962:44:46:28:39:44:46:29:39:1
chr1	948846	.	T	TA	786.915	PASS	AF=1;AO=139;DP=159;FAO=152;FDP=152;FR=.;FRO=0;FSAF=79;FSAR=73;FSRF=0;FSRR=0;FWDB=-0.0705752;FXX=0.0256394;HRUN=1;LEN=1;MLLD=19.5287;OALT=A;OID=.;OMAPALT=TA;OPOS=948847;OREF=-;PB=.;PBP=.;QD=20.7083;RBI=0.0969354;REFB=0.149002;REVB=0.0664501;RO=11;SAF=67;SAR=72;SRF=9;SRR=2;SSEN=0;SSEP=0;SSSB=-0.0467166;STB=0.5;STBP=1;TYPE=ins;VARB=-0.00147685	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	1/1:153:159:152:11:0:139:152:1:72:67:9:2:73:79:0:0:1
chr1	9162051	.	C	T	1642.03	PASS	AF=1;AO=172;DP=172;FAO=172;FDP=172;FR=.;FRO=0;FSAF=87;FSAR=85;FSRF=0;FSRR=0;FWDB=0.00602093;FXX=0;HRUN=1;LEN=1;MLLD=174.212;OALT=T;OID=.;OMAPALT=T;OPOS=9162051;OREF=C;PB=.;PBP=.;QD=38.1868;RBI=0.0184042;REFB=0;REVB=-0.0173914;RO=0;SAF=87;SAR=85;SRF=0;SRR=0;SSEN=0;SSEP=0;SSSB=5.71345e-08;STB=0.5;STBP=1;TYPE=snp;VARB=-9.55704e-06	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	1/1:78:172:172:0:0:172:172:1:85:87:0:0:85:87:0:0:1
chr1	9305445	.	A	C	911.07	PASS	AF=0.5175;AO=302;DP=574;FAO=207;FDP=400;FR=.,REALIGNEDx0.5175;FRO=193;FSAF=102;FSAR=105;FSRF=109;FSRR=84;FWDB=0.00317233;FXX=0;HRUN=1;LEN=1;MLLD=73.0922;OALT=C;OID=.;OMAPALT=C;OPOS=9305445;OREF=A;PB=.;PBP=.;QD=9.1107;RBI=0.00738387;REFB=0.00272197;REVB=0.00666767;RO=269;SAF=157;SAR=145;SRF=153;SRR=116;SSEN=0;SSEP=0;SSSB=-0.0424848;STB=0.534715;STBP=0.145;TYPE=snp;VARB=-0.00235714	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	0/1:782:574:400:269:193:302:207:0.5175:145:157:153:116:105:102:109:84:1

perl
Code:
perl -plae ' 
 BEGIN{ %h = qw(0/0 hom 0/1 het 1/1 hom 1/2 het 2/2 hom) } /^[^#].*AF=(\d+\.\d+);.*FDP=(\d+);.*FSAF=(\d+);.*FSAR=(\d+);.*HRUN=(\d+);.*STB=(\d+\.\d+)[,;].*([0-2]\/[0-2])/ and $_ .= join "\t", ("",   sprintf("%1.3f",$1),$3,$4,$5, ($6 >= 0.8 ? "STRANDBIAS" : "GOOD"), $2, $h{$7}, int($F[5]/33+0.5))' file

current output
Code:
##INFO=<ID=OREF,Number=.,Type=String,Description="List of original reference bases">
##INFO=<ID=OALT,Number=.,Type=String,Description="List of original variant bases">
##INFO=<ID=OMAPALT,Number=.,Type=String,Description="Maps OID,OPOS,OREF,OALT entries to specific ALT alleles">
##deamination_metric=0.184377
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	xxxxx
chr1	10327407	.	A	T	447.753	PASS	AF=0.56962;AO=90;DP=158;FAO=90;FDP=158;FR=.;FRO=68;FSAF=46;FSAR=44;FSRF=29;FSRR=39;FWDB=0.0102227;FXX=0;HRUN=1;LEN=1;MLLD=75.1519;OALT=T;OID=.;OMAPALT=T;OPOS=10327407;OREF=A;PB=.;PBP=.;QD=11.3355;RBI=0.0341128;REFB=-0.00773916;REVB=0.032545;RO=67;SAF=46;SAR=44;SRF=28;SRR=39;SSEN=0;SSEP=0;SSSB=0.0730222;STB=0.536379;STBP=0.323;TYPE=snp;VARB=0.00446437	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	0/1:246:158:158:67:68:90:90:0.56962:44:46:28:39:44:46:29:39:1	0.570	46	44	1	GOOD	158	het	14
chr1	948846	.	T	TA	786.915	PASS	AF=1;AO=139;DP=159;FAO=152;FDP=152;FR=.;FRO=0;FSAF=79;FSAR=73;FSRF=0;FSRR=0;FWDB=-0.0705752;FXX=0.0256394;HRUN=1;LEN=1;MLLD=19.5287;OALT=A;OID=.;OMAPALT=TA;OPOS=948847;OREF=-;PB=.;PBP=.;QD=20.7083;RBI=0.0969354;REFB=0.149002;REVB=0.0664501;RO=11;SAF=67;SAR=72;SRF=9;SRR=2;SSEN=0;SSEP=0;SSSB=-0.0467166;STB=0.5;STBP=1;TYPE=ins;VARB=-0.00147685	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	1/1:153:159:152:11:0:139:152:1:72:67:9:2:73:79:0:0:1
chr1	9162051	.	C	T	1642.03	PASS	AF=1;AO=172;DP=172;FAO=172;FDP=172;FR=.;FRO=0;FSAF=87;FSAR=85;FSRF=0;FSRR=0;FWDB=0.00602093;FXX=0;HRUN=1;LEN=1;MLLD=174.212;OALT=T;OID=.;OMAPALT=T;OPOS=9162051;OREF=C;PB=.;PBP=.;QD=38.1868;RBI=0.0184042;REFB=0;REVB=-0.0173914;RO=0;SAF=87;SAR=85;SRF=0;SRR=0;SSEN=0;SSEP=0;SSSB=5.71345e-08;STB=0.5;STBP=1;TYPE=snp;VARB=-9.55704e-06	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	1/1:78:172:172:0:0:172:172:1:85:87:0:0:85:87:0:0:1
chr1	9305445	.	A	C	911.07	PASS	AF=0.5175;AO=302;DP=574;FAO=207;FDP=400;FR=.,REALIGNEDx0.5175;FRO=193;FSAF=102;FSAR=105;FSRF=109;FSRR=84;FWDB=0.00317233;FXX=0;HRUN=1;LEN=1;MLLD=73.0922;OALT=C;OID=.;OMAPALT=C;OPOS=9305445;OREF=A;PB=.;PBP=.;QD=9.1107;RBI=0.00738387;REFB=0.00272197;REVB=0.00666767;RO=269;SAF=157;SAR=145;SRF=153;SRR=116;SSEN=0;SSEP=0;SSSB=-0.0424848;STB=0.534715;STBP=0.145;TYPE=snp;VARB=-0.00235714	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	0/1:782:574:400:269:193:302:207:0.5175:145:157:153:116:105:102:109:84:1	0.517	102	105	1	GOOD	400	het	28

desired result
Code:
##INFO=<ID=OREF,Number=.,Type=String,Description="List of original reference bases">
##INFO=<ID=OALT,Number=.,Type=String,Description="List of original variant bases">
##INFO=<ID=OMAPALT,Number=.,Type=String,Description="Maps OID,OPOS,OREF,OALT entries to specific ALT alleles">
##deamination_metric=0.184377
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	xxxxx
chr1	10327407	.	A	T	447.753	PASS	AF=0.56962;AO=90;DP=158;FAO=90;FDP=158;FR=.;FRO=68;FSAF=46;FSAR=44;FSRF=29;FSRR=39;FWDB=0.0102227;FXX=0;HRUN=1;LEN=1;MLLD=75.1519;OALT=T;OID=.;OMAPALT=T;OPOS=10327407;OREF=A;PB=.;PBP=.;QD=11.3355;RBI=0.0341128;REFB=-0.00773916;REVB=0.032545;RO=67;SAF=46;SAR=44;SRF=28;SRR=39;SSEN=0;SSEP=0;SSSB=0.0730222;STB=0.536379;STBP=0.323;TYPE=snp;VARB=0.00446437	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	0/1:246:158:158:67:68:90:90:0.56962:44:46:28:39:44:46:29:39:1    0.570	46	44	1	GOOD	158	het	14
chr1	948846	.	T	TA	786.915	PASS	AF=1;AO=139;DP=159;FAO=152;FDP=152;FR=.;FRO=0;FSAF=79;FSAR=73;FSRF=0;FSRR=0;FWDB=-0.0705752;FXX=0.0256394;HRUN=1;LEN=1;MLLD=19.5287;OALT=A;OID=.;OMAPALT=TA;OPOS=948847;OREF=-;PB=.;PBP=.;QD=20.7083;RBI=0.0969354;REFB=0.149002;REVB=0.0664501;RO=11;SAF=67;SAR=72;SRF=9;SRR=2;SSEN=0;SSEP=0;SSSB=-0.0467166;STB=0.5;STBP=1;TYPE=ins;VARB=-0.00147685	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	1/1:153:159:152:11:0:139:152:1:72:67:9:2:73:79:0:0:1     1     79     73     1     GOOD     152     hom     24
chr1	9162051	.	C	T	1642.03	PASS	AF=1;AO=172;DP=172;FAO=172;FDP=172;FR=.;FRO=0;FSAF=87;FSAR=85;FSRF=0;FSRR=0;FWDB=0.00602093;FXX=0;HRUN=1;LEN=1;MLLD=174.212;OALT=T;OID=.;OMAPALT=T;OPOS=9162051;OREF=C;PB=.;PBP=.;QD=38.1868;RBI=0.0184042;REFB=0;REVB=-0.0173914;RO=0;SAF=87;SAR=85;SRF=0;SRR=0;SSEN=0;SSEP=0;SSSB=5.71345e-08;STB=0.5;STBP=1;TYPE=snp;VARB=-9.55704e-06	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	1/1:78:172:172:0:0:172:172:1:85:87:0:0:85:87:0:0:1     1     87     85     1     GOOD     172     hom     50
chr1	9305445	.	A	C	911.07	PASS	AF=0.5175;AO=302;DP=574;FAO=207;FDP=400;FR=.,REALIGNEDx0.5175;FRO=193;FSAF=102;FSAR=105;FSRF=109;FSRR=84;FWDB=0.00317233;FXX=0;HRUN=1;LEN=1;MLLD=73.0922;OALT=C;OID=.;OMAPALT=C;OPOS=9305445;OREF=A;PB=.;PBP=.;QD=9.1107;RBI=0.00738387;REFB=0.00272197;REVB=0.00666767;RO=269;SAF=157;SAR=145;SRF=153;SRR=116;SSEN=0;SSEP=0;SSSB=-0.0424848;STB=0.534715;STBP=0.145;TYPE=snp;VARB=-0.00235714	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	0/1:782:574:400:269:193:302:207:0.5175:145:157:153:116:105:102:109:84:1     0.517	102	105	1	GOOD	400	het	28


Last edited by cmccabe; 10-12-2017 at 10:02 AM.. Reason: fixed format
 