Perl to extract whole number or decimal in regex


 
# 1  
Old 10-12-2017

In the perl below I am trying to extract and print specific values from patterns using multiple regexes. One of the patterns, AF=, may be a whole number or a decimal, but I cannot seem
to capture both. I think the problem is the regex .*AF=(\d+\.\d+); since it expects a #.#### and the value may be only a #. I tried changing it to ^\d*\.?\d*$ but that returned all four lines as is, with
the last 8 fields empty. Thank you.

file
Code:
##INFO=<ID=OREF,Number=.,Type=String,Description="List of original reference bases">
##INFO=<ID=OALT,Number=.,Type=String,Description="List of original variant bases">
##INFO=<ID=OMAPALT,Number=.,Type=String,Description="Maps OID,OPOS,OREF,OALT entries to specific ALT alleles">
##deamination_metric=0.184377
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	xxxxx
chr1	10327407	.	A	T	447.753	PASS	AF=0.56962;AO=90;DP=158;FAO=90;FDP=158;FR=.;FRO=68;FSAF=46;FSAR=44;FSRF=29;FSRR=39;FWDB=0.0102227;FXX=0;HRUN=1;LEN=1;MLLD=75.1519;OALT=T;OID=.;OMAPALT=T;OPOS=10327407;OREF=A;PB=.;PBP=.;QD=11.3355;RBI=0.0341128;REFB=-0.00773916;REVB=0.032545;RO=67;SAF=46;SAR=44;SRF=28;SRR=39;SSEN=0;SSEP=0;SSSB=0.0730222;STB=0.536379;STBP=0.323;TYPE=snp;VARB=0.00446437	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	0/1:246:158:158:67:68:90:90:0.56962:44:46:28:39:44:46:29:39:1
chr1	948846	.	T	TA	786.915	PASS	AF=1;AO=139;DP=159;FAO=152;FDP=152;FR=.;FRO=0;FSAF=79;FSAR=73;FSRF=0;FSRR=0;FWDB=-0.0705752;FXX=0.0256394;HRUN=1;LEN=1;MLLD=19.5287;OALT=A;OID=.;OMAPALT=TA;OPOS=948847;OREF=-;PB=.;PBP=.;QD=20.7083;RBI=0.0969354;REFB=0.149002;REVB=0.0664501;RO=11;SAF=67;SAR=72;SRF=9;SRR=2;SSEN=0;SSEP=0;SSSB=-0.0467166;STB=0.5;STBP=1;TYPE=ins;VARB=-0.00147685	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	1/1:153:159:152:11:0:139:152:1:72:67:9:2:73:79:0:0:1
chr1	9162051	.	C	T	1642.03	PASS	AF=1;AO=172;DP=172;FAO=172;FDP=172;FR=.;FRO=0;FSAF=87;FSAR=85;FSRF=0;FSRR=0;FWDB=0.00602093;FXX=0;HRUN=1;LEN=1;MLLD=174.212;OALT=T;OID=.;OMAPALT=T;OPOS=9162051;OREF=C;PB=.;PBP=.;QD=38.1868;RBI=0.0184042;REFB=0;REVB=-0.0173914;RO=0;SAF=87;SAR=85;SRF=0;SRR=0;SSEN=0;SSEP=0;SSSB=5.71345e-08;STB=0.5;STBP=1;TYPE=snp;VARB=-9.55704e-06	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	1/1:78:172:172:0:0:172:172:1:85:87:0:0:85:87:0:0:1
chr1	9305445	.	A	C	911.07	PASS	AF=0.5175;AO=302;DP=574;FAO=207;FDP=400;FR=.,REALIGNEDx0.5175;FRO=193;FSAF=102;FSAR=105;FSRF=109;FSRR=84;FWDB=0.00317233;FXX=0;HRUN=1;LEN=1;MLLD=73.0922;OALT=C;OID=.;OMAPALT=C;OPOS=9305445;OREF=A;PB=.;PBP=.;QD=9.1107;RBI=0.00738387;REFB=0.00272197;REVB=0.00666767;RO=269;SAF=157;SAR=145;SRF=153;SRR=116;SSEN=0;SSEP=0;SSSB=-0.0424848;STB=0.534715;STBP=0.145;TYPE=snp;VARB=-0.00235714	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	0/1:782:574:400:269:193:302:207:0.5175:145:157:153:116:105:102:109:84:1

perl
Code:
perl -plae ' 
 BEGIN{ %h = qw(0/0 hom 0/1 het 1/1 hom 1/2 het 2/2 hom) } /^[^#].*AF=(\d+\.\d+);.*FDP=(\d+);.*FSAF=(\d+);.*FSAR=(\d+);.*HRUN=(\d+);.*STB=(\d+\.\d+)[,;].*([0-2]\/[0-2])/ and $_ .= join "\t", ("",   sprintf("%1.3f",$1),$3,$4,$5, ($6 >= 0.8 ? "STRANDBIAS" : "GOOD"), $2, $h{$7}, int($F[5]/33+0.5))' file

current output
Code:
##INFO=<ID=OREF,Number=.,Type=String,Description="List of original reference bases">
##INFO=<ID=OALT,Number=.,Type=String,Description="List of original variant bases">
##INFO=<ID=OMAPALT,Number=.,Type=String,Description="Maps OID,OPOS,OREF,OALT entries to specific ALT alleles">
##deamination_metric=0.184377
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	xxxxx
chr1	10327407	.	A	T	447.753	PASS	AF=0.56962;AO=90;DP=158;FAO=90;FDP=158;FR=.;FRO=68;FSAF=46;FSAR=44;FSRF=29;FSRR=39;FWDB=0.0102227;FXX=0;HRUN=1;LEN=1;MLLD=75.1519;OALT=T;OID=.;OMAPALT=T;OPOS=10327407;OREF=A;PB=.;PBP=.;QD=11.3355;RBI=0.0341128;REFB=-0.00773916;REVB=0.032545;RO=67;SAF=46;SAR=44;SRF=28;SRR=39;SSEN=0;SSEP=0;SSSB=0.0730222;STB=0.536379;STBP=0.323;TYPE=snp;VARB=0.00446437	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	0/1:246:158:158:67:68:90:90:0.56962:44:46:28:39:44:46:29:39:1	0.570	46	44	1	GOOD	158	het	14
chr1	948846	.	T	TA	786.915	PASS	AF=1;AO=139;DP=159;FAO=152;FDP=152;FR=.;FRO=0;FSAF=79;FSAR=73;FSRF=0;FSRR=0;FWDB=-0.0705752;FXX=0.0256394;HRUN=1;LEN=1;MLLD=19.5287;OALT=A;OID=.;OMAPALT=TA;OPOS=948847;OREF=-;PB=.;PBP=.;QD=20.7083;RBI=0.0969354;REFB=0.149002;REVB=0.0664501;RO=11;SAF=67;SAR=72;SRF=9;SRR=2;SSEN=0;SSEP=0;SSSB=-0.0467166;STB=0.5;STBP=1;TYPE=ins;VARB=-0.00147685	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	1/1:153:159:152:11:0:139:152:1:72:67:9:2:73:79:0:0:1
chr1	9162051	.	C	T	1642.03	PASS	AF=1;AO=172;DP=172;FAO=172;FDP=172;FR=.;FRO=0;FSAF=87;FSAR=85;FSRF=0;FSRR=0;FWDB=0.00602093;FXX=0;HRUN=1;LEN=1;MLLD=174.212;OALT=T;OID=.;OMAPALT=T;OPOS=9162051;OREF=C;PB=.;PBP=.;QD=38.1868;RBI=0.0184042;REFB=0;REVB=-0.0173914;RO=0;SAF=87;SAR=85;SRF=0;SRR=0;SSEN=0;SSEP=0;SSSB=5.71345e-08;STB=0.5;STBP=1;TYPE=snp;VARB=-9.55704e-06	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	1/1:78:172:172:0:0:172:172:1:85:87:0:0:85:87:0:0:1
chr1	9305445	.	A	C	911.07	PASS	AF=0.5175;AO=302;DP=574;FAO=207;FDP=400;FR=.,REALIGNEDx0.5175;FRO=193;FSAF=102;FSAR=105;FSRF=109;FSRR=84;FWDB=0.00317233;FXX=0;HRUN=1;LEN=1;MLLD=73.0922;OALT=C;OID=.;OMAPALT=C;OPOS=9305445;OREF=A;PB=.;PBP=.;QD=9.1107;RBI=0.00738387;REFB=0.00272197;REVB=0.00666767;RO=269;SAF=157;SAR=145;SRF=153;SRR=116;SSEN=0;SSEP=0;SSSB=-0.0424848;STB=0.534715;STBP=0.145;TYPE=snp;VARB=-0.00235714	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	0/1:782:574:400:269:193:302:207:0.5175:145:157:153:116:105:102:109:84:1	0.517	102	105	1	GOOD	400	het	28

desired result
Code:
##INFO=<ID=OREF,Number=.,Type=String,Description="List of original reference bases">
##INFO=<ID=OALT,Number=.,Type=String,Description="List of original variant bases">
##INFO=<ID=OMAPALT,Number=.,Type=String,Description="Maps OID,OPOS,OREF,OALT entries to specific ALT alleles">
##deamination_metric=0.184377
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	xxxxx
chr1	10327407	.	A	T	447.753	PASS	AF=0.56962;AO=90;DP=158;FAO=90;FDP=158;FR=.;FRO=68;FSAF=46;FSAR=44;FSRF=29;FSRR=39;FWDB=0.0102227;FXX=0;HRUN=1;LEN=1;MLLD=75.1519;OALT=T;OID=.;OMAPALT=T;OPOS=10327407;OREF=A;PB=.;PBP=.;QD=11.3355;RBI=0.0341128;REFB=-0.00773916;REVB=0.032545;RO=67;SAF=46;SAR=44;SRF=28;SRR=39;SSEN=0;SSEP=0;SSSB=0.0730222;STB=0.536379;STBP=0.323;TYPE=snp;VARB=0.00446437	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	0/1:246:158:158:67:68:90:90:0.56962:44:46:28:39:44:46:29:39:1    0.570	46	44	1	GOOD	158	het	14
chr1	948846	.	T	TA	786.915	PASS	AF=1;AO=139;DP=159;FAO=152;FDP=152;FR=.;FRO=0;FSAF=79;FSAR=73;FSRF=0;FSRR=0;FWDB=-0.0705752;FXX=0.0256394;HRUN=1;LEN=1;MLLD=19.5287;OALT=A;OID=.;OMAPALT=TA;OPOS=948847;OREF=-;PB=.;PBP=.;QD=20.7083;RBI=0.0969354;REFB=0.149002;REVB=0.0664501;RO=11;SAF=67;SAR=72;SRF=9;SRR=2;SSEN=0;SSEP=0;SSSB=-0.0467166;STB=0.5;STBP=1;TYPE=ins;VARB=-0.00147685	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	1/1:153:159:152:11:0:139:152:1:72:67:9:2:73:79:0:0:1     1     79     73     1     GOOD     152     hom     24
chr1	9162051	.	C	T	1642.03	PASS	AF=1;AO=172;DP=172;FAO=172;FDP=172;FR=.;FRO=0;FSAF=87;FSAR=85;FSRF=0;FSRR=0;FWDB=0.00602093;FXX=0;HRUN=1;LEN=1;MLLD=174.212;OALT=T;OID=.;OMAPALT=T;OPOS=9162051;OREF=C;PB=.;PBP=.;QD=38.1868;RBI=0.0184042;REFB=0;REVB=-0.0173914;RO=0;SAF=87;SAR=85;SRF=0;SRR=0;SSEN=0;SSEP=0;SSSB=5.71345e-08;STB=0.5;STBP=1;TYPE=snp;VARB=-9.55704e-06	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	1/1:78:172:172:0:0:172:172:1:85:87:0:0:85:87:0:0:1     1     87     85     1     GOOD     172     hom     50
chr1	9305445	.	A	C	911.07	PASS	AF=0.5175;AO=302;DP=574;FAO=207;FDP=400;FR=.,REALIGNEDx0.5175;FRO=193;FSAF=102;FSAR=105;FSRF=109;FSRR=84;FWDB=0.00317233;FXX=0;HRUN=1;LEN=1;MLLD=73.0922;OALT=C;OID=.;OMAPALT=C;OPOS=9305445;OREF=A;PB=.;PBP=.;QD=9.1107;RBI=0.00738387;REFB=0.00272197;REVB=0.00666767;RO=269;SAF=157;SAR=145;SRF=153;SRR=116;SSEN=0;SSEP=0;SSSB=-0.0424848;STB=0.534715;STBP=0.145;TYPE=snp;VARB=-0.00235714	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	0/1:782:574:400:269:193:302:207:0.5175:145:157:153:116:105:102:109:84:1     0.517	102	105	1	GOOD	400	het	28


Last edited by cmccabe; 10-12-2017 at 10:02 AM.. Reason: fixed format
# 2  
Old 10-12-2017
Hi, try:
Code:
\d+\.?\d*

This User Gave Thanks to Scrutinizer For This Post:
# 3  
Old 10-17-2017
Thank you very much.