Perl to extract whole number or decimal in regex

Tags
perl, solved

    #1  
1 Week Ago
cmccabe
Registered User
 
Join Date: Nov 2013
Last Activity: 17 October 2017, 1:27 PM EDT
Location: Chicago
Posts: 1,182
Thanks: 709
Thanked 14 Times in 13 Posts

In the perl below I am trying to extract and print specific values from patterns using multiple regexes. One of the patterns, AF=, may be a whole number or a decimal, but I cannot seem
to capture both. I think the problem is the regex .*AF=(\d+\.\d+); as it expects a #.#### value and there may only be a #. I tried changing it to ^\d*\.?\d*$ but that returned all four lines as is, with
the last 8 fields empty. Thank you.

file

Code:
##INFO=<ID=OREF,Number=.,Type=String,Description="List of original reference bases">
##INFO=<ID=OALT,Number=.,Type=String,Description="List of original variant bases">
##INFO=<ID=OMAPALT,Number=.,Type=String,Description="Maps OID,OPOS,OREF,OALT entries to specific ALT alleles">
##deamination_metric=0.184377
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	xxxxx
chr1	10327407	.	A	T	447.753	PASS	AF=0.56962;AO=90;DP=158;FAO=90;FDP=158;FR=.;FRO=68;FSAF=46;FSAR=44;FSRF=29;FSRR=39;FWDB=0.0102227;FXX=0;HRUN=1;LEN=1;MLLD=75.1519;OALT=T;OID=.;OMAPALT=T;OPOS=10327407;OREF=A;PB=.;PBP=.;QD=11.3355;RBI=0.0341128;REFB=-0.00773916;REVB=0.032545;RO=67;SAF=46;SAR=44;SRF=28;SRR=39;SSEN=0;SSEP=0;SSSB=0.0730222;STB=0.536379;STBP=0.323;TYPE=snp;VARB=0.00446437	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	0/1:246:158:158:67:68:90:90:0.56962:44:46:28:39:44:46:29:39:1
chr1	948846	.	T	TA	786.915	PASS	AF=1;AO=139;DP=159;FAO=152;FDP=152;FR=.;FRO=0;FSAF=79;FSAR=73;FSRF=0;FSRR=0;FWDB=-0.0705752;FXX=0.0256394;HRUN=1;LEN=1;MLLD=19.5287;OALT=A;OID=.;OMAPALT=TA;OPOS=948847;OREF=-;PB=.;PBP=.;QD=20.7083;RBI=0.0969354;REFB=0.149002;REVB=0.0664501;RO=11;SAF=67;SAR=72;SRF=9;SRR=2;SSEN=0;SSEP=0;SSSB=-0.0467166;STB=0.5;STBP=1;TYPE=ins;VARB=-0.00147685	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	1/1:153:159:152:11:0:139:152:1:72:67:9:2:73:79:0:0:1
chr1	9162051	.	C	T	1642.03	PASS	AF=1;AO=172;DP=172;FAO=172;FDP=172;FR=.;FRO=0;FSAF=87;FSAR=85;FSRF=0;FSRR=0;FWDB=0.00602093;FXX=0;HRUN=1;LEN=1;MLLD=174.212;OALT=T;OID=.;OMAPALT=T;OPOS=9162051;OREF=C;PB=.;PBP=.;QD=38.1868;RBI=0.0184042;REFB=0;REVB=-0.0173914;RO=0;SAF=87;SAR=85;SRF=0;SRR=0;SSEN=0;SSEP=0;SSSB=5.71345e-08;STB=0.5;STBP=1;TYPE=snp;VARB=-9.55704e-06	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	1/1:78:172:172:0:0:172:172:1:85:87:0:0:85:87:0:0:1
chr1	9305445	.	A	C	911.07	PASS	AF=0.5175;AO=302;DP=574;FAO=207;FDP=400;FR=.,REALIGNEDx0.5175;FRO=193;FSAF=102;FSAR=105;FSRF=109;FSRR=84;FWDB=0.00317233;FXX=0;HRUN=1;LEN=1;MLLD=73.0922;OALT=C;OID=.;OMAPALT=C;OPOS=9305445;OREF=A;PB=.;PBP=.;QD=9.1107;RBI=0.00738387;REFB=0.00272197;REVB=0.00666767;RO=269;SAF=157;SAR=145;SRF=153;SRR=116;SSEN=0;SSEP=0;SSSB=-0.0424848;STB=0.534715;STBP=0.145;TYPE=snp;VARB=-0.00235714	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	0/1:782:574:400:269:193:302:207:0.5175:145:157:153:116:105:102:109:84:1

perl

Code:
perl -plae ' 
 BEGIN{ %h = qw(0/0 hom 0/1 het 1/1 hom 1/2 het 2/2 hom) } /^[^#].*AF=(\d+\.\d+);.*FDP=(\d+);.*FSAF=(\d+);.*FSAR=(\d+);.*HRUN=(\d+);.*STB=(\d+\.\d+)[,;].*([0-2]\/[0-2])/ and $_ .= join "\t", ("",   sprintf("%1.3f",$1),$3,$4,$5, ($6 >= 0.8 ? "STRANDBIAS" : "GOOD"), $2, $h{$7}, int($F[5]/33+0.5))' file

current output

Code:
##INFO=<ID=OREF,Number=.,Type=String,Description="List of original reference bases">
##INFO=<ID=OALT,Number=.,Type=String,Description="List of original variant bases">
##INFO=<ID=OMAPALT,Number=.,Type=String,Description="Maps OID,OPOS,OREF,OALT entries to specific ALT alleles">
##deamination_metric=0.184377
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	xxxxx
chr1	10327407	.	A	T	447.753	PASS	AF=0.56962;AO=90;DP=158;FAO=90;FDP=158;FR=.;FRO=68;FSAF=46;FSAR=44;FSRF=29;FSRR=39;FWDB=0.0102227;FXX=0;HRUN=1;LEN=1;MLLD=75.1519;OALT=T;OID=.;OMAPALT=T;OPOS=10327407;OREF=A;PB=.;PBP=.;QD=11.3355;RBI=0.0341128;REFB=-0.00773916;REVB=0.032545;RO=67;SAF=46;SAR=44;SRF=28;SRR=39;SSEN=0;SSEP=0;SSSB=0.0730222;STB=0.536379;STBP=0.323;TYPE=snp;VARB=0.00446437	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	0/1:246:158:158:67:68:90:90:0.56962:44:46:28:39:44:46:29:39:1	0.570	46	44	1	GOOD	158	het	14
chr1	948846	.	T	TA	786.915	PASS	AF=1;AO=139;DP=159;FAO=152;FDP=152;FR=.;FRO=0;FSAF=79;FSAR=73;FSRF=0;FSRR=0;FWDB=-0.0705752;FXX=0.0256394;HRUN=1;LEN=1;MLLD=19.5287;OALT=A;OID=.;OMAPALT=TA;OPOS=948847;OREF=-;PB=.;PBP=.;QD=20.7083;RBI=0.0969354;REFB=0.149002;REVB=0.0664501;RO=11;SAF=67;SAR=72;SRF=9;SRR=2;SSEN=0;SSEP=0;SSSB=-0.0467166;STB=0.5;STBP=1;TYPE=ins;VARB=-0.00147685	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	1/1:153:159:152:11:0:139:152:1:72:67:9:2:73:79:0:0:1
chr1	9162051	.	C	T	1642.03	PASS	AF=1;AO=172;DP=172;FAO=172;FDP=172;FR=.;FRO=0;FSAF=87;FSAR=85;FSRF=0;FSRR=0;FWDB=0.00602093;FXX=0;HRUN=1;LEN=1;MLLD=174.212;OALT=T;OID=.;OMAPALT=T;OPOS=9162051;OREF=C;PB=.;PBP=.;QD=38.1868;RBI=0.0184042;REFB=0;REVB=-0.0173914;RO=0;SAF=87;SAR=85;SRF=0;SRR=0;SSEN=0;SSEP=0;SSSB=5.71345e-08;STB=0.5;STBP=1;TYPE=snp;VARB=-9.55704e-06	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	1/1:78:172:172:0:0:172:172:1:85:87:0:0:85:87:0:0:1
chr1	9305445	.	A	C	911.07	PASS	AF=0.5175;AO=302;DP=574;FAO=207;FDP=400;FR=.,REALIGNEDx0.5175;FRO=193;FSAF=102;FSAR=105;FSRF=109;FSRR=84;FWDB=0.00317233;FXX=0;HRUN=1;LEN=1;MLLD=73.0922;OALT=C;OID=.;OMAPALT=C;OPOS=9305445;OREF=A;PB=.;PBP=.;QD=9.1107;RBI=0.00738387;REFB=0.00272197;REVB=0.00666767;RO=269;SAF=157;SAR=145;SRF=153;SRR=116;SSEN=0;SSEP=0;SSSB=-0.0424848;STB=0.534715;STBP=0.145;TYPE=snp;VARB=-0.00235714	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	0/1:782:574:400:269:193:302:207:0.5175:145:157:153:116:105:102:109:84:1	0.517	102	105	1	GOOD	400	het	28

desired result

Code:
##INFO=<ID=OREF,Number=.,Type=String,Description="List of original reference bases">
##INFO=<ID=OALT,Number=.,Type=String,Description="List of original variant bases">
##INFO=<ID=OMAPALT,Number=.,Type=String,Description="Maps OID,OPOS,OREF,OALT entries to specific ALT alleles">
##deamination_metric=0.184377
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	xxxxx
chr1	10327407	.	A	T	447.753	PASS	AF=0.56962;AO=90;DP=158;FAO=90;FDP=158;FR=.;FRO=68;FSAF=46;FSAR=44;FSRF=29;FSRR=39;FWDB=0.0102227;FXX=0;HRUN=1;LEN=1;MLLD=75.1519;OALT=T;OID=.;OMAPALT=T;OPOS=10327407;OREF=A;PB=.;PBP=.;QD=11.3355;RBI=0.0341128;REFB=-0.00773916;REVB=0.032545;RO=67;SAF=46;SAR=44;SRF=28;SRR=39;SSEN=0;SSEP=0;SSSB=0.0730222;STB=0.536379;STBP=0.323;TYPE=snp;VARB=0.00446437	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	0/1:246:158:158:67:68:90:90:0.56962:44:46:28:39:44:46:29:39:1    0.570	46	44	1	GOOD	158	het	14
chr1	948846	.	T	TA	786.915	PASS	AF=1;AO=139;DP=159;FAO=152;FDP=152;FR=.;FRO=0;FSAF=79;FSAR=73;FSRF=0;FSRR=0;FWDB=-0.0705752;FXX=0.0256394;HRUN=1;LEN=1;MLLD=19.5287;OALT=A;OID=.;OMAPALT=TA;OPOS=948847;OREF=-;PB=.;PBP=.;QD=20.7083;RBI=0.0969354;REFB=0.149002;REVB=0.0664501;RO=11;SAF=67;SAR=72;SRF=9;SRR=2;SSEN=0;SSEP=0;SSSB=-0.0467166;STB=0.5;STBP=1;TYPE=ins;VARB=-0.00147685	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	1/1:153:159:152:11:0:139:152:1:72:67:9:2:73:79:0:0:1     1     79     73     1     GOOD     152     hom     24
chr1	9162051	.	C	T	1642.03	PASS	AF=1;AO=172;DP=172;FAO=172;FDP=172;FR=.;FRO=0;FSAF=87;FSAR=85;FSRF=0;FSRR=0;FWDB=0.00602093;FXX=0;HRUN=1;LEN=1;MLLD=174.212;OALT=T;OID=.;OMAPALT=T;OPOS=9162051;OREF=C;PB=.;PBP=.;QD=38.1868;RBI=0.0184042;REFB=0;REVB=-0.0173914;RO=0;SAF=87;SAR=85;SRF=0;SRR=0;SSEN=0;SSEP=0;SSSB=5.71345e-08;STB=0.5;STBP=1;TYPE=snp;VARB=-9.55704e-06	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	1/1:78:172:172:0:0:172:172:1:85:87:0:0:85:87:0:0:1     1     87     85     1     GOOD     172     hom     50
chr1	9305445	.	A	C	911.07	PASS	AF=0.5175;AO=302;DP=574;FAO=207;FDP=400;FR=.,REALIGNEDx0.5175;FRO=193;FSAF=102;FSAR=105;FSRF=109;FSRR=84;FWDB=0.00317233;FXX=0;HRUN=1;LEN=1;MLLD=73.0922;OALT=C;OID=.;OMAPALT=C;OPOS=9305445;OREF=A;PB=.;PBP=.;QD=9.1107;RBI=0.00738387;REFB=0.00272197;REVB=0.00666767;RO=269;SAF=157;SAR=145;SRF=153;SRR=116;SSEN=0;SSEP=0;SSSB=-0.0424848;STB=0.534715;STBP=0.145;TYPE=snp;VARB=-0.00235714	GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR:QT	0/1:782:574:400:269:193:302:207:0.5175:145:157:153:116:105:102:109:84:1     0.517	102	105	1	GOOD	400	het	28


Last edited by cmccabe; 1 Week Ago at 09:02 AM.. Reason: fixed format
    #2  
1 Week Ago
Scrutinizer (Forum Staff)
Moderator
 
Join Date: Nov 2008
Last Activity: 22 October 2017, 10:11 PM EDT
Location: Amsterdam
Posts: 11,575
Thanks: 510
Thanked 3,356 Times in 2,960 Posts
Hi, try:

Code:
\d+\.?\d*

    #3  
5 Days Ago
cmccabe
Thank you very much.