Add specific string to last field of each line in perl based on value


 
# 1  
05-27-2016

I am trying to add a condition to the Perl one-liner below that captures the GT tag and appends a specific string as the last field of each line. The problem is that the GT value is not right after the tag; it is a few fields away. The values should always be 0/1 or 1/2 and are in bold in the input. 0/1=het and 1/2=hom. Thank you.

input
Code:
##
##
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality, the Phred-scaled marginal (or unconditional) probability of the called genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##
##
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    SAMPLE
chr1    9324670    .    A    G    672.016    PASS    AF=0.528369;AO=148;DP=281;FAO=149;FDP=282;FR=.;FRO=133;FSAF=59;FSAR=90;FSRF=60;FSRR=73;FWDB=0.00343606;FXX=0;HRUN=1;LEN=1;MLLD=155.207;OALT=G;OID=.;OMAPALT=G;OPOS=9324670;OREF=A;PB=0.5;PBP=1;QD=9.53214;RBI=0.00594431;REFB=-0.0181827;REVB=0.00485061;RO=130;SAF=59;SAR=89;SRF=57;SRR=73;SSEN=0;SSEP=0;SSSB=-0.0352973;STB=0.526882;STBP=0.323;TYPE=snp;VARB=0.0184938;ANN=H6PD    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    0/1:527:281:282:130:133:148:149:0.528369:89:59:57:73:90:59:60:73 GOOD 282 reads
chr1    10318652    .    C    G    360.217    PASS    AF=0.566929;AO=72;DP=129;FAO=72;FDP=127;FR=.;FRO=55;FSAF=36;FSAR=36;FSRF=31;FSRR=24;FWDB=0.00760676;FXX=0.0155027;HRUN=2;LEN=1;MLLD=115.62;OALT=G;OID=.;OMAPALT=G;OPOS=10318652;OREF=C;PB=0.5;PBP=1;QD=11.3454;RBI=0.0125905;REFB=-0.0312889;REVB=-0.0100329;RO=55;SAF=36;SAR=36;SRF=31;SRR=24;SSEN=0;SSEP=0;SSSB=-0.0505108;STB=0.527551;STBP=0.492;TYPE=snp;VARB=0.0181889;ANN=KIF1B    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    1/1:203:129:127:55:55:72:72:0.566929:36:36:31:24:36:36:31:24 GOOD 127 reads
chr1    10355834    .    C    T    504.995    PASS    AF=0.456747;AO=132;DP=290;FAO=132;FDP=289;FR=.;FRO=157;FSAF=56;FSAR=76;FSRF=85;FSRR=72;FWDB=0.0634576;FXX=0.00344816;HRUN=2;LEN=1;MLLD=58.7971;OALT=T;OID=.;OMAPALT=T;OPOS=10355834;OREF=C;PB=0.5;PBP=1;QD=6.98956;RBI=0.0644815;REFB=0.00590624;REVB=-0.0114455;RO=156;SAF=56;SAR=76;SRF=85;SRR=71;SSEN=0;SSEP=0;SSSB=-0.118565;STB=0.563872;STBP=0.047;TYPE=snp;VARB=-0.00678859;ANN=KIF1B    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    0/1:504:290:289:156:157:132:132:0.456747:76:56:85:71:76:56:85:72 GOOD 289 reads
chr1    11090916    .    C    A    1855.57    PASS    AF=1;AO=193;DP=193;FAO=194;FDP=194;FR=.;FRO=0;FSAF=96;FSAR=98;FSRF=0;FSRR=0;FWDB=0.0243175;FXX=0;HRUN=1;LEN=1;MLLD=207.14;OALT=A;OID=.;OMAPALT=A;OPOS=11090916;OREF=C;PB=0.5;PBP=1;QD=38.2592;RBI=0.048476;REFB=0;REVB=0.0419355;RO=0;SAF=95;SAR=98;SRF=0;SRR=0;SSEN=0;SSEP=0;SSSB=4.54468e-08;STB=0.5;STBP=1;TYPE=snp;VARB=3.87852e-05;ANN=MASP2    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    1/1:88:193:194:0:0:193:194:1:98:95:0:0:98:96:0:0 GOOD 194 reads
chr1    213068404    .    T    C    70.4374    PASS    AF=0.435897;AO=17;DP=39;FAO=17;FDP=39;FR=.;FRO=22;FSAF=16;FSAR=1;FSRF=11;FSRR=11;FWDB=-0.000611735;FXX=0;HRUN=1;LEN=1;MLLD=243.519;OALT=C;OID=.;OMAPALT=C;OPOS=213068404;OREF=T;PB=0.5;PBP=1;QD=7.22435;RBI=0.023516;REFB=0.00205571;REVB=0.023508;RO=22;SAF=16;SAR=1;SRF=11;SRR=11;SSEN=0;SSEP=0;SSSB=0.630793;STB=0.87614;STBP=0.001;TYPE=snp;VARB=-0.00167889;ANN=FLVCR1    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    0/1:70:39:39:22:22:17:17:0.435897:1:16:11:11:1:16:11:11 STRAND BIAS 39 reads

desired output
Code:
##
##
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality, the Phred-scaled marginal (or unconditional) probability of the called genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##
##
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    SAMPLE
chr1    9324670    .    A    G    672.016    PASS    AF=0.528369;AO=148;DP=281;FAO=149;FDP=282;FR=.;FRO=133;FSAF=59;FSAR=90;FSRF=60;FSRR=73;FWDB=0.00343606;FXX=0;HRUN=1;LEN=1;MLLD=155.207;OALT=G;OID=.;OMAPALT=G;OPOS=9324670;OREF=A;PB=0.5;PBP=1;QD=9.53214;RBI=0.00594431;REFB=-0.0181827;REVB=0.00485061;RO=130;SAF=59;SAR=89;SRF=57;SRR=73;SSEN=0;SSEP=0;SSSB=-0.0352973;STB=0.526882;STBP=0.323;TYPE=snp;VARB=0.0184938;ANN=H6PD    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    0/1:527:281:282:130:133:148:149:0.528369:89:59:57:73:90:59:60:73 GOOD 282 reads het
chr1    10318652    .    C    G    360.217    PASS    AF=0.566929;AO=72;DP=129;FAO=72;FDP=127;FR=.;FRO=55;FSAF=36;FSAR=36;FSRF=31;FSRR=24;FWDB=0.00760676;FXX=0.0155027;HRUN=2;LEN=1;MLLD=115.62;OALT=G;OID=.;OMAPALT=G;OPOS=10318652;OREF=C;PB=0.5;PBP=1;QD=11.3454;RBI=0.0125905;REFB=-0.0312889;REVB=-0.0100329;RO=55;SAF=36;SAR=36;SRF=31;SRR=24;SSEN=0;SSEP=0;SSSB=-0.0505108;STB=0.527551;STBP=0.492;TYPE=snp;VARB=0.0181889;ANN=KIF1B    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    1/1:203:129:127:55:55:72:72:0.566929:36:36:31:24:36:36:31:24 GOOD 127 reads hom
chr1    10355834    .    C    T    504.995    PASS    AF=0.456747;AO=132;DP=290;FAO=132;FDP=289;FR=.;FRO=157;FSAF=56;FSAR=76;FSRF=85;FSRR=72;FWDB=0.0634576;FXX=0.00344816;HRUN=2;LEN=1;MLLD=58.7971;OALT=T;OID=.;OMAPALT=T;OPOS=10355834;OREF=C;PB=0.5;PBP=1;QD=6.98956;RBI=0.0644815;REFB=0.00590624;REVB=-0.0114455;RO=156;SAF=56;SAR=76;SRF=85;SRR=71;SSEN=0;SSEP=0;SSSB=-0.118565;STB=0.563872;STBP=0.047;TYPE=snp;VARB=-0.00678859;ANN=KIF1B    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    0/1:504:290:289:156:157:132:132:0.456747:76:56:85:71:76:56:85:72 GOOD 289 reads het
chr1    11090916    .    C    A    1855.57    PASS    AF=1;AO=193;DP=193;FAO=194;FDP=194;FR=.;FRO=0;FSAF=96;FSAR=98;FSRF=0;FSRR=0;FWDB=0.0243175;FXX=0;HRUN=1;LEN=1;MLLD=207.14;OALT=A;OID=.;OMAPALT=A;OPOS=11090916;OREF=C;PB=0.5;PBP=1;QD=38.2592;RBI=0.048476;REFB=0;REVB=0.0419355;RO=0;SAF=95;SAR=98;SRF=0;SRR=0;SSEN=0;SSEP=0;SSSB=4.54468e-08;STB=0.5;STBP=1;TYPE=snp;VARB=3.87852e-05;ANN=MASP2    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    1/1:88:193:194:0:0:193:194:1:98:95:0:0:98:96:0:0 GOOD 194 reads hom
chr1    213068404    .    T    C    70.4374    PASS    AF=0.435897;AO=17;DP=39;FAO=17;FDP=39;FR=.;FRO=22;FSAF=16;FSAR=1;FSRF=11;FSRR=11;FWDB=-0.000611735;FXX=0;HRUN=1;LEN=1;MLLD=243.519;OALT=C;OID=.;OMAPALT=C;OPOS=213068404;OREF=T;PB=0.5;PBP=1;QD=7.22435;RBI=0.023516;REFB=0.00205571;REVB=0.023508;RO=22;SAF=16;SAR=1;SRF=11;SRR=11;SSEN=0;SSEP=0;SSSB=0.630793;STB=0.87614;STBP=0.001;TYPE=snp;VARB=-0.00167889;ANN=FLVCR1    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    0/1:70:39:39:22:22:17:17:0.435897:1:16:11:11:1:16:11:11 STRAND BIAS 39 reads het

perl
Code:
perl -ple '/^[^#].*FDP=(\d+);.*STB=(\d+\.\d+);.*GT=(\d+);  ($9:*)= 0/1?="het",= 1/1?="hom"/ and $_.=($2 >= 0.8?" STRAND BIAS ":" GOOD ").$1." reads"' input > results.txt

# 2  
05-27-2016
Quote:
The values should always be 0/1 or 1/2 and are in bold in the input. 0/1=het and 1/2=hom
You said "1/2=hom"; however, in your desired output 1/1 gets hom, and 1/2 is nowhere to be found.

Here's a line of Perl that produces what you posted as desired output, based on what you posted as input:

Code:
perl -ple 'BEGIN{%h=qw(0/1 het 1/1 hom)}; /([01]\/1)/ and $_ .= " $h{$1}"'  input > results.txt

# 3  
05-27-2016
I apologize for the typo. It may be easier to treat 0/0, 1/1, and 2/2 all as hom. The other condition is that 0/1 or 1/2 is het. I think these are all the possibilities. Thank you.
# 4  
05-27-2016
Quote:
Originally Posted by cmccabe
I apologize for the typo. It may be easier to treat 0/0, 1/1, and 2/2 all as hom. The other condition is that 0/1 or 1/2 is het. I think these are all the possibilities. Thank you.
Modify it accordingly:

Code:
perl -ple 'BEGIN{%h=qw(0/0 hom 0/1 het 1/1 hom 1/2 het 2/2 hom)} /([0-2]\/[0-2])/ and $_ .=" $h{$1}"'  input > results.txt

# 5  
05-27-2016
Assuming the sample you showed is representative of your real input, you could also use awk (on Solaris/SunOS systems, use nawk or /usr/xpg4/bin/awk):
Code:
awk '$10~"^[012]"{$0=$0($10~"^(0/0|1/1|2/2)"?" hom":" het")}1' input > results.txt

# 6  
05-28-2016
Thank you very much.
# 7  
06-28-2016
The Perl one-liner below from @Aia works great and produces the desired result, but I don't think I understand how it works. Could someone explain it a little? That may help. Thank you.

Code:
perl -ple 'BEGIN{%h=qw(0/0 hom 0/1 het 1/1 hom 1/2 het 2/2 hom)} /([0-2]\/[0-2])/ and $_ .=" $h{$1}"'

For example, how does STB get calculated or the reads extracted?

The input file is the one in the post. I have a follow-up question, but I want to understand the one-liner first so I can attempt the Perl code myself. Thank you.

Follow-up question:
The Perl below produces the current results. In the desired results, the last fields shown below are added and need to be tab-delimited. The value in front of the string "score" is calculated from $6 (QUAL). That calculation is $6/33.

Code:
GOOD 282 reads het 20 score  (672.016/33) no decimal, rounded up
GOOD 127 reads hom 11 score  (360.217/33) no decimal, rounded up
GOOD 289 reads het 15 score  (504.995/33) no decimal, rounded up
GOOD 194 reads hom 56 score  (1855.57/33) no decimal, rounded up
STRAND BIAS 39 reads het 2 score  (70.4374/33) no decimal, rounded up


Code:
perl -ple 'BEGIN{%h=qw(0/0 hom 0/1 het 1/1 hom 1/2 het 2/2 hom)} /([0-2]\/[0-2])/ and $_ .=" $h{$1}"'  input > results.txt


input
Code:
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality, the Phred-scaled marginal (or unconditional) probability of the called genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    SAMPLE
chr1    9324670    .    A    G    672.016    PASS    AF=0.528369;AO=148;DP=281;FAO=149;FDP=282;FR=.;FRO=133;FSAF=59;FSAR=90;FSRF=60;FSRR=73;FWDB=0.00343606;FXX=0;HRUN=1;LEN=1;MLLD=155.207;OALT=G;OID=.;OMAPALT=G;OPOS=9324670;OREF=A;PB=0.5;PBP=1;QD=9.53214;RBI=0.00594431;REFB=-0.0181827;REVB=0.00485061;RO=130;SAF=59;SAR=89;SRF=57;SRR=73;SSEN=0;SSEP=0;SSSB=-0.0352973;STB=0.526882;STBP=0.323;TYPE=snp;VARB=0.0184938;ANN=H6PD    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    0/1:527:281:282:130:133:148:149:0.528369:89:59:57:73:90:59:60:73
chr1    10318652    .    C    G    360.217    PASS    AF=0.566929;AO=72;DP=129;FAO=72;FDP=127;FR=.;FRO=55;FSAF=36;FSAR=36;FSRF=31;FSRR=24;FWDB=0.00760676;FXX=0.0155027;HRUN=2;LEN=1;MLLD=115.62;OALT=G;OID=.;OMAPALT=G;OPOS=10318652;OREF=C;PB=0.5;PBP=1;QD=11.3454;RBI=0.0125905;REFB=-0.0312889;REVB=-0.0100329;RO=55;SAF=36;SAR=36;SRF=31;SRR=24;SSEN=0;SSEP=0;SSSB=-0.0505108;STB=0.527551;STBP=0.492;TYPE=snp;VARB=0.0181889;ANN=KIF1B    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    1/1:203:129:127:55:55:72:72:0.566929:36:36:31:24:36:36:31:24
chr1    10355834    .    C    T    504.995    PASS    AF=0.456747;AO=132;DP=290;FAO=132;FDP=289;FR=.;FRO=157;FSAF=56;FSAR=76;FSRF=85;FSRR=72;FWDB=0.0634576;FXX=0.00344816;HRUN=2;LEN=1;MLLD=58.7971;OALT=T;OID=.;OMAPALT=T;OPOS=10355834;OREF=C;PB=0.5;PBP=1;QD=6.98956;RBI=0.0644815;REFB=0.00590624;REVB=-0.0114455;RO=156;SAF=56;SAR=76;SRF=85;SRR=71;SSEN=0;SSEP=0;SSSB=-0.118565;STB=0.563872;STBP=0.047;TYPE=snp;VARB=-0.00678859;ANN=KIF1B    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    0/1:504:290:289:156:157:132:132:0.456747:76:56:85:71:76:56:85:72
chr1    11090916    .    C    A    1855.57    PASS    AF=1;AO=193;DP=193;FAO=194;FDP=194;FR=.;FRO=0;FSAF=96;FSAR=98;FSRF=0;FSRR=0;FWDB=0.0243175;FXX=0;HRUN=1;LEN=1;MLLD=207.14;OALT=A;OID=.;OMAPALT=A;OPOS=11090916;OREF=C;PB=0.5;PBP=1;QD=38.2592;RBI=0.048476;REFB=0;REVB=0.0419355;RO=0;SAF=95;SAR=98;SRF=0;SRR=0;SSEN=0;SSEP=0;SSSB=4.54468e-08;STB=0.5;STBP=1;TYPE=snp;VARB=3.87852e-05;ANN=MASP2    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    1/1:88:193:194:0:0:193:194:1:98:95:0:0:98:96:0:0
chr1    213068404    .    T    C    70.4374    PASS    AF=0.435897;AO=17;DP=39;FAO=17;FDP=39;FR=.;FRO=22;FSAF=16;FSAR=1;FSRF=11;FSRR=11;FWDB=-0.000611735;FXX=0;HRUN=1;LEN=1;MLLD=243.519;OALT=C;OID=.;OMAPALT=C;OPOS=213068404;OREF=T;PB=0.5;PBP=1;QD=7.22435;RBI=0.023516;REFB=0.00205571;REVB=0.023508;RO=22;SAF=16;SAR=1;SRF=11;SRR=11;SSEN=0;SSEP=0;SSSB=0.630793;STB=0.87614;STBP=0.001;TYPE=snp;VARB=-0.00167889;ANN=FLVCR1    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    0/1:70:39:39:22:22:17:17:0.435897:1:16:11:11:1:16:11:11


current results
Code:
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality, the Phred-scaled marginal (or unconditional) probability of the called genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    SAMPLE
chr1    9324670    .    A    G    672.016    PASS    AF=0.528369;AO=148;DP=281;FAO=149;FDP=282;FR=.;FRO=133;FSAF=59;FSAR=90;FSRF=60;FSRR=73;FWDB=0.00343606;FXX=0;HRUN=1;LEN=1;MLLD=155.207;OALT=G;OID=.;OMAPALT=G;OPOS=9324670;OREF=A;PB=0.5;PBP=1;QD=9.53214;RBI=0.00594431;REFB=-0.0181827;REVB=0.00485061;RO=130;SAF=59;SAR=89;SRF=57;SRR=73;SSEN=0;SSEP=0;SSSB=-0.0352973;STB=0.526882;STBP=0.323;TYPE=snp;VARB=0.0184938;ANN=H6PD    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    0/1:527:281:282:130:133:148:149:0.528369:89:59:57:73:90:59:60:73 GOOD 282 reads het
chr1    10318652    .    C    G    360.217    PASS    AF=0.566929;AO=72;DP=129;FAO=72;FDP=127;FR=.;FRO=55;FSAF=36;FSAR=36;FSRF=31;FSRR=24;FWDB=0.00760676;FXX=0.0155027;HRUN=2;LEN=1;MLLD=115.62;OALT=G;OID=.;OMAPALT=G;OPOS=10318652;OREF=C;PB=0.5;PBP=1;QD=11.3454;RBI=0.0125905;REFB=-0.0312889;REVB=-0.0100329;RO=55;SAF=36;SAR=36;SRF=31;SRR=24;SSEN=0;SSEP=0;SSSB=-0.0505108;STB=0.527551;STBP=0.492;TYPE=snp;VARB=0.0181889;ANN=KIF1B    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    1/1:203:129:127:55:55:72:72:0.566929:36:36:31:24:36:36:31:24 GOOD 127 reads hom
chr1    10355834    .    C    T    504.995    PASS    AF=0.456747;AO=132;DP=290;FAO=132;FDP=289;FR=.;FRO=157;FSAF=56;FSAR=76;FSRF=85;FSRR=72;FWDB=0.0634576;FXX=0.00344816;HRUN=2;LEN=1;MLLD=58.7971;OALT=T;OID=.;OMAPALT=T;OPOS=10355834;OREF=C;PB=0.5;PBP=1;QD=6.98956;RBI=0.0644815;REFB=0.00590624;REVB=-0.0114455;RO=156;SAF=56;SAR=76;SRF=85;SRR=71;SSEN=0;SSEP=0;SSSB=-0.118565;STB=0.563872;STBP=0.047;TYPE=snp;VARB=-0.00678859;ANN=KIF1B    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    0/1:504:290:289:156:157:132:132:0.456747:76:56:85:71:76:56:85:72 GOOD 289 reads het
chr1    11090916    .    C    A    1855.57    PASS    AF=1;AO=193;DP=193;FAO=194;FDP=194;FR=.;FRO=0;FSAF=96;FSAR=98;FSRF=0;FSRR=0;FWDB=0.0243175;FXX=0;HRUN=1;LEN=1;MLLD=207.14;OALT=A;OID=.;OMAPALT=A;OPOS=11090916;OREF=C;PB=0.5;PBP=1;QD=38.2592;RBI=0.048476;REFB=0;REVB=0.0419355;RO=0;SAF=95;SAR=98;SRF=0;SRR=0;SSEN=0;SSEP=0;SSSB=4.54468e-08;STB=0.5;STBP=1;TYPE=snp;VARB=3.87852e-05;ANN=MASP2    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    1/1:88:193:194:0:0:193:194:1:98:95:0:0:98:96:0:0 GOOD 194 reads hom
chr1    213068404    .    T    C    70.4374    PASS    AF=0.435897;AO=17;DP=39;FAO=17;FDP=39;FR=.;FRO=22;FSAF=16;FSAR=1;FSRF=11;FSRR=11;FWDB=-0.000611735;FXX=0;HRUN=1;LEN=1;MLLD=243.519;OALT=C;OID=.;OMAPALT=C;OPOS=213068404;OREF=T;PB=0.5;PBP=1;QD=7.22435;RBI=0.023516;REFB=0.00205571;REVB=0.023508;RO=22;SAF=16;SAR=1;SRF=11;SRR=11;SSEN=0;SSEP=0;SSSB=0.630793;STB=0.87614;STBP=0.001;TYPE=snp;VARB=-0.00167889;ANN=FLVCR1    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    0/1:70:39:39:22:22:17:17:0.435897:1:16:11:11:1:16:11:11 STRAND BIAS 39 reads het


desired results
Code:
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality, the Phred-scaled marginal (or unconditional) probability of the called genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    SAMPLE
chr1    9324670    .    A    G    672.016    PASS    AF=0.528369;AO=148;DP=281;FAO=149;FDP=282;FR=.;FRO=133;FSAF=59;FSAR=90;FSRF=60;FSRR=73;FWDB=0.00343606;FXX=0;HRUN=1;LEN=1;MLLD=155.207;OALT=G;OID=.;OMAPALT=G;OPOS=9324670;OREF=A;PB=0.5;PBP=1;QD=9.53214;RBI=0.00594431;REFB=-0.0181827;REVB=0.00485061;RO=130;SAF=59;SAR=89;SRF=57;SRR=73;SSEN=0;SSEP=0;SSSB=-0.0352973;STB=0.526882;STBP=0.323;TYPE=snp;VARB=0.0184938;ANN=H6PD    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    0/1:527:281:282:130:133:148:149:0.528369:89:59:57:73:90:59:60:73 GOOD 282 reads het 20 score
chr1    10318652    .    C    G    360.217    PASS    AF=0.566929;AO=72;DP=129;FAO=72;FDP=127;FR=.;FRO=55;FSAF=36;FSAR=36;FSRF=31;FSRR=24;FWDB=0.00760676;FXX=0.0155027;HRUN=2;LEN=1;MLLD=115.62;OALT=G;OID=.;OMAPALT=G;OPOS=10318652;OREF=C;PB=0.5;PBP=1;QD=11.3454;RBI=0.0125905;REFB=-0.0312889;REVB=-0.0100329;RO=55;SAF=36;SAR=36;SRF=31;SRR=24;SSEN=0;SSEP=0;SSSB=-0.0505108;STB=0.527551;STBP=0.492;TYPE=snp;VARB=0.0181889;ANN=KIF1B    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    1/1:203:129:127:55:55:72:72:0.566929:36:36:31:24:36:36:31:24 GOOD 127 reads hom 11 score
chr1    10355834    .    C    T    504.995    PASS    AF=0.456747;AO=132;DP=290;FAO=132;FDP=289;FR=.;FRO=157;FSAF=56;FSAR=76;FSRF=85;FSRR=72;FWDB=0.0634576;FXX=0.00344816;HRUN=2;LEN=1;MLLD=58.7971;OALT=T;OID=.;OMAPALT=T;OPOS=10355834;OREF=C;PB=0.5;PBP=1;QD=6.98956;RBI=0.0644815;REFB=0.00590624;REVB=-0.0114455;RO=156;SAF=56;SAR=76;SRF=85;SRR=71;SSEN=0;SSEP=0;SSSB=-0.118565;STB=0.563872;STBP=0.047;TYPE=snp;VARB=-0.00678859;ANN=KIF1B    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    0/1:504:290:289:156:157:132:132:0.456747:76:56:85:71:76:56:85:72 GOOD 289 reads het 15 score
chr1    11090916    .    C    A    1855.57    PASS    AF=1;AO=193;DP=193;FAO=194;FDP=194;FR=.;FRO=0;FSAF=96;FSAR=98;FSRF=0;FSRR=0;FWDB=0.0243175;FXX=0;HRUN=1;LEN=1;MLLD=207.14;OALT=A;OID=.;OMAPALT=A;OPOS=11090916;OREF=C;PB=0.5;PBP=1;QD=38.2592;RBI=0.048476;REFB=0;REVB=0.0419355;RO=0;SAF=95;SAR=98;SRF=0;SRR=0;SSEN=0;SSEP=0;SSSB=4.54468e-08;STB=0.5;STBP=1;TYPE=snp;VARB=3.87852e-05;ANN=MASP2    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    1/1:88:193:194:0:0:193:194:1:98:95:0:0:98:96:0:0 GOOD 194 reads hom 56 score
chr1    213068404    .    T    C    70.4374    PASS    AF=0.435897;AO=17;DP=39;FAO=17;FDP=39;FR=.;FRO=22;FSAF=16;FSAR=1;FSRF=11;FSRR=11;FWDB=-0.000611735;FXX=0;HRUN=1;LEN=1;MLLD=243.519;OALT=C;OID=.;OMAPALT=C;OPOS=213068404;OREF=T;PB=0.5;PBP=1;QD=7.22435;RBI=0.023516;REFB=0.00205571;REVB=0.023508;RO=22;SAF=16;SAR=1;SRF=11;SRR=11;SSEN=0;SSEP=0;SSSB=0.630793;STB=0.87614;STBP=0.001;TYPE=snp;VARB=-0.00167889;ANN=FLVCR1    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    0/1:70:39:39:22:22:17:17:0.435897:1:16:11:11:1:16:11:11 STRAND BIAS 39 reads het 2 score


Last edited by cmccabe; 06-28-2016 at 03:50 PM.. Reason: added edit and followup ?