Perl to extract information from a file line by line


 
# 1  
Old 09-06-2016

In the Perl code below I use tags within each line to extract certain information. The tags used are:
STB: a value >0.8 is reported as STRAND BIAS, otherwise GOOD
FDP: its value is the second number
GT: the genotype towards the end of the line is looked up in a hash and the matching value is output; in the first line the genotype is 2/2, so hom
column 6 is then divided by 33 to give the last number (in line one it is 455.945/33, or 13)
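
The four rules above can be sketched as a small Perl helper that works on values which have already been extracted. This is only an illustration: the name `extra_columns` is made up here, and it uses `>= 0.8` and `int(x + 0.5)` rounding because that is what the one-liner in this thread does.

```perl
use strict;
use warnings;

# Genotype-to-zygosity lookup, same table as the one-liner in this thread.
my %zygosity = (
    '0/0' => 'hom', '0/1' => 'het', '1/1' => 'hom',
    '1/2' => 'het', '2/2' => 'hom',
);

# Hypothetical helper: build the four appended columns from extracted values.
sub extra_columns {
    my ($stb, $fdp, $gt, $qual) = @_;
    my $label = $stb >= 0.8 ? 'STRAND BIAS' : 'GOOD';
    my $score = int($qual / 33 + 0.5);    # column 6 divided by 33, rounded
    return join "\t", $label, $fdp, $zygosity{$gt}, $score;
}

# Values taken from the second input line (chr1 949597).
print extra_columns(0.547637, 400, '0/1', 629.899), "\n";
```

For the second input line this prints GOOD, 400, het and 19, tab-separated. Note that for the first line 455.945/33 is about 13.8, so this rounding gives 14, while plain `int()` truncation would give 13.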

The current output is perfect for most cases, but not all of them. In line 1, which did not work, STB has two values separated by a comma. That is not the case in most lines, but I cannot seem to get the desired output. Thank you :).
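
A minimal sketch of why a two-valued STB can be missed, assuming the pattern expects a single decimal followed directly by ";" (as the STB part of the one-liner in this thread does). The helper name `stb_strict` is hypothetical, and the two test strings are shortened pieces of the INFO column from the input.

```perl
use strict;
use warnings;

# Hypothetical helper: return the STB value only when one decimal number
# is followed immediately by ";", otherwise return undef.
sub stb_strict {
    my ($info) = @_;
    return $info =~ /STB=(\d+\.\d+);/ ? $1 : undef;
}

for my $info (
    'SSSB=-0.0851561;STB=0.547637;STBP=0.084',          # single value
    'SSSB=0.814959,0.00209983;STB=0.5,0.5;STBP=1,1',    # two values
) {
    my $stb = stb_strict($info);
    print defined $stb ? "matched, STB=$stb\n" : "no match\n";
}
```

The first string matches; in the second, the character after the first `0.5` is a comma, so the pattern fails and nothing is appended to that line.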

input
Code:
chr9    140053158    .    TGGGGGC    TGGGG,TGGGGC    455.945    PASS    AF=0,1;AO=21,60;DP=126;FAO=0,124;FDP=124;FR=.;FRO=0;FSAF=0,80;FSAR=0,44;FSRF=0;FSRR=0;FWDB=0.0104498,0.231579;FXX=0.0158718;HRUN=5,5;LEN=2,1;MLLD=9.50728,12.1416;OALT=-,-;OID=.,.;OMAPALT=TGGGG,TGGGGC;OPOS=140053163,140053159;OREF=GC,G;PB=0.5,0.5;PBP=1,1;QD=14.7079;RBI=0.0306624,0.23266;REFB=-0.0212019,-0.024401;REVB=0.0288268,0.0223937;RO=25;SAF=21,41;SAR=0,19;SRF=17;SRR=8;SSEN=0.326531,0.326531;SSEP=0,0;SSSB=0.814959,0.00209983;STB=0.5,0.5;STBP=1,1;TYPE=del,del;VARB=-0.0378589,0.00692405;ANN=GRIN1    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    2/2:81:126:124:25:0:21,60:0,124:0,1:0,19:21,41:17:8:0,44:0,80:0:0
chr1    949597    .    C    T    629.899    PASS    AF=0.4375;AO=513;DP=1095;FAO=175;FDP=400;FR=.;FRO=225;FSAF=77;FSAR=98;FSRF=118;FSRR=107;FWDB=-0.00642053;FXX=0;HRUN=1;LEN=1;MLLD=188.973;OALT=T;OID=.;OMAPALT=T;OPOS=949597;OREF=C;PB=0.5;PBP=1;QD=6.29899;RBI=0.0194428;REFB=0.00779203;REVB=-0.0183521;RO=579;SAF=226;SAR=287;SRF=306;SRR=273;SSEN=0;SSEP=0;SSSB=-0.0851561;STB=0.547637;STBP=0.084;TYPE=snp;VARB=-0.010388;ANN=ISG15    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    0/1:629:1095:400:579:225:513:175:0.4375:287:226:306:273:98:77:118:107
chr1    949654    .    A    G    765.255    PASS    AF=0.4775;AO=496;DP=1115;FAO=191;FDP=400;FR=.;FRO=209;FSAF=80;FSAR=111;FSRF=101;FSRR=108;FWDB=-0.00182381;FXX=0;HRUN=1;LEN=1;MLLD=130.022;OALT=G;OID=.;OMAPALT=G;OPOS=949654;OREF=A;PB=0.5;PBP=1;QD=7.65255;RBI=0.0126329;REFB=-0.00552621;REVB=-0.0125005;RO=617;SAF=242;SAR=254;SRF=309;SRR=308;SSEN=0;SSEP=0;SSSB=-0.0129692;STB=0.534175;STBP=0.184;TYPE=snp;VARB=0.00480316;ANN=ISG15    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    0/1:765:1115:400:617:209:496:191:0.4775:254:242:309:308:111:80:101:108

current output
Code:
chr9    140053158    .    TGGGGGC    TGGGG,TGGGGC    455.945    PASS    AF=0,1;AO=21,60;DP=126;FAO=0,124;FDP=124;FR=.;FRO=0;FSAF=0,80;FSAR=0,44;FSRF=0;FSRR=0;FWDB=0.0104498,0.231579;FXX=0.0158718;HRUN=5,5;LEN=2,1;MLLD=9.50728,12.1416;OALT=-,-;OID=.,.;OMAPALT=TGGGG,TGGGGC;OPOS=140053163,140053159;OREF=GC,G;PB=0.5,0.5;PBP=1,1;QD=14.7079;RBI=0.0306624,0.23266;REFB=-0.0212019,-0.024401;REVB=0.0288268,0.0223937;RO=25;SAF=21,41;SAR=0,19;SRF=17;SRR=8;SSEN=0.326531,0.326531;SSEP=0,0;SSSB=0.814959,0.00209983;STB=0.5,0.5;STBP=1,1;TYPE=del,del;VARB=-0.0378589,0.00692405;ANN=GRIN1    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    2/2:81:126:124:25:0:21,60:0,124:0,1:0,19:21,41:17:8:0,44:0,80:0:0
chr1    949597    .    C    T    629.899    PASS    AF=0.4375;AO=513;DP=1095;FAO=175;FDP=400;FR=.;FRO=225;FSAF=77;FSAR=98;FSRF=118;FSRR=107;FWDB=-0.00642053;FXX=0;HRUN=1;LEN=1;MLLD=188.973;OALT=T;OID=.;OMAPALT=T;OPOS=949597;OREF=C;PB=0.5;PBP=1;QD=6.29899;RBI=0.0194428;REFB=0.00779203;REVB=-0.0183521;RO=579;SAF=226;SAR=287;SRF=306;SRR=273;SSEN=0;SSEP=0;SSSB=-0.0851561;STB=0.547637;STBP=0.084;TYPE=snp;VARB=-0.010388;ANN=ISG15    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    0/1:629:1095:400:579:225:513:175:0.4375:287:226:306:273:98:77:118:107    GOOD    400    het    19
chr1    949654    .    A    G    765.255    PASS    AF=0.4775;AO=496;DP=1115;FAO=191;FDP=400;FR=.;FRO=209;FSAF=80;FSAR=111;FSRF=101;FSRR=108;FWDB=-0.00182381;FXX=0;HRUN=1;LEN=1;MLLD=130.022;OALT=G;OID=.;OMAPALT=G;OPOS=949654;OREF=A;PB=0.5;PBP=1;QD=7.65255;RBI=0.0126329;REFB=-0.00552621;REVB=-0.0125005;RO=617;SAF=242;SAR=254;SRF=309;SRR=308;SSEN=0;SSEP=0;SSSB=-0.0129692;STB=0.534175;STBP=0.184;TYPE=snp;VARB=0.00480316;ANN=ISG15    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    0/1:765:1115:400:617:209:496:191:0.4775:254:242:309:308:111:80:101:108    GOOD    400    het    23

desired output
Code:
chr9    140053158    .    TGGGGGC    TGGGG,TGGGGC    455.945    PASS    AF=0,1;AO=21,60;DP=126;FAO=0,124;FDP=124;FR=.;FRO=0;FSAF=0,80;FSAR=0,44;FSRF=0;FSRR=0;FWDB=0.0104498,0.231579;FXX=0.0158718;HRUN=5,5;LEN=2,1;MLLD=9.50728,12.1416;OALT=-,-;OID=.,.;OMAPALT=TGGGG,TGGGGC;OPOS=140053163,140053159;OREF=GC,G;PB=0.5,0.5;PBP=1,1;QD=14.7079;RBI=0.0306624,0.23266;REFB=-0.0212019,-0.024401;REVB=0.0288268,0.0223937;RO=25;SAF=21,41;SAR=0,19;SRF=17;SRR=8;SSEN=0.326531,0.326531;SSEP=0,0;SSSB=0.814959,0.00209983;STB=0.5,0.5;STBP=1,1;TYPE=del,del;VARB=-0.0378589,0.00692405;ANN=GRIN1    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    2/2:81:126:124:25:0:21,60:0,124:0,1:0,19:21,41:17:8:0,44:0,80:0:0    GOOD    127    hom    13
chr1    949597    .    C    T    629.899    PASS    AF=0.4375;AO=513;DP=1095;FAO=175;FDP=400;FR=.;FRO=225;FSAF=77;FSAR=98;FSRF=118;FSRR=107;FWDB=-0.00642053;FXX=0;HRUN=1;LEN=1;MLLD=188.973;OALT=T;OID=.;OMAPALT=T;OPOS=949597;OREF=C;PB=0.5;PBP=1;QD=6.29899;RBI=0.0194428;REFB=0.00779203;REVB=-0.0183521;RO=579;SAF=226;SAR=287;SRF=306;SRR=273;SSEN=0;SSEP=0;SSSB=-0.0851561;STB=0.547637;STBP=0.084;TYPE=snp;VARB=-0.010388;ANN=ISG15    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    0/1:629:1095:400:579:225:513:175:0.4375:287:226:306:273:98:77:118:107    GOOD    400    het    19
chr1    949654    .    A    G    765.255    PASS    AF=0.4775;AO=496;DP=1115;FAO=191;FDP=400;FR=.;FRO=209;FSAF=80;FSAR=111;FSRF=101;FSRR=108;FWDB=-0.00182381;FXX=0;HRUN=1;LEN=1;MLLD=130.022;OALT=G;OID=.;OMAPALT=G;OPOS=949654;OREF=A;PB=0.5;PBP=1;QD=7.65255;RBI=0.0126329;REFB=-0.00552621;REVB=-0.0125005;RO=617;SAF=242;SAR=254;SRF=309;SRR=308;SSEN=0;SSEP=0;SSSB=-0.0129692;STB=0.534175;STBP=0.184;TYPE=snp;VARB=0.00480316;ANN=ISG15    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR    0/1:765:1115:400:617:209:496:191:0.4775:254:242:309:308:111:80:101:108    GOOD    400    het    23

perl
Code:
perl -plae '
    BEGIN{ %h = qw(0/0 hom 0/1 het 1/1 hom 1/2 het 2/2 hom) }
    /^[^#].*FDP=(\d+);.*STB=(\d+\.\d+);.*([0-2]\/[0-2])/ and
    $_ .= join "\t", ("", ($2 >= 0.8 ? "STRAND BIAS" : "GOOD"), $1, $h{$3}, int($F[5]/33+0.5))' input


Last edited by cmccabe; 09-06-2016 at 12:12 PM.. Reason: fixed format
# 2  
Old 09-10-2016
In case this helps:
I just needed [,;] to match either "," or ";" after the first STB value. Thank you :).

Code:
 perl -plae '
     BEGIN{ %h = qw(0/0 hom 0/1 het 1/1 hom 1/2 het 2/2 hom) }
     /^[^#].*FDP=(\d+);.*STB=(\d+\.\d+)[,;].*([0-2]\/[0-2])/ and
     $_ .= join "\t", ("", ($2 >= 0.8 ? "STRAND BIAS" : "GOOD"), $1, $h{$3}, int($F[5]/33+0.5))' input
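
With `[,;]` the pattern captures only the first STB value when several are listed. If every value should be checked, one possible alternative is to split the INFO column into a hash first. This is a rough sketch, not the method used in this thread: the helper names `parse_info` and `stb_label` are hypothetical, and treating a line as STRAND BIAS when any one value is >= 0.8 is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a ";"-separated INFO string into a tag => value hash.
sub parse_info {
    my ($info) = @_;
    my %tag;
    for my $pair (split /;/, $info) {
        my ($key, $value) = split /=/, $pair, 2;
        $tag{$key} = $value if defined $value;
    }
    return %tag;
}

# Assumption: flag the line if ANY comma-separated STB value is >= 0.8.
sub stb_label {
    my ($stb) = @_;
    my $biased = grep { $_ >= 0.8 } split /,/, $stb;    # count of biased values
    return $biased ? 'STRAND BIAS' : 'GOOD';
}

# Shortened INFO column from the first input line (chr9 140053158).
my %tag = parse_info('FDP=124;SSSB=0.814959,0.00209983;STB=0.5,0.5;STBP=1,1');
print join("\t", stb_label($tag{STB}), $tag{FDP}), "\n";
```

For the shortened first line this prints GOOD and 124, tab-separated. Looking up tags by name also avoids depending on the order of FDP and STB within the line.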


Last edited by cmccabe; 09-10-2016 at 10:30 AM.. Reason: fixed format