Add static text in perl


 
# 15  
02-08-2016
I apologize; I posted the wrong output for the input I shared previously.

input
Code:
Chr	Start	End	Ref	Alt	Func.refGene	Gene.refGene	GeneDetail.refGene	ExonicFunc.refGene	AAChange.refGene	PopFreqMax	1000G2012APR_ALL	1000G2012APR_AFR	1000G2012APR_AMR	1000G2012APR_ASN	1000G2012APR_EUR	ESP6500si_ALL	ESP6500si_AA	ESP6500si_EA	CG46	common	clinvar	clinvarsubmit	clinvarreference
4	41748130	41748130	G	C	exonic	PHOX2B		synonymous SNV	PHOX2B:NM_003924.3:exon3:c.C639G:p.G213G	0.0007	.	.	.	.	.	0.0005	0.0002	0.0007	.

output
Code:
Index	Chromosome Position	Gene	Inheritance	RNA Accession	Chr	Coverage	Score	A(#F,#R)	C(#F,#R)	G(#F,#R)	T(#F,#R)	Ins(#F,#R)	Del(#F,#R)	SNP db_xref	Mutation Call	Mutant Allele Frequency	Amino Acid Change	Chr	Start	End	Ref	Alt	Func.refGene	Gene.refGene	GeneDetail.refGene	ExonicFunc.refGene	AAChange.refGene	PopFreqMax	1000G2012APR_ALL	1000G2012APR_AFR	1000G2012APR_AMR	1000G2012APR_ASN	1000G2012APR_EUR	ESP6500si_ALL	ESP6500si_AA	ESP6500si_EA	CG46	common	clinvar	clinvarsubmit	clinvarreference	HP	SPLICE	Pseudogene	Classification	HGMD	Disease	Sanger	References
2	Null	Null	Null	Null	Null	Null	Null	Null	Null	Null	Null	Null	Null	Null	Null	Null	Null	4	41748130	41748130	G	C	exonic	PHOX2B		synonymous SNV	PHOX2B:NM_003924.3:exon3:c.C639G:p.G213G	0.0007	.	.	.	.	.	0.0005	0.0002	0.0007	.					Null	Null	Null	Null	Null	Null	Null	Null

Field headers and where the info comes from:
Code:
1: 1                   (Index)
2: Null                (Chromosome)
3: PHOX2B          (Gene)
4: AD                 (Inheritance)
5: NM_003924.3   (RNA Accession)
6: Null                (Chr)
7: Null                (Coverage)
8: Null                (Score)
9: Null                (A(#F,#R))
10: Null              (C(#F,#R))
11: Null              (G(#F,#R))
12: Null              (T(#F,#R))
13: Null              (Ins(#F,#R))
14: Null              (Del(#F,#R))
15: Null              (SNP db_xref)
16: c.C639G        (Mutation Call)
17: Null              (Mutant Allele Frequency)
18: G213G          (Amino Acid Change)
19: 4                 (Chr)
20: 41748130      (Start)
21: 41748130      (Stop)
22: G                 (Ref)
23: C                 (Alt)
24: exonic          (Func.refGene)
25: PHOX2B        (Gene.refGene)
26:                    (GeneDetail.refGene)
27: synonymous   (ExonicFunc.refGene)
28: PHOX2B:NM_003924.3:exon3:c.C639G:p.G213G (AAChange.refGene) - split on : to get the values for fields 3, 4, 5, 16, and 18. This field can hold multiple records, so the split uses @nms to return only the record whose NM_ accession matches one in @nms.

29:    (PopFreqMax)
30:    (1000G2012APR_ALL)
31:    (1000G2012APR_AFR)
32:    (1000G2012APR_AMR)
33:    (1000G2012APR_ASN)
34:    (1000G2012APR_EUR)
35:    (ESP6500si_ALL)
36:    (ESP6500si_AA)
37:    (ESP6500si_EA)
38:    (CG46)
39:    (common)
40:    (clinvar)
41:    (clinvarsubmit)
42:    (clinvarreference)
43: Null   (HP)
44: Null   (Splice)
45: Null   (Pseudogene)
46: VUS   (Classification) - currently not showing up (Null appears instead)
47: Null   (HGMD)
48: Null   (Disease)
49: Null   (Sanger)
50: Null   (References)
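The thread says field 28 can hold several records, but also that commas should not appear in it. If the records are comma-separated (the convention ANNOVAR uses when a variant overlaps several transcripts; this is an assumption, adjust the first split if your separator differs), a minimal sketch of "only return the record whose NM_ matches" could look like this. pick_record is a hypothetical helper, and NM_000000.0 is a made-up accession used only to show a non-matching record.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my %nms = (
    "NM_004004.5" => "AR",
    "NM_004992.3" => "XLD",
    "NM_003924.3" => "AD",
);

# Hypothetical helper: return (gene, transcript, exon, coding, protein) from the
# first record whose transcript is a key of %$nms, or an empty list if none is.
# Assumption: records are comma-separated; with a single record the loop runs once.
sub pick_record {
    my ($aachange, $nms) = @_;
    return unless defined $aachange;
    for my $record (split /,/, $aachange) {
        my ($gene, $transcript, $exon, $coding, $aa) = split /:/, $record;
        next unless defined $transcript && exists $nms->{$transcript};
        return ($gene, $transcript, $exon, $coding, $aa);
    }
    return;    # no record matched any transcript in %$nms
}

# The first record uses a made-up accession that is not in %nms.
my $field = "PHOX2B:NM_000000.0:exon2:c.A1G:p.M1V,"
          . "PHOX2B:NM_003924.3:exon3:c.C639G:p.G213G";
my ($gene, $transcript, undef, $coding, $aa) = pick_record($field, \%nms);
print join("\t", $gene, $nms{$transcript}, $transcript, $coding, $aa), "\n";
```

This prints PHOX2B, AD, NM_003924.3, c.C639G and p.G213G separated by tabs, i.e. the values for fields 3, 4, 5, 16 and 18.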

Quote:
Also, please explain the extra tabs in your output file; every ^I identifies a tab in the line.
I did not intend them, and I don't know why the extra tabs are there.
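One way to track down stray tabs is to count the tab-separated fields on every line and compare each count with the header's. A minimal sketch follows; field_counts is a hypothetical helper, and the two sample lines (the second with two made-up trailing tabs) stand in for a real file. Passing -1 as the split limit keeps trailing empty fields, which a plain split would silently drop.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: number of tab-separated fields on each line of $fh.
# The -1 limit keeps trailing empty fields, so stray trailing tabs are counted.
sub field_counts {
    my ($fh) = @_;
    my @counts;
    while (my $line = <$fh>) {
        chomp $line;
        my @f = split /\t/, $line, -1;
        push @counts, scalar @f;
    }
    return @counts;
}

# Made-up sample: the data line ends with two extra tabs.
my $text = "Chr\tStart\tEnd\tRef\tAlt\n4\t41748130\t41748130\tG\tC\t\t\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!\n";
my @counts = field_counts($fh);
close $fh;
print "line $_: $counts[$_ - 1] fields\n" for 1 .. @counts;
```

With a real file, open it by name instead of the in-memory string; any line whose count differs from the header's has extra or missing tabs.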

Quote:
$vals[9] contains PHOX2B:NM_003924.3:exon3:c.C639G:p.G213G according to your input. It cannot be split by commas.
Can you explain that? Are there any lines that would have something like:
The Perl script that populates this column only allows the colon-separated format, so commas should not show up.

Thank you for all your help.
# 16  
02-08-2016
Please give it a try.
You can modify it to suit your needs.

Code:
#!/usr/bin/env perl
# reformat.pl
use strict;
use warnings;

my %nms = (
    "NM_004004.5" => "AR",
    "NM_004992.3" => "XLD",
    "NM_003924.3" => "AD",
);

# $! is not set by a missing argument, so it is left out of these messages
my $readf = shift || die "Missing input file\n";
my $writef = shift || die "Missing output file\n";

my @header = (
    "Index",
    "Chromosome Position",
    "Gene",
    "Inheritance",
    "RNA Accession",
    "Chr",
    "Coverage",
    "Score",
    "A(#F,#R)",
    "C(#F,#R)",
    "G(#F,#R)",
    "T(#F,#R)",
    "Ins(#F,#R)",
    "Del(#F,#R)",
    "SNP db_xref",
    "Mutation Call",
    "Mutant Allele Frequency",
    "Amino Acid Change",
    "HP",
    "SPLICE",
    "Pseudogene",
    "Classification",
    "HGMD",
    "Disease",
    "Sanger",
    "References",
);

open my $in, '<', $readf or die "Cannot open $readf: $!\n";
open my $out, '>', $writef or die "Cannot create $writef: $!\n";

my $add2header;
chomp( $add2header = <$in> );
splice @header, 18, 0, $add2header;
save(@header);
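$. = 0; # reset the line counter so the first data line gets index 1
while( <$in> ) {
    chomp;
    # 17 "Null" placeholders, 25 empty slots for the input columns, 8 trailing "Null" columns
    my @ruler = (("Null")x17, ("")x25, ("Null")x8);
    my @fields = split /\t/;
    my $len = @fields;
    # overwrite the empty slots, starting at index 17, with the input columns
    splice @ruler, 17, $len, @fields;
    # AAChange.refGene looks like GENE:TRANSCRIPT:EXON:CODING:PROTEIN
    my ($gene, $transcript, $exon, $coding, $aa) = split /:/, $fields[9];
    $ruler[0] = $.;
    $ruler[2] = $gene;
    $ruler[3] = $nms{$transcript} // "Null";   # "Null" if the transcript is not in %nms
    $ruler[4] = $transcript;
    $ruler[15] = $coding;
    $ruler[17] = $aa;
    $ruler[45] = "VUS";
    save(@ruler);
}

sub save {
    local $" = "\t";   # interpolated arrays are joined with tabs
    print $out "@_\n";
}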

close $in;
close $out;


Last edited by Aia; 02-08-2016 at 11:24 PM. Reason: added a line-count reset so the index starts at one.
# 17  
02-13-2016
I apologize for the delay; I just got to test the Perl script using the input from post #15. The results look the same as before, with VUS appearing after the "Null" values.
Thank you for all your help.

Code:
Index	Chromosome Position	Gene	Inheritance	RNA Accession	Chr	Coverage	Score	A(#F,#R)	C(#F,#R)	G(#F,#R)	T(#F,#R)	Ins(#F,#R)	Del(#F,#R)	SNP db_xref	Mutation Call	Mutant Allele Frequency	Amino Acid Change	Chr	Start	End	Ref	Alt	Func.refGene	Gene.refGene	GeneDetail.refGene	ExonicFunc.refGene	AAChange.refGene	PopFreqMax	1000G2012APR_ALL	1000G2012APR_AFR	1000G2012APR_AMR	1000G2012APR_ASN	1000G2012APR_EUR	ESP6500si_ALL	ESP6500si_AA	ESP6500si_EA	CG46	common	clinvar	clinvarsubmit	clinvarreference	HP	SPLICE	Pseudogene	Classification	HGMD	Disease	Sanger	References
2	Null	PHOX2B	AD	NM_003924.3	Null	Null	Null	Null	Null	Null	Null	Null	Null	Null	c.C639G	Null	p.G213G	4	41748130	41748130	G	C	exonic	PHOX2B		synonymous SNV	PHOX2B:NM_003924.3:exon3:c.C639G:p.G213G	0.0007	.	.	.	.	.	0.0005	0.0002	0.0007	.					Null	Null	Null	Null	Null	Null	Null	Null																																						VUS


Last edited by cmccabe; 02-13-2016 at 12:46 PM. Reason: added input location
# 18  
02-13-2016
This is the result I get when I run the code I posted in #16 against the input you posted in #15.
Code:
Index   Chromosome Position Gene    Inheritance RNA Accession   Chr Coverage    Score   A(#F,#R)    C(#F,#R)    G(#F,#R)    T(#F,#R)    Ins(#F,#R)  Del(#F,#R)  SNP db_xref Mutation Call   Mutant Allele Frequency Amino Acid Change   Chr Start   End Ref Alt Func.refGene    Gene.refGene    GeneDetail.refGene  ExonicFunc.refGene  AAChange.refGene    PopFreqMax  1000G2012APR_ALL    1000G2012APR_AFR    1000G2012APR_AMR    1000G2012APR_ASN    1000G2012APR_EUR    ESP6500si_ALL   ESP6500si_AA    ESP6500si_EA    CG46    common  clinvar clinvarsubmit   clinvarreference    HP  SPLICE  Pseudogene  Classification  HGMD    Disease Sanger  References
1   Null    PHOX2B  AD  NM_003924.3 Null    Null    Null    Null    Null    Null    Null    Null    Null    Null    c.C639G Null    p.G213G 41748130    41748130    G   C   exonic  PHOX2B      synonymous SNV  PHOX2B:NM_003924.3:exon3:c.C639G:p.G213G    0.0007  .   .   .   .   .   0.0005  0.0002  0.0007  .                       Null    Null    Null    VUS Null    Null    Null    Null

As you can see, VUS is in the right place, which leads me to believe that there is a discrepancy between what you posted and what you actually used as input.
There is also a discrepancy in the first field of the second line: my code outputs 1 (meaning data line 1), whereas your output shows 2.
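One more thing stands out when the header and data line of that output are lined up: the value under Chr is 41748130, and the chromosome 4 does not appear at all. The 18 fixed columns (Index through Amino Acid Change) occupy indices 0-17, but the code in #16 splices the input fields in at index 17, and $ruler[17] = $aa then overwrites the first input value, so every input column lands one position to the left. A minimal sketch that starts the input at index 18 is below; build_row is a hypothetical helper, and $n_input (24 for the header in #15) is assumed to be the column count of the input header.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my %nms = ("NM_003924.3" => "AD");    # one entry from the table in #16

# Hypothetical helper: build one output row that lines up with the combined header.
# $n_input is the number of columns in the input header (24 in #15); in
# reformat.pl it would be the column count of $add2header.
sub build_row {
    my ($index, $fields, $n_input, $nms) = @_;
    my $lead  = 18;    # Index .. Amino Acid Change occupy indices 0-17
    my $trail = 8;     # HP .. References
    my @row = (("Null") x $lead, ("") x $n_input, ("Null") x $trail);
    # Input columns start right after Amino Acid Change, i.e. at index 18.
    splice @row, $lead, scalar(@$fields), @$fields;
    my ($gene, $transcript, undef, $coding, $aa) = split /:/, $fields->[9];
    $row[0]  = $index;
    $row[2]  = $gene;
    $row[3]  = $nms->{$transcript} // "Null";
    $row[4]  = $transcript;
    $row[15] = $coding;
    $row[17] = $aa;                        # no longer overlaps the input's Chr
    $row[$lead + $n_input + 3] = "VUS";    # Classification (after HP, SPLICE, Pseudogene)
    return @row;
}

my @input = (4, 41748130, 41748130, "G", "C", "exonic", "PHOX2B", "",
             "synonymous SNV", "PHOX2B:NM_003924.3:exon3:c.C639G:p.G213G");
my @row = build_row(1, \@input, 24, \%nms);
print join("\t", @row[0, 2 .. 4, 15, 17 .. 22, 45]), "\n";
```

The printed slice shows p.G213G under Amino Acid Change followed by 4, 41748130, 41748130, G and C under Chr, Start, End, Ref and Alt, with VUS still at index 45.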

If you would like to keep troubleshooting, all I can offer is what your input looks like when reformatted to show the tabs.

Code:
perl -pe 's/\t/\[TAB\]/g' new_cmccabe_input

Code:
Chr[TAB]Start[TAB]End[TAB]Ref[TAB]Alt[TAB]Func.refGene[TAB]Gene.refGene[TAB]GeneDetail.refGene[TAB]ExonicFunc.refGene[TAB]AAChange.refGene[TAB]PopFreqMax[TAB]1000G2012APR_ALL[TAB]1000G2012APR_AFR[TAB]1000G2012APR_AMR[TAB]1000G2012APR_ASN[TAB]1000G2012APR_EUR[TAB]ESP6500si_ALL[TAB]ESP6500si_AA[TAB]ESP6500si_EA[TAB]CG46[TAB]common[TAB]clinvar[TAB]clinvarsubmit[TAB]clinvarreference
4[TAB]41748130[TAB]41748130[TAB]G[TAB]C[TAB]exonic[TAB]PHOX2B[TAB][TAB]synonymous SNV[TAB]PHOX2B:NM_003924.3:exon3:c.C639G:p.G213G[TAB]0.0007[TAB].[TAB].[TAB].[TAB].[TAB].[TAB]0.0005[TAB]0.0002[TAB]0.0007[TAB].

Please run the same two input lines you used and compare them against these. They should be the same, since I am using what you posted.

Note:
I am assuming you have made sure this input is a proper Unix-format file and not an MS-DOS one.
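The reason the line endings matter: chomp removes only the default record separator ("\n"), so in a file with MS-DOS (CRLF) line endings a "\r" stays glued to the last field of every line. A minimal sketch of stripping it inside the script is below; clean_line is a hypothetical helper, and the sample line is the 5-field input with a CRLF ending added for illustration. Converting the file once with a tool such as dos2unix is an alternative.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: remove a trailing "\r\n" (MS-DOS) or "\n" (Unix).
# chomp alone would leave the "\r" attached to the last field.
sub clean_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    return $line;
}

my $dos    = "4\t41748130\t41748130\tG\tC\r\n";   # sample line with a CRLF ending
my $clean  = clean_line($dos);
my @fields = split /\t/, $clean;
printf "last field is '%s' (%d chars)\n", $fields[-1], length $fields[-1];
```

In a while loop this would replace chomp, e.g. $_ = clean_line($_); before splitting.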

Last edited by Aia; 02-14-2016 at 12:05 AM. Reason: added note
# 19  
02-15-2016
The input is in proper Unix format, but it is slightly different from what I posted: it has only 5 fields. I apologize for the oversight.

input
Code:
4	41748130	41748130	G	C

Code:
perl -pe 's/\t/\[TAB\]/g' input

Code:
4[TAB]41748130[TAB]41748130[TAB]G[TAB]C

Most of the time, the additional information is populated from those 5 fields. A small percentage of the time, [9] will be Null and the line needs to be skipped; that is what "$_ or next;" was supposed to do in the original code. [45] is still "VUS", however. Thank you.
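Note that "$_ or next;" tests the whole line, not field [9]: after chomp it only skips a line that is empty (or exactly "0"), so a short 5-field line or one whose tenth field is Null still gets through. A minimal sketch of testing the field itself is below; has_aachange is a hypothetical helper, and it assumes a missing value shows up either as an absent/empty field or as the literal string "Null".

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: true only if the tenth field (index 9) holds a value.
# Assumption: a missing value is absent, empty, or the literal string "Null".
sub has_aachange {
    my ($line) = @_;
    my @fields = split /\t/, $line;
    my $aa = $fields[9];
    return defined $aa && $aa ne '' && $aa ne 'Null';
}

my $lead  = "4\t41748130\t41748130\tG\tC\texonic\tPHOX2B\t\tsynonymous SNV";
my @lines = (
    "4\t41748130\t41748130\tG\tC",                                # 5-field input
    "$lead\tPHOX2B:NM_003924.3:exon3:c.C639G:p.G213G",            # full record
    "$lead\tNull",                                                # made-up Null case
);
print has_aachange($_) ? "keep\n" : "skip\n" for @lines;
```

Inside the while loop this would read: next unless has_aachange($_);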
# 20  
02-15-2016
It would have complained with "Use of uninitialized value in split" if it encountered such a short input.
Here is the previous code, modified to accommodate the small percentage of lines that do not have a "PHOX2B:NM_003924.3:exon3:c.C639G:p.G213G" string:

Code:
#!/usr/bin/env perl
# reformat.pl
use strict;
use warnings;

my %nms = (
    "NM_004004.5" => "AR",
    "NM_004992.3" => "XLD",
    "NM_003924.3" => "AD"
);

# $! is not set by a missing argument, so it is left out of these messages
my $readf = shift || die "Missing input file\n";
my $writef = shift || die "Missing output file\n";

my @header = (
    "Index",
    "Chromosome Position",
    "Gene",
    "Inheritance",
    "RNA Accession",
    "Chr",
    "Coverage",
    "Score",
    "A(#F,#R)",
    "C(#F,#R)",
    "G(#F,#R)",
    "T(#F,#R)",
    "Ins(#F,#R)",
    "Del(#F,#R)",
    "SNP db_xref",
    "Mutation Call",
    "Mutant Allele Frequency",
    "Amino Acid Change",
    "HP",
    "SPLICE",
    "Pseudogene",
    "Classification",
    "HGMD",
    "Disease",
    "Sanger",
    "References",
);

open my $in, '<', $readf or die "Cannot open $readf: $!\n";
open my $out, '>', $writef or die "Cannot create $writef: $!\n";

my $add2header;
chomp( $add2header = <$in> );
splice @header, 18, 0, $add2header;
save(@header);

$. = 0; # reset the line counter so the first data line gets index 1
while( <$in> ) {
    chomp;
    my @ruler = (("Null")x17, ("")x25, ("Null")x8);
    my @fields = split /\t/;
    # skip lines without an AAChange.refGene value (e.g. 5-field input)
    if($fields[9]) {
        my $len = @fields;
        splice @ruler, 17, $len, @fields;
        my ($gene, $transcript, $exon, $coding, $aa) = split /:/, $fields[9];
        $ruler[0] = $.;
        $ruler[2] = $gene;
        $ruler[3] = $nms{$transcript} // "Null";   # "Null" if the transcript is not in %nms
        $ruler[4] = $transcript;
        $ruler[15] = $coding;
        $ruler[17] = $aa;
        $ruler[45] = "VUS";
        save(@ruler);
    }
}

sub save {
    local $" = "\t";   # interpolated arrays are joined with tabs
    print $out "@_\n";
}

close $in;
close $out;

Nevertheless, that does nothing to resolve your input discrepancy.
Did you compare the input that produced the defective output with the one you posted previously?
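A side effect of skipping lines while taking the Index from $. is that $. counts every line read, skipped ones included, so the Index column gets gaps after a skipped line. A minimal sketch of a separate counter that advances only for rows actually written is below; $index is a hypothetical variable name, and the three sample lines (a full record, a short 5-field line, another full record) are made up from the thread's data.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Three data lines; the middle one is a short 5-field line that gets skipped.
my $aachange = "PHOX2B:NM_003924.3:exon3:c.C639G:p.G213G";
my $long = join "\t", 4, 41748130, 41748130, "G", "C", "exonic", "PHOX2B", "",
                      "synonymous SNV", $aachange;
my $text = "$long\n4\t41748130\t41748130\tG\tC\n$long\n";

open my $in, '<', \$text or die "Cannot open in-memory file: $!\n";
my $index = 0;    # counts only rows that are actually written
my @seen;
while (my $line = <$in>) {
    chomp $line;
    my @fields = split /\t/, $line;
    next unless $fields[9];    # same test as the code above
    $index++;
    push @seen, "line $. -> Index $index";
}
close $in;
print "$_\n" for @seen;
```

This prints "line 1 -> Index 1" and "line 3 -> Index 2". In reformat.pl the equivalent change would be $ruler[0] = ++$index; in place of $ruler[0] = $.;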

Last edited by Aia; 02-15-2016 at 08:17 PM.
# 21  
02-16-2016
I got pulled away before I could, but I will try it first thing tomorrow. Thank you.

---------- Post updated 02-16-16 at 09:11 AM ---------- Previous update was 02-15-16 at 06:22 PM ----------

input
Code:
 Chr Start End Ref Alt Func.refGene Gene.refGene GeneDetail.refGene ExonicFunc.refGene AAChange.refGene PopFreqMax 1000G2012APR_ALL 1000G2012APR_AFR 1000G2012APR_AMR 1000G2012APR_ASN 1000G2012APR_EUR ESP6500si_ALL ESP6500si_AA ESP6500si_EA CG46 common clinvar clinvarsubmit clinvarreference
4 41748130 41748130 G C exonic PHOX2B  synonymous SNV PHOX2B:NM_003924.3:exon3:c.C639G:p.G213G 0.0007 . . . . . 0.0005 0.0002 0.0007 .

perl
Code:
perl -pe 's/\t/\[TAB\]/g' input

output
Code:
Chr[TAB]Start[TAB]End[TAB]Ref[TAB]Alt[TAB]Func.refGene[TAB]Gene.refGene[TAB]GeneDetail.refGene[TAB]ExonicFunc.refGene[TAB]AAChange.refGene[TAB]PopFreqMax[TAB]1000G2012APR_ALL[TAB]1000G2012APR_AFR[TAB]1000G2012APR_AMR[TAB]1000G2012APR_ASN[TAB]1000G2012APR_EUR[TAB]ESP6500si_ALL[TAB]ESP6500si_AA[TAB]ESP6500si_EA[TAB]CG46[TAB]common[TAB]clinvar[TAB]clinvarsubmit[TAB]clinvarreference
4[TAB]41748130[TAB]41748130[TAB]G[TAB]C[TAB]exonic[TAB]PHOX2B[TAB][TAB]synonymous SNV[TAB]PHOX2B:NM_003924.3:exon3:c.C639G:p.G213G[TAB]0.0007[TAB].[TAB].[TAB].[TAB].[TAB].[TAB]0.0005[TAB]0.0002[TAB]0.0007[TAB].


Last edited by cmccabe; 02-16-2016 at 11:18 AM. Reason: updated output