Perl to run different parser based on digit


 
# 1  
03-07-2017

The Perl parser below works as expected, assuming the number after NC_ (the part before the .) is a single digit.

Code:
perl -ne 'next if $. == 1;    # skip the header line
    if (/.*del([A-Z]+)ins([A-Z]+).*NC_0{5}([0-9]+).*g\.([0-9]+)_([0-9]+)/)   # indel
    {
        print join("\t", $3, $4, $5, $1, $2), "\n";
    }
' out_position.txt > out1.txt


Contents of out_position.txt:
Code:
Input Variant	Errors	Chromosomal Variant	Coding Variant(s)
NM_003924.3:c.*18_*19delGCinsAA		NC_000004.11:g.41747805_41747806delinsTT	LRG_513t1:c.*18_*19delinsAA	NM_003924.3:c.*18_*19delinsA

Contents of out1.txt (the output is correct):

Code:
4	41747805	41747806	GC	AA

However, I cannot seem to adjust it for cases where the number after NC_ (before the .) is not a single digit. It could be 2 digits, as in the case below, and then I would need to strip 4 zeros instead of 5. So my question is: how can I make the NC_ part of the regex adjust depending on whether the number is 1 or 2 digits? Thank you :)

Code:
Input Variant	Errors	Chromosomal Variant	Coding Variant(s)
NM_003924.3:c.*18_*19delGCinsAA		NC_000014.11:g.41747805_41747806delinsTT	LRG_513t1:c.*18_*19delinsAA	NM_003924.3:c.*18_*19delinsA

So in this case the desired output would be:

Code:
14     41747805     41747806     GC     AA

It is also possible for the value after NC_ to be a letter instead of a digit, but in that case it is always a single letter, e.g. NC_00000X.11:g.41747805_41747806delinsTT. For that case the pattern would change from this:

Code:
.*NC_0{5}([0-9]+).

to this:

Code:
.*NC_0{5}([0-9]+[A-Z]+).


Last edited by cmccabe; 03-07-2017 at 02:14 PM.. Reason: fixed format
# 2  
03-07-2017
Quote:
Originally Posted by cmccabe
...assuming the last digit in the NC_ before the . is a single digit.
...
However, .... the last digit in NC_ before the . in bold, may not always be 1 digit as in the case above, it could be 2 digits, as in the case below. In this case I would need to parse out 4 zeros, instead of 5.
...
...
It is also possible for the NC_ to be a letter, not a digit, but in that case it is always one letter, ...
So the string is one of the following:

(1) NC_ + five zeros + 1 digit + "." character => you want that one digit before "." character
(2) NC_ + four zeros + 2 digits + "." character => you want those two digits before "." character
(3) NC_ + five zeros + 1 character + "." character => you want that one character before "." character

One way to look at it is:
NC_ + a run of one or more zeros + a run of characters that does not start with a zero + "." character

And you want to capture that run of characters before the "." character.

Here's a sample regex that does that:

Code:
$ cat input.txt
NC_000004.11
NC_000014.11
NC_00000X.11
$ perl -lne 's/NC_0+(.*?)\..*/$1/; print' input.txt
4
14
X
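
For reference, the same idea can be written as a small Perl function. This is a sketch, not from the original post: `chromosome_from` is a hypothetical name, `[^.]+` is swapped in for `.*?` so the captured part cannot be empty, and `NC_000010.10` is an extra, made-up test value showing that the captured part may contain zeros as long as it does not start with one.

```perl
use strict;
use warnings;

# Hypothetical helper: return the part after NC_ and its leading zeros,
# up to (but not including) the first ".", or undef if there is none.
sub chromosome_from {
    my ($text) = @_;
    # 0+ is greedy, so it normally consumes all the leading zeros; [^.]+ then
    # captures everything up to the "." (this may include zeros, e.g. "10").
    return $text =~ /NC_0+([^.]+)\./ ? $1 : undef;
}

# The three cases from the thread plus one made-up value (NC_000010.10).
for my $acc (qw(NC_000004.11 NC_000014.11 NC_00000X.11 NC_000010.10)) {
    my $chr = chromosome_from($acc);
    print "$acc\t", (defined $chr ? $chr : 'no match'), "\n";
}
```

Running it should print 4, 14, X and 10 next to the corresponding accessions.
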

This User Gave Thanks to durden_tyler For This Post:
# 3  
03-07-2017
I thought I understood, but not entirely. You are correct, though: those are the 3 possible conditions. Thank you very much :)

Code:
perl -ne 'next if $. == 1;
    if(/.*del([A-Z]+)ins([A-Z]+).'s/NC_0+(.*?)\..*/$1/; print')   # indel
{
        print join("\t", $3, $4, $5, $1, $2), "\n";
}
           ' out_position.txt > out.txt
Unknown regexp modifier "/N" at -e line 2, at end of line
Unknown regexp modifier "/C" at -e line 2, at end of line
Unknown regexp modifier "/_" at -e line 2, at end of line
Unknown regexp modifier "/0" at -e line 2, at end of line
syntax error at -e line 2, near "(."
Execution of -e aborted due to compilation errors.


Last edited by cmccabe; 03-07-2017 at 09:14 PM.. Reason: added details
# 4  
03-07-2017
Try this:

Code:
perl -ne 'next if $. == 1;
    if (/.*del([A-Z]+)ins([A-Z]+).*NC_0+([^.]+)\..*g\.([0-9]+)_([0-9]+)/)   # indel
    {
        print join("\t", $3, $4, $5, $1, $2), "\n";
    }
' out_position.txt > out.txt
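
Keeping the regex in a script file avoids juggling single quotes inside `perl -ne '...'`. A sketch of the same parser, assuming a single input file with one header line; `parse_line` and the filename `parse_indel.pl` are hypothetical names, and the regex is the one from the command above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return (chromosome, start, end, deleted, inserted)
# for one line of out_position.txt, or an empty list if it does not match.
sub parse_line {
    my ($line) = @_;
    return $line =~ /.*del([A-Z]+)ins([A-Z]+).*NC_0+([^.]+)\..*g\.([0-9]+)_([0-9]+)/
        ? ($3, $4, $5, $1, $2)
        : ();
}

# Read the named file only when one is given, e.g.
#   perl parse_indel.pl out_position.txt > out.txt
if (@ARGV) {
    while (my $line = <>) {
        next if $. == 1;    # skip the header (first line of the input)
        my @fields = parse_line($line);
        print join("\t", @fields), "\n" if @fields;
    }
}
```

Because `$.` keeps counting across files read through `<>`, this sketch only skips the header of the first file if several are passed at once.
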

This User Gave Thanks to Chubler_XL For This Post:
# 5  
03-07-2017
Thank you both very much :)

Chubler_XL

Code:
.*NC_0+([^.]+)\.

Does this look for the NC_ and extract all non-zero digits/letters up to the .? Thank you :)
# 6  
03-07-2017
Code:
.*NC_0+([^.]+)\.

Look for NC_ followed by one or more zeros,
then capture one or more non-. characters, provided they are followed by a . character.
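
Perl's `/x` modifier lets the same pattern carry that explanation as comments. A sketch; the leading `.*` is left out because it does not change what `$1` captures when NC_ appears only once per line, and `$nc_re` is just an illustrative variable name.

```perl
use strict;
use warnings;

# The pattern NC_0+([^.]+)\. written with /x, so whitespace and # comments
# inside the regex are ignored and each piece can be annotated.
my $nc_re = qr/
    NC_        # literal prefix
    0+         # one or more zeros (greedy, so the leading zeros are consumed)
    ([^.]+)    # capture: one or more characters that are not "."
    \.         # a literal "." must follow the captured part
/x;

for my $acc (qw(NC_000004.11 NC_000014.11 NC_00000X.11)) {
    print "$acc => $1\n" if $acc =~ $nc_re;
}
```
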
This User Gave Thanks to Chubler_XL For This Post:
# 7  
03-07-2017
Code:
perl -nle 'BEGIN{$,="\t"}/del([A-Z]{2})ins([A-Z]{2})\s+NC_0+(\w+)\.\d+:\w\.(\d+)_(\d+)/ and print $3,$4,$5,$1,$2' out_position.txt > out2.txt

Code:
cat out2.txt
14      41747805        41747806        GC      AA
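
One difference from the earlier versions: `[A-Z]{2}` only matches deletions and insertions of exactly two bases, while `[A-Z]+` accepts any length. The sketch below shows this on a made-up line with a three-base deletion (`delGCTinsAA`, not from the thread's data); which behaviour is wanted depends on the real input.

```perl
use strict;
use warnings;

# Hypothetical input with a three-base deletion (not from the thread's data).
my $line = "NM_000000.1:c.10_12delGCTinsAA\t\tNC_000014.11:g.41747805_41747807delinsAA";

# The pattern with {2} as posted, and the same pattern with + in its place.
my $exactly_two = qr/del([A-Z]{2})ins([A-Z]{2})\s+NC_0+(\w+)\.\d+:\w\.(\d+)_(\d+)/;
my $one_or_more = qr/del([A-Z]+)ins([A-Z]+)\s+NC_0+(\w+)\.\d+:\w\.(\d+)_(\d+)/;

for my $re ($exactly_two, $one_or_more) {
    if ($line =~ $re) {
        print join("\t", $3, $4, $5, $1, $2), "\n";
    } else {
        print "no match\n";
    }
}
```

With this line the `{2}` version prints "no match", while the `+` version prints the chromosome, both positions, GCT and AA.
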


Last edited by Aia; 03-07-2017 at 11:38 PM..
This User Gave Thanks to Aia For This Post: