Rewrite sed to perl or run sed in perl


 
# 1  
04-08-2015

I am having trouble rewriting this sed command in Perl:

Code:
 sed -nr 's/.*del([A-Z]+)ins([A-Z]+).*NC_0{4}([0-9]+).*g\.([0-9]+)_([0-9]+).*/\3\t\4\t\5\t\1\t\2/p' C:/Users/cmccabe/Desktop/Python27/out_position.txt > C:/Users/cmccabe/Desktop/Python27/out_parse.txt

Basically, the command parses text from two fields of a file.

The other option is to integrate the sed command into my existing Perl script, but I am not sure how. Thank you.

Code:
 perl -ne 'next if $. == 1;
            while (/\t*NC_(\d+)\.\S+g\.(\d+)(\S+)/g) {                                            # conditional parse
                ($num1, $num2, $common) = ($1, $2, $3);
                $num3 = $num2;
                if    ($common =~ /^([A-Z])>([A-Z])$/)   { ($ch1, $ch2) = ($1, $2) }              # SNP
                elsif ($common =~ /^del([A-Z])$/)        { ($ch1, $ch2) = ($1, "-") }             # deletion
                elsif ($common =~ /^ins([A-Z])$/)        { ($ch1, $ch2) = ("-", $1) }             # insertion
                elsif ($common =~ /^_(\d+)del([A-Z]+)$/) { ($num3, $ch1, $ch2) = ($1, $2, "-") }  # multi deletion
                elsif ($common =~ /^_(\d+)ins([A-Z]+)$/) { ($num3, $ch1, $ch2) = ($1, "-", $2) }  # multi insertion
                printf ("%d\t%d\t%d\t%s\t%s\n", $num1, $num2, $num3, $ch1, $ch2);                 # output
                ($num1, $num2, $num3, $common, $ch1, $ch2) = ();                                  # reset for the next match
            }
           ' C:/Users/cmccabe/Desktop/Python27/out_position.txt > C:/Users/cmccabe/Desktop/Python27/out_parse.txt

# 2  
04-08-2015
If you don't show us the contents of that file, we'll never be able to guess what you want.
# 3  
04-09-2015
Code:
 
Parse Rules:

The header is skipped, and then:

1. the digits after the 4 zeros that follow NC_ (it is not always 4 zeros), up to the .
2. after g., the number before the underscore (###) and the number after it (_###)
3. the letters after "del", up to "ins"
4. the letters after "ins"

Desired output:   13     20763121     20763129     GTGTCTGGA     CAGTGTTCATGACATTC

The sed line works by itself, but not as part of the Perl script, and I do not know enough either to call sed from Perl or to rewrite the command in Perl. The problem is that parse-rule steps 1 and 2 come from $2 (NC_000013.10:g.20763121_20763129delinsGAATGTCATGAACACTG), while steps 3 and 4 come from $1 (NM_004004.5:c.592_600delGTGTCTGGAinsCAGTGTTCATGACATTC). Thank you.

Perl command:
Code:
 
perl -ne 'next if $. == 1;
            while (/\t*NC_(\d+)\.\S+g\.(\d+)(\S+)/g) {                                            # conditional parse
                ($num1, $num2, $common) = ($1, $2, $3);
                $num3 = $num2;
                if    ($common =~ /^([A-Z])>([A-Z])$/)   { ($ch1, $ch2) = ($1, $2) }              # SNP
                elsif ($common =~ /^del([A-Z])$/)        { ($ch1, $ch2) = ($1, "-") }             # deletion
                elsif ($common =~ /^ins([A-Z])$/)        { ($ch1, $ch2) = ("-", $1) }             # insertion
                elsif ($common =~ /^_(\d+)del([A-Z]+)$/) { ($num3, $ch1, $ch2) = ($1, $2, "-") }  # multi deletion
                elsif ($common =~ /^_(\d+)ins([A-Z]+)$/) { ($num3, $ch1, $ch2) = ($1, "-", $2) }  # multi insertion
                elsif sed -nr 's/.*del([A-Z]+)ins([A-Z]+).*NC_0{4}([0-9]+).*g\.([0-9]+)_([0-9]+).*/\3\t\4\t\5\t\1\t\2/p'  # indel
                printf ("%d\t%d\t%d\t%s\t%s\n", $num1, $num2, $num3, $ch1, $ch2);                 # output
                ($num1, $num2, $num3, $common, $ch1, $ch2) = ();                                  # reset for the next match
            }
           ' C:/Users/cmccabe/Desktop/Python27/out_position.txt > C:/Users/cmccabe/Desktop/Python27/out_parse.txt


Last edited by cmccabe; 04-09-2015 at 11:46 AM. Reason: added some more detail
# 4  
04-09-2015
The sed part shouldn't be inside the while loop, should it? It runs once and exactly once to extract from an entire line.

Code:
next if $. == 1;

if(/.*del([A-Z]+)ins([A-Z]+).*NC_0{4}([0-9]+).*g\.([0-9]+)_([0-9]+)/)
{
        print join("\t", $3, $4, $5, $1, $2), "\n";
}

# rest of your code
while(...)

...

# 5  
04-09-2015
Almost perfect... The last three fields in the attached output (13 20763121 20763121) come, I believe, from the while loop. Is there a way to remove those fields? Thank you for your help.

Code:
parse() {
    printf "\n\n"
    cd 'C:\Users\cmccabe\Desktop\annovar'
    perl -ne 'next if $. == 1;
        if (/.*del([A-Z]+)ins([A-Z]+).*NC_0{4}([0-9]+).*g\.([0-9]+)_([0-9]+)/) {              # indel
            print join("\t", $3, $4, $5, $1, $2), "\n";
        }
        while (/\t*NC_(\d+)\.\S+g\.(\d+)(\S+)/g) {                                            # conditional parse
            ($num1, $num2, $common) = ($1, $2, $3);
            $num3 = $num2;
            if    ($common =~ /^([A-Z])>([A-Z])$/)   { ($ch1, $ch2) = ($1, $2) }              # SNP
            elsif ($common =~ /^del([A-Z])$/)        { ($ch1, $ch2) = ($1, "-") }             # deletion
            elsif ($common =~ /^ins([A-Z])$/)        { ($ch1, $ch2) = ("-", $1) }             # insertion
            elsif ($common =~ /^_(\d+)del([A-Z]+)$/) { ($num3, $ch1, $ch2) = ($1, $2, "-") }  # multi deletion
            elsif ($common =~ /^_(\d+)ins([A-Z]+)$/) { ($num3, $ch1, $ch2) = ($1, "-", $2) }  # multi insertion
            printf ("%d\t%d\t%d\t%s\t%s\n", $num1, $num2, $num3, $ch1, $ch2);                 # output
            ($num1, $num2, $num3, $common, $ch1, $ch2) = ();                                  # reset for the next match
        }
    ' C:/Users/cmccabe/Desktop/annovar/out_position.txt > C:/Users/cmccabe/Desktop/annovar/out_parse.txt
    annovar
}

# 6  
04-09-2015
If the new section already prints everything you need, just remove the entire while loop.
# 7  
04-09-2015
If I remove the while loop, I get the attached output. Thank you.