Help with String manipulation


 
# 1  
04-11-2011

Dear All,
I have a question.
I have files with the following pattern.
Code:
>S8_SK1.chr01
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNCAGCATGCAATAAGGTGACATAGATATACCCACACACCACACCCTAACACTAACCCTAATCTAACCCTGGCCAACCTGTTT
TGTATACTGATTTTACGTACNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

Now what I'd like to do is count the number of contigs in the sequence, starting from the second line of the file and always excluding the first line, which begins with ">" followed by anything. For each contig (substring) within the sequence, I'd like to get its range, size and composition. So for the example above, my expected output would be:
Code:
>S8_SK1.chr01
1-119       119   N=119
120-220     101   A=33 C=31 G=12 T=25
221-300     80    N=80

Can someone enlighten me on how to accomplish this task?
I'll really appreciate your input.
Cheers, and have a nice day!
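For comparison, here is a minimal Python sketch of the task as described, not taken from the thread. It assumes one record per file and only uppercase A/C/G/T/N in the sequence. It drops ">" lines, joins the rest, and treats each maximal run of N's or of A/C/G/T as one contig, reporting 1-based inclusive ranges. The names `join_sequence` and `contig_rows` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a "contig" is a maximal run of N's or a maximal run of A/C/G/T.
RUN = re.compile(r"N+|[ACGT]+")

def join_sequence(text):
    """Drop '>' header lines and concatenate the remaining lines (strip() also removes any \\r)."""
    return "".join(line.strip() for line in text.splitlines()
                   if not line.startswith(">"))

def contig_rows(seq):
    """Return one 'start-end<TAB>size<TAB>composition' row per run, 1-based and inclusive."""
    rows = []
    for m in RUN.finditer(seq):
        run = m.group(0)
        counts = Counter(run)
        # Sort the bases so the composition prints as A, C, G, T (or just N).
        comp = " ".join(f"{base}={counts[base]}" for base in sorted(counts))
        rows.append(f"{m.start() + 1}-{m.end()}\t{len(run)}\t{comp}")
    return rows

if __name__ == "__main__":
    demo = ">demo\nNNNACG\nTNN\n"
    for row in contig_rows(join_sequence(demo)):
        print(row)
```

On the demo input it prints the tab-separated rows `1-3 3 N=3`, `4-7 4 A=1 C=1 G=1 T=1` and `8-9 2 N=2`. Because `m.start()` gives each run's position directly, no separate running counter is needed.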
# 2  
04-11-2011
Try:
Code:
perl -F// -lane 'BEGIN{$a=1}if ($. > 1){$l=length;for $i (@F){$a{$i}++};printf "$a-" . ($l+$a-1) . "\t$l\t"; for $i (keys %h){printf "$i=$h{$i} "};printf "\n";%h=();$a+=$l}' file

# 3  
04-11-2011
This gives me the following output with the sample above:
Code:
1-100    100    
101-200    100    
201-300    100

# 4  
04-11-2011
Sorry, one mistype:
Code:
perl -F// -lane 'BEGIN{$a=1}if ($. > 1){$l=length;for $i (@F){$h{$i}++};printf "$a-" . ($l+$a-1) . "\t$l\t"; for $i (keys %h){printf "$i=$h{$i} "};printf "\n";%h=();$a+=$l}' file

# 5  
04-11-2011
This is giving me
Code:
1-100    100    N=100 
101-200    100    A=28 T=16 C=28 N=19 G=9 
201-300    100    A=5 T=9 N=80 C=3 G=3

But I think it's still not distinguishing between substrings. By substring I don't mean one line, as in the output above, but an uninterrupted run of either N's or A/C/G/T's. So the second column ($F[1]) should not always be 100; it should be the length of the range in the first column ($F[0]), i.e. the length of the contig.
Sorry for not being clear earlier.
Cheers and thanks always for your input,
Good day
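The definition given here (a contig is an uninterrupted run of N's or of A/C/G/T) can also be expressed without a regex. Below is a minimal Python sketch using `itertools.groupby`. It assumes the sequence contains nothing but N and A/C/G/T, so "not N" is taken to mean a base. The running `pos` counter plays the same role as the position counter in the Perl one-liners. `runs` is a hypothetical name.

```python
from itertools import groupby

def runs(seq):
    """Yield (kind, start, end) for each maximal N / non-N run, 1-based and inclusive.

    Assumption: every non-N character is one of A/C/G/T.
    """
    pos = 1
    for is_gap, group in groupby(seq, key=lambda ch: ch == "N"):
        length = sum(1 for _ in group)          # size of this run
        yield ("N" if is_gap else "ACGT", pos, pos + length - 1)
        pos += length                           # next run starts right after this one

print(list(runs("NNNACGTNN")))  # [('N', 1, 3), ('ACGT', 4, 7), ('N', 8, 9)]
```

Grouping on the boolean key rather than on the character itself is what keeps a mixed stretch like "ACGT" together as one contig.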
# 6  
04-11-2011
Try this script:
Code:
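#!/usr/bin/perl
use strict;
use warnings;

open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
local $/;                  # slurp mode: read the whole file at once
$_ = <$in>;
close $in;

s/^>.*\n//;                # drop the ">" header line
s/\s+//g;                  # join the sequence lines (also removes any \r)

my $start = 1;             # 1-based start of the current run
while (/N+|[ACGT]+/g) {    # each match is one run of N's or of A/C/G/T
  my $run = $&;            # the text matched by the last successful match
  my $len = length $run;
  my %count;
  for my $base (split //, $run) {
    $count{$base}++;
  }
  my @parts;
  for my $base (sort keys %count) {   # sorted so the bases print as A C G T
    push @parts, "$base=$count{$base}";
  }
  print "$start-", $start + $len - 1, "\t$len\t@parts\n";
  $start += $len;
}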

Run it as always: ./script.pl file
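One thing to watch with this approach: the position counter advances only by the length of each matched run. Any character that `N+|[ACGT]+` does not match is therefore skipped without being counted. Lowercase soft-masked bases or IUPAC codes such as R or Y would make every later range shift, and a carriage return left at a line end would split a run that crosses the line break into two. Below is a minimal Python check, assuming you would rather flag such input than handle it. It lists the 1-based spans that fall between matches. `skipped_spans` is a hypothetical name.

```python
import re

RUN = re.compile(r"N+|[ACGT]+")

def skipped_spans(seq):
    """Return (start, end) 1-based inclusive spans that the run pattern never matches."""
    spans = []
    expected = 0  # 0-based index where the next run should begin if nothing is skipped
    for m in RUN.finditer(seq):
        if m.start() > expected:
            spans.append((expected + 1, m.start()))
        expected = m.end()
    if expected < len(seq):
        spans.append((expected + 1, len(seq)))
    return spans

print(skipped_spans("NNacgtNN"))  # [(3, 6)]: the lowercase bases are not covered
```

An empty list means the reported ranges line up with the real positions in the joined sequence.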
# 7  
04-11-2011
Yeah, that did the job as desired... Thank you very much. The only thing missing is that it doesn't print the file name before the output. Can the name be stored in a variable before the substitution removes the first line (the header, which is the same as the file name)? I tried storing it in $1 and printing $1 later, as below, but it didn't work, so please correct me:
Code:
$1=~/(^>.*)/;
s/^>.*\n//;

Could you please also explain the use of $&?

Cheers
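On the two questions here: in Perl, `$&` holds the whole string matched by the last successful pattern match, so inside `while (/N+|[ACGT]+/g)` it is the run just found. `$1` holds what the first capturing group of a successful match captured. However, `$1=~/(^>.*)/` applies the match to `$1` itself rather than to `$_`, so the header is never read from the file contents. Capturing into a variable before the substitution should do what was intended, e.g. `my ($name) = /^(>.*)/;` followed later by `print "$name\n";`. The Python sketch in this note uses a hypothetical helper, `split_record`, to show the same capture-then-remove order. In it, `group(1)` and `group(0)` serve as rough analogues of `$1` and `$&`.

```python
import re

def split_record(text):
    """Hypothetical helper: capture the '>' header first, then remove it.

    Returns (header, sequence); header is None if the text has no '>' line.
    """
    m = re.match(r"(>[^\n]*)\n", text)
    if m is None:
        return None, re.sub(r"\s+", "", text)
    header = m.group(1)                  # like Perl's $1 after a successful match
    body = text[m.end():]                # like s/^>.*\n// on the slurped text
    return header, re.sub(r"\s+", "", body)

header, seq = split_record(">S8_SK1.chr01\nNNAC\nGTNN\n")
print(header)                            # >S8_SK1.chr01
first = re.search(r"[ACGT]+", seq)
print(first.group(0))                    # ACGT (group(0) is the analogue of Perl's $&)
```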
 