Help with String manipulation


 
Forum: UNIX for Dummies Questions & Answers
# 8  
04-11-2011
Try:
Code:
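#!/usr/bin/perl
use strict;
use warnings;

# Slurp the whole file into one string.
@ARGV or die "usage: $0 file\n";
open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
my $seq = do { local $/; <$in> };
close $in;

# Remove the header line (">...") and print it.
# Only the first header is handled: this assumes one record per file.
print "$1\n" if $seq =~ s/^(>.*)\n//;

# Join the sequence lines into one continuous string.
$seq =~ s/\n//g;

# Each match is either a run of N's or a run of A/C/G/T.
while ($seq =~ /N+|[ACGT]+/g) {
    my $run   = $&;              # the text matched by the regex
    my $len   = length $run;
    my $end   = pos $seq;        # offset just past the match = 1-based end
    my $start = $end - $len + 1; # skipped characters still count as positions
    my %count;
    $count{$_}++ for split //, $run;
    print "$start-$end\t$len\t";
    print "$_=$count{$_} " for sort keys %count;
    print "\n";
}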

The $& variable contains the string matched by the regex in the while loop's condition.
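awk has no $& variable, but its match() function sets RSTART (where the match starts) and RLENGTH (how long it is), so substr(s, RSTART, RLENGTH) yields the same matched text. Below is a minimal sketch of the same "loop over every match" idea in plain sh + awk, assuming a POSIX awk; the function name runs is made up for illustration. Positions are taken from where each match is found, so characters outside N/A/C/G/T are skipped but still counted.

```sh
#!/bin/sh
# runs STRING: list each run of N's or of A/C/G/T in STRING as
# "start-end<TAB>length<TAB>matched text" (1-based positions).
# awk's match() sets RSTART/RLENGTH; substr(s, RSTART, RLENGTH) is
# the text that was matched, i.e. the awk counterpart of Perl's $&.
runs() {
    printf '%s\n' "$1" | awk '{
        s = $0; offset = 0
        while (match(s, /N+|[ACGT]+/)) {
            start = offset + RSTART
            end = start + RLENGTH - 1
            printf "%d-%d\t%d\t%s\n", start, end, RLENGTH, substr(s, RSTART, RLENGTH)
            # drop everything up to the end of this match and search again
            offset = end
            s = substr(s, RSTART + RLENGTH)
        }
    }'
}

runs NNNNACGTACNNN
```

For that sample string it prints the runs 1-4 (NNNN), 5-10 (ACGTAC) and 11-13 (NNN).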
This User Gave Thanks to bartus11 For This Post:
# 9  
04-11-2011
Thanks, Bartus. I didn't know I could capture regex matches while performing sed. That makes it simpler and more concise too!
Thanks, and have a nice day.
Cheers
# 10  
04-11-2011
Just one small thing: s/// is not "sed". It is the substitution operator, which is present in many tools (sed, vi, Perl).
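To illustrate that the capture comes from whichever tool runs s///: in sed, a group is written \( \) and the captured text is referred to as \1, where Perl uses ( ) and $1. A small sketch (the helper name header_name is hypothetical) that strips the ">" from a header line with sed:

```sh
#!/bin/sh
# header_name LINE: strip the leading ">" from a header line, keeping
# the rest through a capture group.
# sed (basic regex): group with \( \), refer back with \1.
# Perl writes the same substitution as s/^>(.*)$/$1/, and in vi it is
# typed as :s/^>\(.*\)$/\1/
header_name() {
    printf '%s\n' "$1" | sed 's/^>\(.*\)$/\1/'
}

header_name '>S8_SK1.chr01'   # prints S8_SK1.chr01
```

Lines that do not start with ">" pass through unchanged, because the substitution only fires when the regex matches.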
This User Gave Thanks to bartus11 For This Post:
# 11  
04-11-2011
Thanks for clarifying.
# 12  
04-11-2011
Hi

You do not always need another language for an alternative solution.
Let's try a shell version:

Code:
# cat text
>S8_SK1.chr01
DDDDDDDDDTTTTTTTTTTTTTTTTTTTTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNCAGCATGCAATAAGGTGACATAGATATACCCACACACCACACCCTAACACTAACCCTAATCTAACCCTGGCCAACCTGTTTVVVVVVVVVVVVV
TGTATACTGATTTTACGTACNNNNNNNNNNNNNNRRRRRRRRRRRRRRRRRRRRRRRRRTTTTTTTTTTTTTTTTTTTTTTTTMMMMMIIIIIIIIIIIIAAA
>S8_SK1.chr02
VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVDDDDDDDDDDDDDDDDDDDDDDDDDDDDDASSSAAAAAACCCCCCCCCCCCCCCAAAAAA
YYYYYYYYYYYYYYYYYYYYYYYZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADDDDD
ASAOIJRJFFFFFFTTTTTPPPPPAAAAKKKKKKKKKKKKKKKKKKKKKKKKKKKKMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMNAAAAA
>S8_SK1.chr03
WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDFFFFFFFFFFFFFFF
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
AAASADEAAAABACNNNNNNNNNNNNNNNNNNNNNXXXXXXXXXXXXXXXXXXXZZZZZZZZZZZZZZZZAWQERERRRR

Code:
# ./justdoit text
>S8_SK1.chr01
     1-100       100    D=9 N=71 T=20
   101-213       113    A=28 C=28 G=9 N=19 T=16 V=13
   214-316       103    A=8 C=3 G=3 I=12 M=5 N=14 R=25 T=33
>S8_SK1.chr02
     1-106       106    A=13 C=15 D=29 S=3 V=46
   107-211       105    A=37 D=5 Y=23 Z=40
   212-316       105    A=11 F=6 I=1 J=2 K=28 M=43 N=1 O=1 P=5 R=1 S=1 T=5
>S8_SK1.chr03
      1-88        88    D=36 F=15 W=37
    89-183        95    H=35 J=60
   184-263        80    A=10 B=1 C=1 D=1 E=3 N=21 Q=1 R=5 S=1 W=1 X=19 Z=16

Code:
#!/bin/bash
##justdoit
# Print per-line counts of the letters A-Z for every ">" record in file $1.

while IFS= read -r line
do
 if [[ $line == '>'* ]] ; then
    # header line: print it and restart the line counter
    l=0
    echo "$line"
 else
    xgr=()
    ((l++))
    count=${#line}
    for i in {A..Z}
     do
      # delete every character except $i; what is left is its count
      only=${line//[^$i]/}
      if (( ${#only} > 0 )); then
       xgr+=( "$i=${#only}" )
      fi
     done
      if (( l == 1 )) ; then
       firstc=1 ; lastc=$count
      else
       (( firstc = lastc + 1 ))
       (( lastc = firstc + count - 1 ))   # end position is inclusive
      fi
    gr="${xgr[*]}"
    printf "%10s %9d    %s\n" "$firstc-$lastc" "$count" "$gr"
 fi
done < "$1"

regards
ygemici
# 13  
04-12-2011
@ygemici:
Thanks for your alternative solution, although it is not quite what I want. Your version treats each line as a separate contig, rather than following runs of specific characters across lines as the Perl version does.
Have a nice day
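One way to get that continuous behaviour in a shell script is to let awk walk each record character by character, starting a new run whenever the character class changes (N versus A/C/G/T) and resetting positions at every ">" header. This is a minimal sketch assuming a POSIX awk; contig_runs is a hypothetical name, and unlike the Perl script it also handles files with several records. Characters other than N/A/C/G/T end the current run and are not reported, but they still advance the position.

```sh
#!/bin/sh
# contig_runs [FILE...]: for each ">" record, treat all sequence lines as
# one continuous string and report runs of N versus runs of A/C/G/T as
# "start-end<TAB>length<TAB>counts". Reads stdin when no FILE is given.
contig_runs() {
    awk '
    # Print the run that just ended (if any) and reset the counters.
    function flush(   j, ch, out) {
        if (len > 0) {
            out = ""
            for (j = 1; j <= 5; j++) {
                ch = substr("ACGTN", j, 1)
                if (ch in cnt) out = out (out == "" ? "" : " ") ch "=" cnt[ch]
            }
            printf "%d-%d\t%d\t%s\n", start, start + len - 1, len, out
        }
        len = 0
        split("", cnt)
    }
    /^>/ { flush(); cls = ""; pos = 0; print; next }
    {
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            pos++
            if (c == "N") k = "N"
            else if (c ~ /^[ACGT]$/) k = "ACGT"
            else { flush(); cls = ""; continue }  # other characters end a run
            if (k != cls) { flush(); cls = k; start = pos }
            cnt[c]++
            len++
        }
    }
    END { flush() }
    ' "$@"
}
```

Usage, e.g. with the sample file from this thread: contig_runs text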
# 14  
04-13-2011

Quote:
Originally Posted by pawannoel
@ygemici:
Thanks for your alternative solution, although it is not quite what I want. Your version treats each line as a separate contig, rather than following runs of specific characters across lines as the Perl version does.
Have a nice day
It's no problem. If you really want other characters, you can add them.
I tried to make a list:

Code:
# cat charlist
0 P ! 1 A Q a q '"' 2 B R b r \# 3 C S c s \$ 4 D T d t % 5 E U e u \& 6 F V f v 'â' G W g w '(' 8 H X h x ')' 9 I Y i y '*' J Z j + \; K \[ k '{' , '<' L '\' l '|' - = M ] m '}' \. '>' N '\^' n \~ '/' '?' O _ o D E L 2 F P Z d n x 3 = G e o y 4 H R f p z 5 I S g q 6 @ J T h r 7 A K U _ i s 8 B L V t 9 C M W a k u D E L 0 \: D N X b l v E O Y c m w

Code:
# cat text
>S8_SK1.chr01
QQQQQQ@@@@@@@@zzzzzzz++z][.ZZWW!!!@!@!!#??(|||$$$%%&$&%$#&#$@#&$(&(&((P"{"}{|Z|||\\>>>>><<<<<<<||!@!@!@#!~@~!!@$#.....::::'';;
DDDDDDDDDTTTTTTTTTTTTTTTTTTTTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNCAGCATGCAATAAGGTGACATAGATATACCCACACACCACACCCTAACACTAACCCTAATCTAACCCTGGCCAACCTGTTTVVVVVVVVVVVVV
TGTATACTGATTTTACGTACNNNNNNNNNNNNNNRRRRRRRRRRRRRRRRRRRRRRRRRTTTTTTTTTTTTTTTTTTTTTTTTMMMMMIIIIIIIIIIIIAAA
>S8_SK1.chr02
VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVDDDDDDDDDDDDDDDDDDDDDDDDDDDDDASSSAAAAAACCCCCCCCCCCCCCCAAAAAA
YYYYYYYYYYYYYYYYYYYYYYYZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADDDDD
ASAOIJRJFFFFFFTTTTTPPPPPAAAAKKKKKKKKKKKKKKKKKKKKKKKKKKKKMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMNAAAAA
>S8_SK1.chr03
WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDFFFFFFFFFFFFFFF
JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
AAASADEAAAABACNNNNNNNNNNNNNNNNNNNNNXXXXXXXXXXXXXXXXXXXZZZZZZZZZZZZZZZZAWQERERRRR

Code:
# ./justdoit text
>S8_SK1.chr01
     1-126       126    'P'=1 '!'=12 'Q'=6 '#'=6 '$'=8 '%'=3 '&'=6 'W'=2 '*'=1 'Z'=3 '+'=2 ';'=2 '['=1 ']'=1 '.'=6 'P'=1 'Z'=3 'z'=8 '@'=16 'W'=2 ':'=4
   127-226       100    'D'=9 'T'=20 'N'=71 'D'=9 'T'=20 'D'=9 'D'=9 'N'=71
   227-339       113    'A'=28 'C'=28 'T'=16 'V'=13 'G'=9 'N'=19 'G'=9 'T'=16 'A'=28 'V'=13 'C'=28 'N'=19
   340-442       103    'A'=8 'R'=25 'C'=3 'T'=33 'G'=3 'I'=12 'M'=5 'N'=14 'G'=3 'R'=25 'I'=12 'T'=33 'A'=8 'C'=3 'M'=5 'N'=14
>S8_SK1.chr02
     1-106       106    'A'=13 'C'=15 'S'=3 'D'=29 'V'=46 'D'=29 'S'=3 'A'=13 'V'=46 'C'=15 'D'=29 'D'=29
   107-211       105    'A'=37 'D'=5 'Y'=23 'Z'=40 'D'=5 'Z'=40 'A'=37 'D'=5 'D'=5 'Y'=23
   212-316       105    'P'=5 'A'=11 'R'=1 'S'=1 'T'=5 'F'=6 'I'=1 'J'=2 'K'=28 'M'=43 'N'=1 'O'=1 'F'=6 'P'=5 'R'=1 'I'=1 'S'=1 'J'=2 'T'=5 'A'=11 'K'=28 'M'=43 'N'=1 'O'=1
>S8_SK1.chr03
      1-88        88    'D'=36 'F'=15 'W'=37 'D'=36 'F'=15 'W'=37 'D'=36 'D'=36
    89-183        95    'H'=35 'J'=60 'H'=35 'J'=60
   184-263        80    'A'=10 'Q'=1 'B'=1 'R'=5 'C'=1 'S'=1 'D'=1 'E'=3 'W'=1 'X'=19 'Z'=16 'N'=21 'D'=1 'E'=3 'Z'=16 'R'=5 'S'=1 'A'=10 'B'=1 'C'=1 'W'=1 'D'=1 'E'=3 'D'=1 'N'=21 'X'=19 'E'=3

Code:
# cat justdoit
#!/bin/bash
##justdoit
while IFS= read -r line
do
 if [[ $line == '>'* ]] ; then
    # header line: print it and restart the line counter
    l=0
    echo "$line"
 else
    xgr=()
    ((l++))
    count=${#line}
    # each word of charlist is a (possibly escaped) character to count
    for i in $(cat charlist)
     do
      if printf '%s\n' "$line" | grep -q -e "$i"; then
       charcount=$(printf '%s\n' "$line" | grep -o -e "$i" | wc -l)
       val=$(eval echo "$i")    # strip the quoting/escaping for display
       xgr+=( "'$val'=$charcount" )
      fi
     done
      if (( l == 1 )) ; then
       firstc=1 ; lastc=$count
      else
       (( firstc = lastc + 1 ))
       (( lastc = firstc + count - 1 ))   # end position is inclusive
      fi
    gr="${xgr[*]}"
    printf "%10s %9d    %s\n" "$firstc-$lastc" "$count" "$gr"
 fi
done < "$1"

regards
ygemici

Last edited by ygemici; 04-14-2011 at 05:26 AM.
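If the goal is simply to count whatever characters appear, the hand-maintained charlist can be avoided: awk can tally every character of a line directly and then print the counts in ASCII order. A sketch assuming a POSIX awk (char_counts is a hypothetical name); only printable ASCII characters (codes 33 to 126) are reported, so spaces and non-ASCII characters are skipped, and each character appears once per line.

```sh
#!/bin/sh
# char_counts [FILE...]: for each sequence line, print its length and the
# count of every distinct printable ASCII character on it.
# Header lines (starting with ">") are passed through unchanged.
char_counts() {
    awk '
    /^>/ { print; next }
    {
        split("", cnt)
        n = length($0)
        for (i = 1; i <= n; i++) cnt[substr($0, i, 1)]++
        out = ""
        # walk codes 33..126 so the output order is fixed (ASCII order)
        for (code = 33; code <= 126; code++) {
            ch = sprintf("%c", code)
            if (ch in cnt) out = out (out == "" ? "" : " ") ch "=" cnt[ch]
        }
        print n "\t" out
    }
    ' "$@"
}
```

Usage, e.g.: char_counts text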
 