Help with finding matching position on strings


 
# 1  
03-23-2011

I have a DNA file like the one below. I was able to write a short program that reports whether an input motif is found, but I don't understand how to make it report the position where the motif was found. For example, I want to find the first (or every) "GAT" motif and have the program report the position (beginning to end) of each match. In this example, the first GAT should be reported as 10-12, and if the motif is found more than once, the other positions should be reported too. Can someone explain how to report matching positions in a string?

Thank you very much

Code:
>MY_DNA  
CAGCAGCAAGATTTGCAGCAACAGCAACAAGTAGTGACTACAGTTGCCTCGCAAAGTCCT
CATGCAACTGCAACGGAAAAGGAGCCAGTACCCGCCGTGGTTGACGACCCACTGGAGAAC
ATGTTCGGAGATTATTCCAATGAGCCGTTCAACACCAATTTCGACGATGAATTTGGAGAT

# 2  
03-23-2011
What is your system? What is your shell?

---------- Post updated at 09:24 AM ---------- Previous update was at 09:18 AM ----------

dna.awk:
Code:
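/^>/ { next }                   # skip header lines such as >MY_DNA
{
        LINE++;
        # awk strings are 1-indexed; the last possible start is length($0)-length(STR)+1
        for(N=1; N<=length($0)-length(STR)+1; N++)
                if(STR == substr($0, N, length(STR)))
                        printf("Line %d, position %d\n", LINE, N);
}

Code:
$ awk -f dna.awk -v STR="GAT" dnafile
Line 1, position 10
Line 3, position 10
Line 3, position 46
Line 3, position 58
$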

---------- Post updated at 09:35 AM ---------- Previous update was at 09:24 AM ----------

Slight modification to dna.awk:
Code:
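/^>/ { next }                   # skip header lines such as >MY_DNA
{
        LINE++;
        for(N=1; N<=length($0)-length(STR)+1; N++)
                if(STR == substr($0, N, length(STR)))
                        # a match starting at N ends at N+length(STR)-1
                        printf("Line %d, %d-%d\n", LINE, N, N+length(STR)-1);
}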
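The question asks for positions counted along the whole sequence (the first GAT at 10-12) rather than per line. Below is a minimal awk sketch of that, not code from the thread. It assumes the file looks like the example (header lines start with ">", everything else is sequence) and wraps the awk program in a hypothetical shell function, motif_positions. Because it tests every start position, overlapping matches are reported too.

```shell
# motif_positions MOTIF [FILE...]   (hypothetical helper, POSIX sh + awk)
# Prints start-end positions of MOTIF counted over the whole sequence,
# i.e. with all sequence lines joined. Reads stdin when no FILE is given.
motif_positions() {
    motif=$1
    shift
    awk -v STR="$motif" '
        /^>/ { next }                       # skip header lines such as >MY_DNA
        {
            gsub(/[ \t\r]/, "")             # drop stray spaces or DOS carriage returns
            seq = seq $0                    # join all sequence lines into one string
        }
        END {
            n = length(STR)
            if (n == 0) exit                # an empty motif would match everywhere
            # awk strings are 1-indexed; the last possible start is length(seq) - n + 1
            for (i = 1; i <= length(seq) - n + 1; i++)
                if (substr(seq, i, n) == STR)
                    printf("%d-%d\n", i, i + n - 1)
        }' "$@"
}
```

For the example file, motif_positions GAT dnafile prints 10-12, 130-132, 166-168 and 178-180 (the third sequence line starts at offset 120). The whole sequence is held in memory, which is fine for files of this size.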

# 3  
03-23-2011
I use the bash shell.

Is there a Perl way to do this? I'm trying to learn Perl and want to know how to do it with Perl.

Thanks
# 4  
03-23-2011
If you want to treat those lines as one string:
Code:
perl -ln0e '$x="GAT";s/>.*//;s/\n//g;while(/(.*?)$x/g){print ((length $1) + 1 + $sum . "-" . ((length $&) + $sum));$sum+=length $&}' file

# 5  
03-23-2011
@Bartus11
Here comes again another one of your amazing one-liners ... could you please comment on the code Here are the parts which I think I understand: Correct me if I'm wrong please
$x="GAT" store name of query in $x
s/>.*// substitute > followed by anything on the first line with nothing, meaning ignore first line
s/\n//g remove new lines globally (but doesnt -l option do this automatically??)

The rest of it I didnt understand, please could you explain?

Thank you very much
# 6  
03-23-2011
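-l "chomps" the $_ variable, meaning it removes only a trailing input record separator; here -0 makes the whole file a single record, so the newlines inside it are left alone. (-l also makes every print end with a newline.) To remove the newlines inside the input string, s/\n//g has to be used.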
Code:
while(/(.*?)$x/g){

This is a tricky piece. It matches sequentially against the $_ variable, and each time a match is found, the body of the while loop is executed. Because of the /g modifier, every new search starts where the previous match ended, so the body runs once for each match in the string.
print ((length $1) + 1 + $sum: this prints the starting position of the searched pattern. It is computed from the number of characters before the pattern, length $1 ($1 stores what the (.*?) group matched), plus 1. To report correct positions for the matches after the first one, the number of characters already consumed from the beginning of the string has to be kept. That is done by $sum.
((length $&) + $sum): this calculates the ending position of the pattern. $& is a variable containing the whole string matched by the regex, so length $& is the number of characters before the searched pattern plus the length of the pattern itself.
$sum+=length $&: as I wrote before, $sum keeps the number of characters the regex has already passed through. For the first GAT in the example, $1 is CAGCAGCAA (9 characters) and $& is 12 characters long, so the one-liner prints 10-12 and $sum becomes 12.
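The same $sum bookkeeping can be mirrored in awk, which may help if the Perl regex variables are the confusing part. This is a sketch under stated assumptions, not a translation from the thread: header lines start with ">", and the motif is given as plain letters, since awk's match() treats it as a regular expression. motif_positions_sum is a hypothetical name. After match(), RSTART - 1 plays the role of length $1 and RSTART + RLENGTH - 1 the role of length $&.

```shell
# motif_positions_sum MOTIF [FILE...]   (hypothetical helper, POSIX sh + awk)
# awk version of the $sum bookkeeping; reads stdin when no FILE is given.
motif_positions_sum() {
    motif=$1
    shift
    awk -v STR="$motif" '
        /^>/ { next }                       # drop header lines (the job of s/>.*//)
        {
            gsub(/[ \t\r]/, "")             # drop stray spaces or carriage returns
            seq = seq $0                    # join lines (the job of s/\n//g)
        }
        END {
            rest = seq                      # text not searched yet
            sum = 0                         # characters already passed, like $sum
            # each match() searches only the remaining text, like while(/(.*?)$x/g);
            # RLENGTH > 0 stops a pattern that matches the empty string from looping
            while (match(rest, STR) && RLENGTH > 0) {
                # RSTART - 1         = chars before the motif (length $1)
                # RSTART + RLENGTH - 1 = prefix plus motif    (length $&)
                printf("%d-%d\n", sum + RSTART, sum + RSTART + RLENGTH - 1)
                sum += RSTART + RLENGTH - 1     # $sum += length $&
                rest = substr(rest, RSTART + RLENGTH)
            }
        }' "$@"
}
```

Like while(/(.*?)$x/g), each search resumes after the previous match, so overlapping hits are skipped: in GAGAGA the motif GAGA is reported only at 1-4.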

Last edited by bartus11; 03-23-2011 at 02:24 PM..
# 7  
03-23-2011
That made it more understandable... thanks
 