Search and replace with a sliding window


 
# 1  
Old 06-27-2014

Hi Unix Gurus,

I have a file with data like:
Code:

>header_1
TCCCCGA
>header_2
CCAATTGGGTA

The data to work with starts on the line after each '>header_xx' line.
(1) I want to search for the three-letter patterns 'CHH' and 'DDG' and replace the C and the G with an exclamation mark (!), so that CHH becomes !HH and DDG becomes DD!,
where H = any letter except G (for example, CHH = CAA, CAT, CTT, etc.) and D = any letter except C (for example, DDG = ATG, AAG, etc.).
(2) Form a 3-letter window and slide it 1 letter at a time, checking for the patterns described above until all replacements are done.
Example output:
Code:
>header_1
T!!CCGA
>header_2
!!AATT!!!TA

Thanks a lot for your help.
# 2  
Old 06-27-2014
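Try this (it yields the desired result without formally forming a 3-letter window; it assumes the data contains no "@", which is used as a temporary marker):
Code:
awk     '/^>/ {print; next}     # leave header lines untouched
                # CHH: mark each C followed by two non-G letters, then turn that C into "!"
                {while (gsub(/C[^G][^G]/, "@&")) gsub(/@C/,"!")}
                # DDG: mark each G preceded by two non-C letters, then turn that G into "!"
                {while (gsub(/[^C][^C]G/, "&@")) gsub(/G@/,"!")}
         1
        ' file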
>header_1
T!!CCGA
>header_2
!!AATT!!!TA
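
For comparison, here is a minimal sketch that forms the 3-letter window explicitly and slides it one letter at a time, as asked in post #1. Every window is tested against the original line, and the replacements are written to a separate copy. The function name slide_window is made up for this illustration, and the sketch assumes that lines starting with ">" are headers and must pass through unchanged.

```shell
# slide_window: hypothetical helper name, not from the thread.
# Reads the files given as arguments, or standard input if there are none.
slide_window() {
    awk '
    /^>/ { print; next }                  # header lines pass through unchanged
    {
        n = length($0)
        out = $0                          # copy that receives the replacements
        for (i = 1; i + 2 <= n; i++) {    # window covers positions i..i+2
            w = substr($0, i, 3)          # always read from the original line
            if (w ~ /^C[^G][^G]$/)        # CHH: replace the C at position i
                out = substr(out, 1, i - 1) "!" substr(out, i + 1)
            if (w ~ /^[^C][^C]G$/)        # DDG: replace the G at position i+2
                out = substr(out, 1, i + 1) "!" substr(out, i + 3)
        }
        print out
    }' "$@"
}

# Sample data from post #1
printf '>header_1\nTCCCCGA\n>header_2\nCCAATTGGGTA\n' | slide_window
```

On the sample data this prints T!!CCGA and !!AATT!!!TA under their headers, the same as the output shown above.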

# 3  
Old 06-27-2014
Quote:
Originally Posted by Fahmida
(2) Form a 3-letter window and slide it 1 letter at a time, checking for the patterns described above until all replacements are done.
To be honest, I have trouble understanding exactly what this means. I suppose you want to work on different reading frames.

Because there are only 3 of them (using "T" for any triplet and "B" for any single base, there are only "TTT...", "BTTT..." and "BBTTT..."), you will not need a sliding window.

Second, because of your first replacements, a pattern could match in a subsequent pass where it did not match before. Suppose your line is:

Code:
DDGG

Because triplet 2-4 does not match "DDG", it would not be replaced at first. Once triplet 1-3 has been replaced, though, it would match:

Code:
DDGG        # replace "DDG" with "DD!"
DD!G        # after first replacement, now replace "2 not-Gs, then G" again
DD!!        # after second pass

The same applies to your other replacement rule. Somehow I doubt that this is what you really want. Please clarify.

I hope this helps.

bakunin
# 4  
Old 06-27-2014
Hi bakunin,
Thanks for taking the time to go through it and for providing the example.
You are right:
DD!! # after second pass
is exactly what I need.
This is independent of codons, reading frames, etc.; it is about replacing the C's in the CHH methylation context.
So it would be really helpful to have a piece of code that can do it in multiple passes, as you explained.
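
A minimal sketch of the multi-pass reading discussed in this thread: each rule is applied one replacement at a time to the current state of the line, and the whole cycle repeats until a full pass changes nothing. It assumes that an already inserted "!" counts as "not G" and "not C" in later passes, and that lines starting with ">" are headers. The function name multi_pass is made up for this illustration.

```shell
# multi_pass: hypothetical helper name, not from the thread.
# Reads the files given as arguments, or standard input if there are none.
multi_pass() {
    awk '
    /^>/ { print; next }                        # header lines pass through unchanged
    {
        do {
            before = $0
            while (match($0, /C[^G][^G]/))      # CHH: replace the leading C of the leftmost match
                $0 = substr($0, 1, RSTART - 1) "!" substr($0, RSTART + 1)
            while (match($0, /[^C][^C]G/))      # DDG: replace the trailing G of the leftmost match
                $0 = substr($0, 1, RSTART + 1) "!" substr($0, RSTART + 3)
        } while ($0 != before)                  # stop once a full pass changes nothing
        print
    }' "$@"
}

# Sample data from post #1, then a line shaped like the DDGG example from post #3
printf '>header_1\nTCCCCGA\n>header_2\nCCAATTGGGTA\n' | multi_pass
printf 'AAGG\n' | multi_pass
```

On the sample data from post #1 this gives T!!CCGA and !!AATT!!!TA, and AAGG becomes AA!!, matching the "DD!!" result confirmed above.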
 