Simple awk script for positional replacement in text?


 
# 1  
Old 08-05-2009
Simple awk script for positional replacement in text?

I have a string of letters. (They happen to be DNA, not that it's relevant to the question.) For analysis purposes, I need to replace the information at some of the sites. I need to do this based on their position, not the information in that position.

I also need to ignore differences at other sites, so I can't do a simple "wild card search and replace." I remember using awk scripts to do something like this about a decade ago, but I don't remember how.

For example, if you wanted to take the following strings,

AAAAATAAAGAAAA
and
AAAATTAAAGAAAA
and
AAAAATAAACAAAA

and in each case, turn the 10th letter into an "N," without changing any of the other letters, how would you do that? (The actual files are about 500 characters long, really only one "field" and will have about 30 position-specific replacements. I'll probably deal with each line separately, come to think of it.)

I'm also willing to use a web tool or text editor....MacOS or Unix terminal on a Mac. (Last time I had to do something like this was 1998.)

I know the forum rules say to use standard notation for programs, but if I could read standard notation with no further explanation, I probably would not need to be posting this question. So text-y replies would be most welcome. Thanks!
# 2  
Old 08-05-2009
To keep the forums high quality for all users, please take the time to format your posts correctly.

First of all, use Code Tags when you post any code or data samples so others can easily read your code. You can easily do this by highlighting your code and then clicking on the # in the editing menu. (You can also type code tags [code] and [/code] by hand.)

Second, avoid adding color or different fonts and font size to your posts. Selective use of color to highlight a single word or phrase can be useful at times, but using color, in general, makes the forums harder to read, especially bright colors like red.

Third, be careful when you cut and paste: edit any odd characters and make sure all links are working properly.

Thank You.

The UNIX and Linux Forums

**************************************************************

If I got it right - changing the 10th character from G to N for example:
Code:
echo "AAAAATAAAGAAAA"| awk '{substr((sub(/G/,"N")),10,1); print}'


Last edited by zaxxon; 08-05-2009 at 02:49 AM.. Reason: Added possible solution
# 3  
Old 08-05-2009
clarification of question

That 10th position that's being replaced need not be a G. Strictly speaking, it could be anything. That's why I show it as being either a G or a C in the three examples I put up. Unfortunately, G's and C's look similar. Maybe I should have made that a "C or T" or a "G or A" (those are the actual options for what I'm doing).

Also, the data at OTHER points in each string is ALSO not identical.

If I remember correctly, I can put two options in brackets at that location, and otherwise follow what you're suggesting. Or whatever the symbol for a single character wild card is, which I should be able to look up. I remember that it's not just an asterisk (*).

Would this work?

Code:
echo "AAAA[AT]TAAA[ACGT]AAAA"| awk '{substr((sub(/[ACGT]/,"N")),10,1); print}'
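
On the wildcard question above: the brackets inside the echoed string are just literal characters in the data, and sub(/[ACGT]/,"N") replaces the first A, C, G or T it finds on the line, wherever that is. The single-character wildcard in a regular expression is the dot (.). A minimal sketch of a dot-based approach follows, assuming sed is acceptable in place of awk; the function name replace_10th is made up for illustration.

```shell
# replace_10th: hypothetical helper name; reads lines on stdin.
# "." matches any single character, so ^\(.\{9\}\). means
# "the first nine characters, whatever they are, then one more" (the 10th).
# \1 puts the nine captured characters back unchanged, followed by N.
replace_10th() {
  sed 's/^\(.\{9\}\)./\1N/'
}

printf '%s\n' AAAAATAAAGAAAA AAAATTAAAGAAAA AAAAATAAACAAAA | replace_10th
# prints:
# AAAAATAAANAAAA
# AAAATTAAANAAAA
# AAAAATAAANAAAA
```

Lines shorter than ten characters do not match the pattern and pass through unchanged.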



Thanks!

NB, I also don't understand the comment about code tags, because so far as I know, I have not included any code, and my example is not "data," it's an explanation of my terminology, in case I'm using my terms incorrectly. Perhaps "data" means something different to a computer scientist than it does to a biologist. Apologies, if so.

Last edited by JFS; 08-05-2009 at 09:26 AM.. Reason: clarification / request for clarification of answer
# 4  
Old 08-05-2009
Code:
echo "AAAAATAAAGAAAA"| awk '{sub((substr($0,10,1)),"N"); print}'

This up there should work whatever the 10th char is.
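
One caveat worth checking before relying on this: sub() does not know about positions. substr($0,10,1) only looks up which character sits at position 10, and sub() then replaces the first occurrence of that character anywhere on the line. The example works because the first G happens to be the 10th character. The sketch below shows a line where it goes wrong, next to a slicing variant that only uses positions; the name replace_at_10 is an assumption for illustration.

```shell
# The command from the post, on a line that also has a G at position 3:
echo "AAGAATAAAGAAAA" | awk '{sub((substr($0,10,1)),"N"); print}'
# prints AANAATAAAGAAAA   (the G at position 3 was replaced, not the 10th character)

# Position-only variant: keep characters 1-9, write N, keep everything from 11 on.
replace_at_10() {
  awk '{ print substr($0, 1, 9) "N" substr($0, 11) }'
}

echo "AAGAATAAAGAAAA" | replace_at_10
# prints AAGAATAAANAAAA
```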

CODE tags are not only useful for code. They are also very useful for displaying anything other than normal prose, and they improve readability a lot. If there is a lot of text above and below, and in between there is a snippet of example data, the tags make the snippet much easier to distinguish from the surrounding text. This is a recommendation for all users on the board, whether the content is code, data, or logs. I am, and I bet the other Mods are too, very tired (not because of you) of always writing a lengthy comment when editing posts to add CODE tags, so Neo provided us Mods with the canned reply above.
# 5  
Old 08-05-2009
Thanks! That looks like the truly position-only answer I was looking for. Much nicer than having to specify a boatload of sequence alternatives.
# 6  
Old 08-05-2009
I guess you answered while I was still adding text - read it!
# 7  
Old 08-05-2009
trying to do multiple replacements; not working

I got the single replacement to work. But when I try to chain up multiple commands, it's hit or miss.

Here's my input file, creatively named testfile1

Code:
2011AAAATAACAAAAAT
2012AAATTAACAAAAAT
2013AAATTAAGAAAAAT

Here's the script:

Code:
awk '{sub((substr($0,5,1)),"X");sub((substr($0,7,1)),"x");sub((substr($0,8,1)),"Y");sub((substr($0,10,1)),"z");print}' testfile1 > testfile6

And here's the output I get:

Code:
2011XxYzTAACAAAAAT
2012XxzYTAACAAAAAT
2013XxzYTAAGAAAAAT

As opposed to what I hoped to get (and thought I asked for), which is:

Code:
2011XAxYTzACAAAAAT
2012XAxYTzACAAAAAT
2013XAxYTzAGAAAAAT

Is this a "first instance of a character" issue? I'm using the "terminal" function on a Mac, which I believe is a bash shell.
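
It does look like a first-instance issue, and it comes from awk rather than from the shell. Each sub() replaces the first occurrence of whatever character substr() found. On the first line, position 5 holds an A and it is also the first A, so X lands correctly. Position 7 also holds an A, but by then the first remaining A is at position 6, so x lands there, and so on. A possible way around it is to rebuild the line by slicing at each position. The sketch below is one such approach under that assumption; the helper name mask_positions and the position:character list format are invented for illustration.

```shell
# mask_positions: hypothetical helper. $1 is a space-separated list of
# position:replacement pairs; input lines arrive on stdin.
mask_positions() {
  awk -v spec="$1" '
    BEGIN { n = split(spec, pair, " ") }   # pair[1..n] = "5:X", "7:x", ...
    {
      line = $0
      for (i = 1; i <= n; i++) {
        split(pair[i], kv, ":")            # kv[1] = position, kv[2] = new character
        p = kv[1] + 0
        if (p >= 1 && p <= length(line))   # skip positions past the end of the line
          line = substr(line, 1, p - 1) kv[2] substr(line, p + 1)
      }
      print line
    }'
}

printf '%s\n' 2011AAAATAACAAAAAT 2012AAATTAACAAAAAT 2013AAATTAAGAAAAAT |
  mask_positions "5:X 7:x 8:Y 10:z"
# prints:
# 2011XAxYTzACAAAAAT
# 2012XAxYTzACAAAAAT
# 2013XAxYTzAGAAAAAT
```

Because every replacement is a single character, the line length never changes, so the positions stay valid no matter what order they are listed in. For a file, the same function can be used as mask_positions "5:X 7:x 8:Y 10:z" < testfile1 > testfile6.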
 