Simple awk script for positional replacement in text?


 
# 8  
Old 08-06-2009
I switched to sed back references or "grouping" since I think this is easier than in awk:
Code:
$> cat infile
2011AAAATAACAAAAAT
2012AAATTAACAAAAAT
2013AAATTAAGAAAAAT
$> sed 's/\(....\).\(.\)..\(.\).\(.*\)/\1X\2xy\3z\4/' infile
2011XAxyTzACAAAAAT
2012XAxyTzACAAAAAT
2013XAxyTzAGAAAAAT
$> cat want
2011XAxyTzACAAAAAT
2012XAxyTzACAAAAAT
2013XAxyTzAGAAAAAT

I group the parts I want to keep unchanged inside escaped parentheses, \( and \). I wrote every character as a dot. I could have written four characters as .\{4\}, for example, but for so few it is not worth it compared with typing 4 dots.
I made 4 groups that stay unchanged and are simply printed back with \1 to \4. Between them I inserted the characters you wanted. This is still position-based, but you either have to type the right number of dots or count them with an interval, such as .\{6\} for 6 dots.
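For comparison, here is a small sketch of the same substitution with the interval form, where .\{4\} stands in for the four typed dots. The printf call only recreates the three sample lines so the snippet runs on its own; the variable name out is just for illustration.

```shell
# Same substitution as above, but the first group counts its dots with \{4\}.
# printf recreates the three sample lines so no input file is needed.
out=$(printf '%s\n' 2011AAAATAACAAAAAT 2012AAATTAACAAAAAT 2013AAATTAAGAAAAAT |
  sed 's/\(.\{4\}\).\(.\)..\(.\).\(.*\)/\1X\2xy\3z\4/')

# Prints the same three lines as the "want" file.
printf '%s\n' "$out"
```

Note that the dot comes before the interval: \{4\} on its own has nothing to repeat.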
# 9  
Old 08-06-2009
Thank you / will this work with long files?

Thanks! I was getting the sense I might need to do a sort of patchwork using SED or PERL (neither of which I have ever used, but between your suggestion and a cheat sheet, it ought to be doable).

However, each of my "lines" is actually a DNA sequence several hundred characters long. (As opposed to the short version used in the example--I should have mentioned that up-front, I suppose.)

Will it work to stick 40 of those long lines in the command? Or can I use the command to do the process to "each line in the file"? Or will having ~550 character lines be a problem, regardless?

I have a sort of offer from the spouse to write something in FORTRAN sooner or later if SED, AWK etc can't handle the task, but I'm hoping I can make this work myself, and from what I can remember, this should be possible.
# 10  
Old 08-07-2009
awk and sed process every line of the input file, even if there are several thousand lines or more; you can just try it out.
You can write one very long command that contains all the changes you need for every position. It is more work for you, but I don't think awk or sed will mind. You will have to try it out.
If you want to go deeper, there is a really good book about sed and awk, if you're interested in using them:
sed & awk | O'Reilly Media
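A quick way to try the long-line case is sketched below. The two 550-character lines are made-up stand-ins (all A and all C), not real sequences, and line_a, line_c and out are illustrative names. The sed command keeps the first 36 characters and replaces character 37 with x on every line it reads.

```shell
# Build two hypothetical 550-character lines as stand-ins for real sequences.
line_a=$(printf '%550s' '' | tr ' ' 'A')
line_c=$(printf '%550s' '' | tr ' ' 'C')

# Keep the first 36 characters (group 1) and replace character 37 with x.
# sed applies the same command to each input line in turn.
out=$(printf '%s\n' "$line_a" "$line_c" | sed 's/^\(.\{36\}\)./\1x/')

# For each output line, show character 37 and the total line length.
printf '%s\n' "$out" | awk '{ print substr($0, 37, 1), length($0) }'
```

If both lines report x at position 37 and an unchanged length of 550, the line length itself is not the problem.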
# 11  
Old 08-20-2009
I'm back at this after some bench work, and it's not working for me yet.

here's a file named infile

Code:
>zzzzzzzzzzzzzGGGAAGTGAGGCGYTGTTGTTATTTGGTTTAYGAGTCAGAGGTGTTTTTTCACGGAGAGATGGCTCTAAGA
>zzzzzzzzzzzzzGGGAAGTGAGGCGYTGTTGTTATTTGGTTTAYGAGYTAGAGGYGTCTTTTTACGGAGAGACGGTTTTAAGA

I also tried a version without the initial ">" symbol, in case that had an inappropriate meaning.

Here are three main versions of the sed script I'm trying to run, along with variants. I'm using the g flag rather than individual replacements because the replacement value will always be the same. Two of the versions are an attempt to figure out what sort of brackets to use when using numbers rather than dots. The final, clunky version uses the dots.

version 1
Code:
$> sed 's/\{36\}.\{12\}.\{9\}.\(.\).\(..\).\{12\}.\(.\).\(.*\)/x/g' infile1

version 1a
Code:
$> sed -e 's/\{36\}.\{12\}.\{9\}.\(.\).\(..\).\{12\}.\(.\).\(.*\)/x/g' infile1

version 1b
Code:
$> sed 's/\{36\}.\{12\}.\{9\}.\(.\).\(..\).\{12\}.\(.\).\(.*\)/x/g' infile >zzzzzzzzzzzzzGGGAAGTGAGGCGYTGTTGTTATTTGGTTTAYGAGTCAGAGGTGTTTTTTCACGGAGAGATGGCTCTAAGA
>zzzzzzzzzzzzzGGGAAGTGAGGCGYTGTTGTTATTTGGTTTAYGAGYTAGAGGYGTCTTTTTACGGAGAGACGGTTTTAAGA

version 2
Code:
$> sed 's/\(36\).\(12\).\(9\).\(.\).\(..\).\(12\).\(.\).\(.*\)/x/g' infile1

(same variants as with version 1)

version 3
Code:
$> sed 's/\(....................................\).\(............\).\(.........\).\(.\).\(..\).\(............\).\(.\).(.*\)/x/g'

(same variants as with version 1)

the resulting error is sometimes along the lines of command not found, other times the error says unbalanced parentheses.

desired output would be

Code:
>zzzzzzzzzzzzzGGGAAGTGAGGCGYTGTTGTTAxTTGGTTTAYGAGxCAGAGGTGTxTxTTxACGGAGAGATGGxTxTAAGA
>zzzzzzzzzzzzzGGGAAGTGAGGCGYTGTTGTTATTTGGTTTAYGAGYTAGAGGYGTxTxTTxACGGAGAGACGGxTxTAAGA

To give you a sense of the magnitude of the real program, here's the proposed script for the entire piece of DNA (which would be applied to about 60 lines of that length, rather than the two lines in the infile example).

Code:
$> sed 's/\(36\).\(12\).\(9\).\(.\).\(..\).\(12\).\(.\).\(13\).\(6\).....\(5\).\(.\)..\(7\).....\(.\).\(6\).\(..\).\(6\).\(...\)...\(25\).\(18\).\(..\)..\(8\).\(...\).\(...\).\(...\).\(5\).\(6\).\(9\).\(10\)..\(21\).\(6\)..\(8\).\(10\).\(4\).\(10\).\(24\).\(26\).\(.*\)/x/g' infile

I'm using a MacOS X default (Bash shell) terminal.

Is there anything I am doing right?
Is there something dead obvious that I'm doing wrong?
I need to write, oh, 18 scripts of this length, and get them working.

---------- Post updated at 03:16 PM ---------- Previous update was at 03:14 PM ----------

I missed two replacements in the sample output file

Code:
>zzzzzzzzzzzzzGGGAAGTGAGGCGYTGTTGTTAxTTGGTTTAYGAGxCAGAGGTGTxTxTTxACGGAGAGATGGxTxTAAGA
>zzzzzzzzzzzzzGGGAAGTGAGGCGYTGTTGTTAxTTGGTTTAYGAGxTAGAGGYGTxTxTTxACGGAGAGACGGxTxTAAGA
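One possible sketch of the positional replacement in awk follows, offered as an untested-on-real-data suggestion rather than a definitive fix. It assumes the positions to overwrite are 37, 50, 60, 62, 65, 78 and 80, which is what the counts in version 1 add up to (36 kept, one replaced, 12 kept, one replaced, and so on). The variable name pos is hypothetical. A position list also sidesteps the fact that sed only offers the back-references \1 to \9, which the full-length script would exceed.

```shell
# pos is a hypothetical variable holding the 1-based positions to overwrite with x.
# printf recreates the two sample lines so the snippet runs on its own.
out=$(printf '%s\n' \
  '>zzzzzzzzzzzzzGGGAAGTGAGGCGYTGTTGTTATTTGGTTTAYGAGTCAGAGGTGTTTTTTCACGGAGAGATGGCTCTAAGA' \
  '>zzzzzzzzzzzzzGGGAAGTGAGGCGYTGTTGTTATTTGGTTTAYGAGYTAGAGGYGTCTTTTTACGGAGAGACGGTTTTAAGA' |
  awk -v pos='37 50 60 62 65 78 80' '
    BEGIN { n = split(pos, p, " ") }   # p[1..n] holds the positions
    {
      for (i = 1; i <= n; i++)
        # Rebuild the line: text before p[i], an x, then text after p[i].
        $0 = substr($0, 1, p[i] - 1) "x" substr($0, p[i] + 1)
      print
    }')

printf '%s\n' "$out"
```

With a real file, the printf pipeline would be dropped and the file name given to awk as its last argument; each of the 18 scripts would then differ only in its position list.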

 