Search for a particular word and replace the first character


 
# 1  
Old 06-05-2014

Hi Unix gurus,

I have a DNA sequence in FASTA format (each sequence header line starts with > and should be ignored). An example is shown below:
Code:
>sequence_1
CGTATTCTCCGAATACC
ATACG
>sequence_2
CAGATTTTCAAATACCCCC

On a file like this I want to run the following three search-and-replace operations. Each operation writes its result to its own output file, so the original file yields three different files.
1) search for the two-character pattern CG and replace its C with the word NULL
2) search for the three-character pattern CXG and replace its C with the word NULL (where X = A, C or T, but not G)
3) search for the three-character pattern CXX and replace its C with the word NULL (where X = A, C or T, but not G)

Example output based on the above file:
File_CG_replace
Code:
>sequence_1
NULLGTATTCTCNULLGAATACC
ATANULLG
>sequence_2
CAGATTTTCAAATACCCCC

File_CXG_replace
Code:
>sequence_1
CGTATTCTNULLCGAATACC
ATACG
>sequence_2
CAGATTTTCAAATACCCCC

File_CXX_replace
Code:
>sequence_1
CGTATTNULLTCNULLGAATANULLC
ATACG
>sequence_2
CAGATTTTNULLAAATANULLCCCC

Thanks for your help.
# 2  
Old 06-05-2014
How come your second output is not this?

Code:
>sequence_1
CGTATTCTNULLCGAATACC
ATACG
>sequence_2
NULLAGATTTTCAAATACCCCC

And in your third output, why is CGA replaced (the second NULL on the first line of sequence_1)?
Code:
>sequence_1
CGTATTNULLTCNULLGAATANULLC
ATACG
>sequence_2
CAGATTTTNULLAAATANULLCCCC
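
A related ambiguity in the CXX rule is whether matches may overlap. A left-to-right, non-overlapping search (which is how awk's gsub() and sed's s///g behave) replaces only the first C in a run like CCCCC, whereas judging every C on its own would replace three of them. Below is a rough sketch of the second reading, assuming X means [ACT] and that header lines start with >. It tests a three-character window at each position and still works one line at a time, so a pattern split across the line break of a wrapped sequence is not seen:

```shell
# Per-position scan for the CXX rule, using the sequence_2 line from post #1.
result=$(printf '%s\n' '>sequence_2' 'CAGATTTTCAAATACCCCC' |
awk '
  /^>/ { print; next }                  # header lines pass through unchanged
  {
    out = ""
    for (i = 1; i <= length($0); i++) {
      # look at the three characters starting at position i
      if (substr($0, i, 3) ~ /^C[ACT][ACT]$/)
        out = out "NULL"
      else
        out = out substr($0, i, 1)
    }
    print out
  }')
printf '%s\n' "$result"
# prints:
# >sequence_2
# CAGATTTTNULLAAATANULLNULLNULLCC
```

Which reading is wanted is up to the original poster; the sample outputs do not settle it.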

# 3  
Old 06-05-2014
Seconding pilnet101's comments, I came up with:
Code:
awk     '{OA=OB=OC=$0
          # CG: the C can be replaced directly
          gsub(/CG/, "NULLG", OA); print OA > "File_CG_replace"
          # CXG and CXX: tag each match with "@", then turn "@C" into NULL
          gsub(/C[^G]G/, "@&", OB); gsub (/@C/,"NULL", OB); print OB > "File_CXG_replace"
          gsub(/C[^G][^G]/, "@&", OC); gsub (/@C/,"NULL", OC); print OC > "File_CXX_replace"}
        ' file

File_CG_replace:
>sequence_1
NULLGTATTCTCNULLGAATACC
ATANULLG
>sequence_2
CAGATTTTCAAATACCCCC
File_CXG_replace:
>sequence_1
CGTATTCTNULLCGAATACC
ATACG
>sequence_2
NULLAGATTTTCAAATACCCCC
File_CXX_replace:
>sequence_1
CGTATTNULLTCCGAATACC
ATACG
>sequence_2
CAGATTTTNULLAAATANULLCCCC
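
One caveat: the script applies the substitutions to every line, headers included. The sample headers come through untouched only because they contain no upper-case C. A minimal sketch of the CG case that copies header lines unchanged, assuming headers always start with > (the file name sample.fa and the header text with CG in it are made up for the demonstration):

```shell
# Recreate part of the sample input; sample.fa is a hypothetical name and
# the header is altered to contain "CG" so the difference is visible.
cat > sample.fa <<'EOF'
>sequence_1_CG
CGTATTCTCCGAATACC
ATACG
EOF

awk '
  /^>/ { print > "File_CG_replace"; next }   # header: copy as is
  { gsub(/CG/, "NULLG"); print > "File_CG_replace" }
' sample.fa

cat File_CG_replace
# prints:
# >sequence_1_CG
# NULLGTATTCTCNULLGAATACC
# ATANULLG
```

The same `/^>/` guard should carry over to the CXG and CXX cases.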

# 4  
Old 06-05-2014
@RudiC, nice use of the "@" placeholder to work around the lack of back-references in awk.

It could be further reduced like this:
Code:
awk '
  BEGIN{
    A["CG"]="CG"
    A["CXG"]="C[ACT]G"
    A["CXX"]="C[ACT][ACT]"
  }
  {
    for(p in A){
      s=$0
      gsub(A[p],"@&",s)
      gsub("@C","NULL",s)
      print s >("FILE_" p "_REPLACE")
    }
  }
' file
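
On the back-reference point: in a tool that does have them, the placeholder step is not needed. A rough sed equivalent of the CXG case, assuming the same [ACT] class and, as an extra assumption, leaving lines that start with > alone:

```shell
# Sample lines from post #1; the header is included to show it is skipped.
result=$(printf '%s\n' '>sequence_2' 'CAGATTTTCAAATACCCCC' |
  sed '/^>/!s/C\([ACT]G\)/NULL\1/g')
printf '%s\n' "$result"
# prints:
# >sequence_2
# NULLAGATTTTCAAATACCCCC
```

Here `\1` puts back the two characters that followed the C, so only the C itself becomes NULL. Unlike the awk versions, this handles one pattern per invocation and writes to standard output rather than to a named file.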

# 5  
Old 06-05-2014
Both working fine! Thanks a lot!
 