Delete specific strings in a file


 
# 1  
01-10-2015

Hi,
My file has numerous strings. I want to retain the entries that start with >stt and delete the entries that start with >C.
For example, my input file is:

Code:
>C4603985
ATGFFGTGFAGGHTGHTAHGHATAGGTHTTHAHAGHGCHATCACCTGCGATATGGGGGCCACCATAGATAGTAAAGATGGTTCTTACCFAGATGFFGTGFAGGHTGHTAHGHATAGGTFAGATGFFGTGFAGGHTGHTAHGHATAGGTFAGATGFFGTGFAGGHTGH>C3914137
CATCTTGFFGTGFAGGHTGHTAHGHATAGGTHTTHAHAGHGCHTTATTAGGGAAGAAACCTTCATTCTCTTTTTAATTTTCTTTTTCAAGATGCAATTGFFGTGFAGGHTGHTAHGHATAGGTHTTHAHAGHGCHCTGTATCACCAGTTATATATAGATTGCATCTTGATATAGATATAGTG
>stt4468869
SQAERTYUVSAASSFFAGATGFFGTGFAGGHTGHTAHGHATAGGTHTTHAHAGHGCHAHAHGHACTHTGHTGGHCAHAANGGGATATAACCAGGTGGAAGGTAGACTACTTTGATAGTTTTTCTCCAGTNGNGCANAAGTTAGTAACAGTGAGAATTTTCTTAGCAATCGCAG
>C4913369
GTERTYUVSAASSFFAGATGFFGTGFAGGHTGHTAHGHATAGGTTGCVTCABGCGBATBAAGATBTBTHYHBHHTAGCGCCCTACGTACGATGCAGTAAATTCCGTAACGTAGTGGTTGAGATATCTCAGTAAGCTTCACGTTGACGTTAGTTBTBTAATCGCCCGTGATGDDDGTGTTAGAATGTTTAAAGGATATACAT
>stt4688097
GCDDDDCJJJJAWDDDDFFFFFCTCATATCGCAGGTGATTCCAACAGATCGCATGAGAGGTATGTCCGAGAGGCAAATCATTCTTTTCGGCTCTATGGATCCGAAGGACCTGCGAAAAGGGCAAAACCGACCTCGGACGAGATTGTGTTTACGGACGAGCATGAGAGGTATGTCCGAGAGGCAAATCATTCTTTTCGGCTCTATGGATCCGAAG

My output file should look like:

Code:
>stt4468869
SQAERTYUVSAASSFFAGATGFFGTGFAGGHTGHTAHGHATAGGTHTTHAHAGHGCHAHAHGHACTHTGHTGGHCAHAANGGGATATAACCAGGTGGAAGGTAGACTACTTTGATAGTTTTTCTCCAGTNGNGCANAAGTTAGTAACAGTGAGAATTTTCTTAGCAATCGCAG
>stt4688097
GCDDDDCJJJJAWDDDDFFFFFCTCATATCGCAGGTGATTCCAACAGATCGCATGAGAGGTATGTCCGAGAGGCAAATCATTCTTTTCGGCTCTATGGATCCGAAGGACCTGCGAAAAGGGCAAAACCGACCTCGGACGAGATTGTGTTTACGGACGAGCATGAGAGGTATGTCCGAGAGGCAAATCATTCTTTTCGGCTCTATGGATCCGAAG


Any suggestions?
# 2  
01-10-2015
Try

Code:
 
$ cat tmp
>C4603985
ATGFFGTGFAGGHTGHTAHGHATAGGTHTTHAHAGHGCHATCACCTGCGATATGGGGGCCACCATAGATAGTAAAGATGGTTCTTACCFAGATGFFGTGFAGGHTGHTAHGHATAGGTFAGATGFFGTGFAGGHTGHTAHGHATAGGTFAGATGFFGTGFAGGHTGH>C3914137
CATCTTGFFGTGFAGGHTGHTAHGHATAGGTHTTHAHAGHGCHTTATTAGGGAAGAAACCTTCATTCTCTTTTTAATTTTCTTTTTCAAGATGCAATTGFFGTGFAGGHTGHTAHGHATAGGTHTTHAHAGHGCHCTGTATCACCAGTTATATATAGATTGCATCTTGATATAGATATAGTG
>stt4468869
SQAERTYUVSAASSFFAGATGFFGTGFAGGHTGHTAHGHATAGGTHTTHAHAGHGCHAHAHGHACTHTGHTGGHCAHAANGGGATATAACCAGGTGGAAGGTAGACTACTTTGATAGTTTTTCTCCAGTNGNGCANAAGTTAGTAACAGTGAGAATTTTCTTAGCAATCGCAG
>C4913369
GTERTYUVSAASSFFAGATGFFGTGFAGGHTGHTAHGHATAGGTTGCVTCABGCGBATBAAGATBTBTHYHBHHTAGCGCCCTACGTACGATGCAGTAAATTCCGTAACGTAGTGGTTGAGATATCTCAGTAAGCTTCACGTTGACGTTAGTTBTBTAATCGCCCGTGATGDDDGTGTTAGAATGTTTAAAGGATATACAT
>stt4688097
GCDDDDCJJJJAWDDDDFFFFFCTCATATCGCAGGTGATTCCAACAGATCGCATGAGAGGTATGTCCGAGAGGCAAATCATTCTTTTCGGCTCTATGGATCCGAAGGACCTGCGAAAAGGGCAAAACCGACCTCGGACGAGATTGTGTTTACGGACGAGCATGAGAGGTATGTCCGAGAGGCAAATCATTCTTTTCGGCTCTATGGATCCGAAG

$ awk '{
    if ($0 ~ /^>st/) a = 1    # header of a record to keep: start printing
    if ($0 ~ /^>C/)  a = 0    # header of a record to drop: stop printing
    if (a == 1) print $0      # print every line while the flag is set
}' tmp
>stt4468869
SQAERTYUVSAASSFFAGATGFFGTGFAGGHTGHTAHGHATAGGTHTTHAHAGHGCHAHAHGHACTHTGHTGGHCAHAANGGGATATAACCAGGTGGAAGGTAGACTACTTTGATAGTTTTTCTCCAGTNGNGCANAAGTTAGTAACAGTGAGAATTTTCTTAGCAATCGCAG
>stt4688097
GCDDDDCJJJJAWDDDDFFFFFCTCATATCGCAGGTGATTCCAACAGATCGCATGAGAGGTATGTCCGAGAGGCAAATCATTCTTTTCGGCTCTATGGATCCGAAGGACCTGCGAAAAGGGCAAAACCGACCTCGGACGAGATTGTGTTTACGGACGAGCATGAGAGGTATGTCCGAGAGGCAAATCATTCTTTTCGGCTCTATGGATCCGAAG

This User Gave Thanks to senhia83 For This Post:
# 3  
01-10-2015
senhia83's script can be simplified a little bit to something like:
Code:
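awk '
/^>/ { copy = ($0 ~ /^>stt/) }   # header line: copy becomes 1 for >stt records, 0 otherwise
copy                             # pattern with no action: print the line while copy is 1
' infile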

As always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.
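Because the flag is only changed on header lines, this program keeps every sequence line of a kept record, even when the sequence is wrapped. A minimal sketch that wraps the same awk program in a shell function to show this; the function name keep_stt and the tiny sample records are hypothetical, made up for illustration:

```shell
#!/bin/sh
# keep_stt is a hypothetical helper name: it reads FASTA on stdin and prints
# only the records whose header line starts with ">stt".
keep_stt() {
    awk '
    /^>/ { copy = ($0 ~ /^>stt/) }   # on every header, decide whether to copy this record
    copy                             # print each line (header and sequence) while copy is 1
    '
}

# Made-up sample: the >stt2 sequence is wrapped over two lines.
printf '%s\n' '>C1' 'AAAA' '>stt2' 'CCCC' 'GGGG' '>C3' 'TTTT' | keep_stt
```

This prints >stt2 followed by both of its sequence lines, CCCC and GGGG.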
This User Gave Thanks to Don Cragun For This Post:
# 4  
01-10-2015
You can try:

Code:
awk 'NF&&!/^C/{print ">"$0}' ORS="" RS=">" infile
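This approach makes ">" the record separator, so each FASTA record (header plus all of its sequence lines) becomes one awk record; ORS is emptied because each record already carries its own trailing newline, and NF skips the empty record before the first ">". Note that !/^C/ keeps every record whose header does not start with C, not only the >stt ones. A minimal sketch of a variant that selects >stt explicitly; the function name keep_stt_rs and the sample data are hypothetical:

```shell
#!/bin/sh
# keep_stt_rs is a hypothetical helper name.
# RS=">" turns each FASTA record (header + all sequence lines) into one awk record.
# ORS="" because each record already ends with its own newline.
# The empty record before the first ">" cannot match /^stt/, so it is skipped.
keep_stt_rs() {
    awk -v RS='>' -v ORS='' '/^stt/ { print ">" $0 }'
}

# Made-up sample: the >stt2 sequence is wrapped over two lines.
printf '%s\n' '>C1' 'AAAA' '>stt2' 'CCCC' 'GGGG' '>C3' 'TTTT' | keep_stt_rs
```

Since the split happens on every ">", this assumes that ">" never appears anywhere in the file except at the start of header lines.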

This User Gave Thanks to pilnet101 For This Post:
# 5  
01-11-2015
Hello sa@@,

Here is one more approach that may help you, very slightly different from Don's and senhia83's scripts. Enjoy learning!
Code:
awk '/^>stt/ {A=1} A{print} !/^>stt/ {A=0}' Input_file

Output will be as follows.
Code:
>stt4468869
SQAERTYUVSAASSFFAGATGFFGTGFAGGHTGHTAHGHATAGGTHTTHAHAGHGCHAHAHGHACTHTGHTGGHCAHAANGGGATATAACCAGGTGGAAGGTAGACTACTTTGATAGTTTTTCTCCAGTNGNGCANAAGTTAGTAACAGTGAGAATTTTCTTAGCAATCGCAG
>stt4688097
GCDDDDCJJJJAWDDDDFFFFFCTCATATCGCAGGTGATTCCAACAGATCGCATGAGAGGTATGTCCGAGAGGCAAATCATTCTTTTCGGCTCTATGGATCCGAAGGACCTGCGAAAAGGGCAAAACCGACCTCGGACGAGATTGTGTTTACGGACGAGCATGAGAGGTATGTCCGAGAGGCAAATCATTCTTTTCGGCTCTATGGATCCGAAG

Thanks,
R. Singh
This User Gave Thanks to RavinderSingh13 For This Post:
# 6  
01-11-2015
@sa@@: your sample contains a multiline sequence.

Shouldn't the input file be:
Code:
>C4603985
ATGFFGTGFAGGHTGHTAHGHATAGGTHTTHAHAGHGCHATCACCTGCGATATGGGGGCCACCATAGATAGTAAAGATGGTTCTTACCFAGATGFFGTGFAGGHTGHTAHGHATAGGTFAGATGFFGTGFAGGHTGHTAHGHATAGGTFAGATGFFGTGFAGGHTGH
>C3914137
CATCTTGFFGTGFAGGHTGHTAHGHATAGGTHTTHAHAGHGCHTTATTAGGGAAGAAACCTTCATTCTCTTTTTAATTTTCTTTTTCAAGATGCAATTGFFGTGFAGGHTGHTAHGHATAGGTHTTHAHAGHGCHCTGTATCACCAGTTATATATAGATTGCATCTTGATATAGATATAGTG
>stt4468869
SQAERTYUVSAASSFFAGATGFFGTGFAGGHTGHTAHGHATAGGTHTTHAHAGHGCHAHAHGHACTHTGHTGGHCAHAANGGGATATAACCAGGTGGAAGGTAGACTACTTTGATAGTTTTTCTCCAGTNGNGCANAAGTTAGTAACAGTGAGAATTTTCTTAGCAATCGCAG
>C4913369
GTERTYUVSAASSFFAGATGFFGTGFAGGHTGHTAHGHATAGGTTGCVTCABGCGBATBAAGATBTBTHYHBHHTAGCGCCCTACGTACGATGCAGTAAATTCCGTAACGTAGTGGTTGAGATATCTCAGTAAGCTTCACGTTGACGTTAGTTBTBTAATCGCCCGTGATGDDDGTGTTAGAATGTTTAAAGGATATACAT
>stt4688097
GCDDDDCJJJJAWDDDDFFFFFCTCATATCGCAGGTGATTCCAACAGATCGCATGAGAGGTATGTCCGAGAGGCAAATCATTCTTTTCGGCTCTATGGATCCGAAGGACCTGCGAAAAGGGCAAAACCGACCTCGGACGAGATTGTGTTTACGGACGAGCATGAGAGGTATGTCCGAGAGGCAAATCATTCTTTTCGGCTCTATGGATCCGAAG

--
@Ravinder: that works only if there are no multiline sequences.
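A quick way to see the limitation is to run the one-liner from post #5 on a record whose sequence is wrapped over two lines. In this sketch the function name sample and the records it prints are made up for illustration:

```shell
#!/bin/sh
# sample is a hypothetical helper that prints a tiny FASTA file in which
# the >stt2 sequence is wrapped over two lines (CCCC and GGGG).
sample() {
    printf '%s\n' '>C1' 'AAAA' '>stt2' 'CCCC' 'GGGG' '>C3' 'TTTT'
}

# One-liner from post #5: A is reset by the first non-header line after >stt2,
# so only the header and the first sequence line are printed.
sample | awk '/^>stt/ {A=1} A{print} !/^>stt/ {A=0}'
```

This prints >stt2 and CCCC but drops GGGG, whereas the header-flag program from post #3 prints all three lines.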

Last edited by Scrutinizer; 01-11-2015 at 02:35 AM..
# 7  
01-11-2015
This is a protein sequence FASTA file; I'm pretty sure each sequence is on one line unless it has been word-wrapped.
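If a file does turn out to be word-wrapped, one option is to join each record's sequence lines back onto a single line first, after which one-line-per-sequence filters such as the one in post #5 work unchanged. A minimal sketch; the function name linearize and the sample data are hypothetical:

```shell
#!/bin/sh
# linearize is a hypothetical helper name: it reads FASTA on stdin and
# prints each record as its header line plus one joined sequence line.
linearize() {
    awk '
    /^>/ {
        if (seq != "") print seq   # flush the sequence of the previous record
        seq = ""
        print                      # print the header unchanged
        next
    }
    { seq = seq $0 }               # append this sequence line to the current record
    END { if (seq != "") print seq }
    '
}

# Made-up sample: unwrap first, then apply the one-liner from post #5.
printf '%s\n' '>C1' 'AAAA' '>stt2' 'CCCC' 'GGGG' '>C3' 'TTTT' |
    linearize |
    awk '/^>stt/ {A=1} A{print} !/^>stt/ {A=0}'
```

Here the final output is >stt2 followed by the joined sequence CCCCGGGG.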
 