Trimming sequences based on specific pattern


 
# 1  
06-23-2010

My files look like this:
Quote:
>GHXCZCC01AJ8CJ
TTGATGTGCTTGGTGTGTATCATTTCTGGGAAGCCCTACGCCCCGGGGC
>GHXCZCC01APUO5
T-ATGTGCCGTTGGTGTGTATCAGCTGGATTTCTGGGACGCAGCCCTACCCGGGGCGA
>GHXCZCC01AQSRP
TTGATGTTA---AGCTGGATTTTCTGGGACGCCCCGGGGAGCCCTA
>GHXCZCC01AQSRP
TTGTTGCCAGCTAGCTGAGCCCTAGATTTTCTGGGGCCCCGGGG
>GHXCZCC01AQSRP
TTGATGTTGCCCAGCCCTATAGCTGGATTTTCTGGGACGCCCCGGGGTGC
I need to cut each sequence right after the last "A" of the following pattern (highlighted here for easier identification; the pattern in the actual file is not highlighted):
Quote:
AGCCCTA
The expected result should look like this:
Quote:
>GHXCZCC01AJ8CJ
TTGATGTGCTTGGTGTGTATCATTTCTGGGAAGCCCTA
>GHXCZCC01APUO5
T-ATGTGCCGTTGGTGTGTATCAGCTGGATTTCTGGGACGCAGCCCTA
>GHXCZCC01AQSRP
TTGATGTTA---AGCTGGATTTTCTGGGACGCCCCGGGGAGCCCTA
>GHXCZCC01AQSRP
TTGTTGCCAGCTAGCTGAGCCCTA
>GHXCZCC01AQSRP
TTGATGTTGCCCAGCCCTA
Thus, every sequence would end with AGCCCTA, while everything to the left of that pattern and the identifier lines (e.g. >GHXCZCC01AJ8CJ) should be kept intact.
Thanks in advance
# 2  
06-23-2010
Hi Xterra, try this:
Code:
sed 's/\(.*AGCCCTA\).*/\1/' infile
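
A note on why this works: the leading `.*` is greedy, so the captured group runs up to the last AGCCCTA on the line, and everything after it is dropped. Lines that do not contain the pattern (such as the identifier lines in the sample) are printed unchanged. The sketch below is a slightly more defensive variant, not part of the original answer: it assumes identifier lines always start with ">" and skips them explicitly, in case an identifier ever happens to contain the pattern. The function name `trim_after_pattern` is made up for illustration, and the pattern is assumed to contain plain bases only (no regex metacharacters).

```shell
#!/bin/sh
# Hypothetical helper (illustrative name): reads FASTA-like text on stdin and
# cuts each sequence line right after the LAST occurrence of the pattern in $1.
# Assumption: identifier lines start with ">" and must never be modified.
trim_after_pattern() {
    # '/^>/b' : on identifier lines, branch to the end of the script (print as-is)
    # 's/...' : greedy .* makes the capture end at the last match of the pattern
    sed -e '/^>/b' -e "s/\(.*$1\).*/\1/"
}

# Two records taken from the question
printf '%s\n' \
    '>GHXCZCC01AJ8CJ' \
    'TTGATGTGCTTGGTGTGTATCATTTCTGGGAAGCCCTACGCCCCGGGGC' \
    '>GHXCZCC01AQSRP' \
    'TTGTTGCCAGCTAGCTGAGCCCTAGATTTTCTGGGGCCCCGGGG' |
    trim_after_pattern AGCCCTA
```

For a real file, something like `trim_after_pattern AGCCCTA < infile > outfile` should do. Sequence lines without the pattern are left untouched rather than deleted.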

This User Gave Thanks to Scrutinizer For This Post:
# 3  
06-23-2010
Fantastic!!!!

It works like a charm!
Thanks once again!