Find & Replace command - Fasta file


 
# 1  
Old 12-08-2011

Hi all !

I have a FASTA file that looks like this:

>Sequence1
RTYIPLCASQHKLCPITFLAVK

(This is just an example; in reality I have many pairs of lines like that.)

Using UNIX command(s), would it be possible to replace every character except "C" with a dash, on the second line only, leaving the first line unmodified?
The goal is to obtain this:

>Sequence1
------C------C--------

Or at least which command would you use? (grep, awk, perl, sed...)

Thank you very much for your help !!!
Cevin21
# 2  
Old 12-08-2011
Code:
 
$ cat test.txt
>Sequence1
RTYIPLCASQHKLCPITFLAVK
>Sequence1
RTYIPLCASQHKLCPITFLAVK
>Sequence1
RTYIPLCASQHKLCPITFLAVK
 
$ sed '2 s/[A-BD-Z]/-/g' test.txt
>Sequence1
------C------C--------
>Sequence1
RTYIPLCASQHKLCPITFLAVK
>Sequence1
RTYIPLCASQHKLCPITFLAVK

# 3  
Old 12-08-2011
Thank you so much for your help !

I see the spirit of your approach.

But I didn't explain myself very well.

In reality my file looks like this:

>Sequence1
RTYIPLCASQHKLCPITFLAVK
>Sequence2
ERCCVASTWQCIPLKMCI
>Sequence3
TYIPLKCRYTWSCCPLVAQCYTR

And I would like to obtain:
>Sequence1
------C------C--------
>Sequence2
--CC------C-----C-
>Sequence3
------C-----CC-----C---

By "only the second line" I meant the second line of each pair of lines (that is, every even-numbered line).
Cevin21
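
A minimal sed sketch for the clarified requirement, assuming every header line starts with ">" and each sequence sits on a single line after its header. The sample data is piped in with printf only to keep the example self-contained, and the variable name masked is a placeholder.

```shell
# For every line that does NOT start with ">", replace each non-C character with "-".
# Header lines are passed through untouched.
masked=$(printf '%s\n' \
    '>Sequence1' 'RTYIPLCASQHKLCPITFLAVK' \
    '>Sequence2' 'ERCCVASTWQCIPLKMCI' \
    '>Sequence3' 'TYIPLKCRYTWSCCPLVAQCYTR' |
    sed '/^>/!s/[^C]/-/g')
printf '%s\n' "$masked"
```

On a real file the same expression would be run as sed '/^>/!s/[^C]/-/g' file.fasta, where file.fasta stands for your own file name.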
# 4  
Old 12-08-2011
No doubt possible in "awk" or "sed" or whatever.

Here is a plain shell approach with the logic spelled out. It is too slow for large files.

Code:
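while IFS= read -r line
do
        # First line of the pair: the header, printed unchanged
        echo "${line}"
        # Second line of the pair: the sequence, processed one character at a time
        IFS= read -r line
        echo "${line}" | fold -w 1 | while IFS= read -r char
        do
                if [ "${char}" != "C" ]
                then
                        char="-"
                fi
                # printf is portable; echo "\c" is not supported by every shell
                printf '%s' "${char}"
        done
        echo ""
done < filename.txt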

>Sequence1
------C------C--------
>Sequence2
--CC------C-----C-
>Sequence3
------C-----CC-----C---
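
The per-character loop above can be avoided by letting tr mask a whole sequence line at once. This is only a sketch, still assuming strict header/sequence pairs; mask_pairs is a hypothetical helper name, and it relies on the common tr behavior (GNU, BSD) of repeating the last character of the second set.

```shell
# mask_pairs (hypothetical name): reads header/sequence pairs from stdin.
mask_pairs() {
    while IFS= read -r header && IFS= read -r seq
    do
        # Header line is printed as-is
        printf '%s\n' "$header"
        # tr -c complements the set: everything except "C" and newline becomes "-"
        printf '%s\n' "$seq" | tr -c 'C\n' '-'
    done
}

result=$(printf '%s\n' '>Sequence2' 'ERCCVASTWQCIPLKMCI' | mask_pairs)
printf '%s\n' "$result"
```

It still starts one tr process per sequence, so for large files a single sed or awk pass is likely to be faster.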

# 5  
Old 12-08-2011
Code:
awk '/^[^>]/ { gsub(/[^C]/, "-"); } 1' < infile > outfile

The trailing 1 is an always-true pattern that prints every line; without it awk prints nothing. If awk doesn't work, try nawk or gawk.
# 6  
Old 12-09-2011
Thank you methyl and Corona !

Unfortunately none of the suggestions work for me.

---------- Post updated at 02:41 PM ---------- Previous update was at 02:19 PM ----------

I was thinking about something like this:

awk '{if ($1 ~/>/) print $0; else (sed 's/[^C]/-/g') print $0}' file.fasta

but it doesn't work.
Probably a syntax error!
Cevin21
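
awk cannot call sed as an inline expression, but its own gsub() function does the same substitution. A sketch of the if/else idea from the attempt above in plain awk, assuming header lines start with ">"; the sample input and the variable name fixed are for illustration only.

```shell
# Header lines (starting with ">") are printed unchanged; on all other lines
# gsub() rewrites $0 in place, replacing every non-C character with "-".
fixed=$(printf '%s\n' '>Sequence3' 'TYIPLKCRYTWSCCPLVAQCYTR' |
    awk '{ if ($0 ~ /^>/) print; else { gsub(/[^C]/, "-"); print } }')
printf '%s\n' "$fixed"
```

With a real file, the awk program would be given the file name as its last argument, as in the attempt above, and the output redirected to a new file.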
# 7  
Old 12-09-2011
Code:
perl -pe '!/^>/ && s/[ABD-Z]/-/g' inputfile.dat

 