Using = with sed to increase sequence count


 
# 1  
Old 10-03-2018

I have a FASTA file like this one:
Code:
>ID1
AAAAAA
>ID2
TTTTTT

I am using this sed pipeline to renumber the sequence headers:
Code:
sed '/^>/s/.*//;/^$/=;/^$/d' text.txt | sed 's/[1-9].*/echo ">seq" \$(( ( & + 1 )\/2 ))/e'

I get the desired output:
Code:
>seq 1
AAAAAA
>seq 2
TTTTTT

However, this doesn't work and I do not understand why:
Code:
sed '/^>/s/.*//;/^$/=;/^$/d;s/[1-9].*/echo ">seq" \$(( ( & + 1 )\/2 ))/e' text.txt

I was hoping someone here would help me understand the issue. Moreover, I was hoping I could get a better, more elegant sed solution. While perl or awk might be more appropriate, I am actually looking for a 100% sed approach.
Thanks in advance
# 2  
Old 10-03-2018
sed is a great tool, but since you can't perform arithmetic calculations in sed, a 100% sed solution is not possible.

An awk solution for this is simple:
Code:
awk '/^>/{$0 = ">seq " ++seq}1' file
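
For reference, a quick check of that one-liner against the sample from post #1. This is only a sketch: the input is fed through printf instead of a file, and the variable name renumbered is made up for the example.

```shell
# Sample input from post #1, piped in instead of read from a file.
# "renumbered" is a made-up variable name used only for this check.
renumbered=$(printf '>ID1\nAAAAAA\n>ID2\nTTTTTT\n' |
  awk '/^>/{$0 = ">seq " ++seq}1')

printf '%s\n' "$renumbered"
# >seq 1
# AAAAAA
# >seq 2
# TTTTTT
```

The trailing 1 is an always-true pattern, so every line gets printed, whether it was rewritten or not.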

# 3  
Old 10-03-2018
Your (certainly simplified and thus non-representative) sample lends itself to
Code:
sed 's/>ID/>seq /' file
>seq 1
AAAAAA
>seq 2
TTTTTT

EDIT: And here it is - the inefficient but 100% sed solution (tadaaa!):


Code:
sed 'N; s/\n/#/' file | sed '=' | sed -r 'N; s/\n//; s/^([0-9]*)>ID[0-9]*#/>seq \1\n/'
>seq 1
AAAAAA
>seq 2
TTTTTT

or, even simpler,
Code:
sed 'N; s/^.*\n//' file | sed '=' | sed '/^[0-9]\+/ s/^/>seq /'

Don't change your input file structure and then complain it would not work ...
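
A note on why these pipelines use a separate sed for the numbering, which is also the likely reason the single-command attempt in post #1 fails: = writes the line number straight to standard output and never puts it into the pattern space, so an s command later in the same script has nothing to match. (In the script from post #1, the d on the emptied header line also ends the cycle before the s is even reached.) A minimal sketch with made-up two-line input; the variable names are only for illustration:

```shell
# "=" prints the line number to standard output right away; the pattern
# space still holds only the input line, so the "s" never sees the number.
same_script=$(printf 'a\nb\n' | sed -e '=' -e 's/^[0-9]/N&/')

# A second sed receives the numbers as ordinary input lines and can edit them.
two_stage=$(printf 'a\nb\n' | sed '=' | sed 's/^[0-9]/N&/')

printf '%s\n' "$same_script"
# 1
# a
# 2
# b
printf '%s\n' "$two_stage"
# N1
# a
# N2
# b
```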

Last edited by RudiC; 10-03-2018 at 06:16 AM..
# 4  
Old 10-03-2018
Hi, for fun, here is a solution with a single sed command:
Code:
sed -e '1{
 x
 s/.*/0/
 x
}
/^>/{
 x
 :d
 s/9\(_*\)$/_\1/
 td
 s/^\(_*\)$/0\1/
 s/8\(_*\)$/9\1/
 s/7\(_*\)$/8\1/
 s/6\(_*\)$/7\1/
 s/5\(_*\)$/6\1/
 s/4\(_*\)$/5\1/
 s/3\(_*\)$/4\1/
 s/2\(_*\)$/3\1/
 s/1\(_*\)$/2\1/
 s/0\(_*\)$/1\1/
 s/_/0/g
 x
 G
 s/.*\n/>seq /
}'  file

The increment code is taken from the GNU sed documentation (info sed).
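
To see the carry handling on its own, here is a sketch that lifts just the increment part out of the script and applies it to a single number. The function name sed_increment is made up for this example; the substitutions are the ones from the script above, without the hold-space juggling.

```shell
# sed_increment is a hypothetical helper: it adds 1 to a non-negative
# decimal integer using only sed substitutions.
sed_increment() {
  printf '%s\n' "$1" | sed '
    :d
    # Turn each trailing 9 into a placeholder "_" (it will become 0 after the carry).
    s/9\(_*\)$/_\1/
    td
    # If the number was all 9s, put a 0 in front so there is a digit to bump.
    s/^\(_*\)$/0\1/
    # Bump the last real digit; descending order keeps a bumped digit from
    # being bumped again by a later substitution.
    s/8\(_*\)$/9\1/
    s/7\(_*\)$/8\1/
    s/6\(_*\)$/7\1/
    s/5\(_*\)$/6\1/
    s/4\(_*\)$/5\1/
    s/3\(_*\)$/4\1/
    s/2\(_*\)$/3\1/
    s/1\(_*\)$/2\1/
    s/0\(_*\)$/1\1/
    # Placeholders become zeros.
    s/_/0/g'
}

sed_increment 0     # prints 1
sed_increment 9     # prints 10
sed_increment 199   # prints 200
```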

Regards.
# 5  
Old 10-03-2018
Quote:
Originally Posted by Don Cragun
sed is a great tool, but since you can't perform arithmetic calculations in sed, a 100% sed solution is not possible.
[not-quite-serious-mode]

Ha! This is perhaps the first time I find something to nit-pick in anything the infallible Don has pontificated. Actually it is possible to do arithmetic in sed. Here, for example, is addition/subtraction (from stackoverflow):

Code:
s/[0-9]/<&/g
s/0//g
s/1/|/g
s/2/||/g
s/3/|||/g
s/4/||||/g
s/5/|||||/g
s/6/||||||/g
s/7/|||||||/g
s/8/||||||||/g
s/9/|||||||||/g
: tens
s/|</<||||||||||/g
t tens
s/<//g
s/+//g
: minus
s/|-|/-/g
t minus
s/-$//
: back
s/||||||||||/</g
s/<\([0-9]*\)$/<0\1/
s/|||||||||/9/
s/||||||||/8/
s/|||||||/7/
s/||||||/6/
s/|||||/5/
s/||||/4/
s/|||/3/
s/||/2/
s/|/1/
s/</|/g
t back

In fact, sed is a Turing-complete programming language. This can be shown by either writing a Turing machine in sed (shown here) or by writing an interpreter for another Turing-complete language. With much fanfare, here is a Brainfuck interpreter written in sed.

[/not-quite-serious-mode]

I hope this helps (well, actually I doubt it, but this is a holiday where I am, so it is a day off and it is fun).

bakunin

PS: Input to the sed-script above would be "100+15" or "250-173"
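
A sketch of how that could be run, assuming GNU sed. The wrapper name sed_calc is made up; the sed program is the script above with the digit substitutions joined by semicolons to keep it short.

```shell
# sed_calc is a hypothetical wrapper; it feeds one expression such as
# "100+15" or "250-173" to the unary-arithmetic sed script.
sed_calc() {
  printf '%s\n' "$1" | sed '
s/[0-9]/<&/g
s/0//g; s/1/|/g; s/2/||/g; s/3/|||/g; s/4/||||/g; s/5/|||||/g
s/6/||||||/g; s/7/|||||||/g; s/8/||||||||/g; s/9/|||||||||/g
: tens
s/|</<||||||||||/g
t tens
s/<//g
s/+//g
: minus
s/|-|/-/g
t minus
s/-$//
: back
s/||||||||||/</g
s/<\([0-9]*\)$/<0\1/
s/|||||||||/9/; s/||||||||/8/; s/|||||||/7/; s/||||||/6/; s/|||||/5/
s/||||/4/; s/|||/3/; s/||/2/; s/|/1/
s/</|/g
t back'
}

sed_calc 100+15    # prints 115
sed_calc 250-173   # prints 77
```

The numbers are first expanded into runs of | (one per unit), addition is just dropping the +, subtraction cancels one | from each side of the - until one side is empty, and the last loop packs the result back into decimal digits.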
# 6  
Old 10-03-2018
@bakunin: now, please, show us how to multiply floats with sed. I'd like to see you juggle 1E18 lucifer matches ("Streichholz" in German) ...
# 7  
Old 10-03-2018
Quote:
Originally Posted by RudiC
@bakunin: now, please, show us how to multiply floats with sed. I'd like to see you juggle 1E18 lucifer matches ("Streichholz" in German) ...
Sigh, and that on my day off. Fortunately there is aunt Google, who is always there when I need her. From math - Addition with 'sed' - Unix & Linux Stack Exchange:

Code:
sed 's/[0-9]/<&/g
s/0//g; s/1/|/g; s/2/||/g; s/3/|||/g; s/4/||||/g; s/5/|||||/g; s/6/||||||/g
s/7/|||||||/g; s/8/||||||||/g; s/9/|||||||||/g
: tens
s/|</<||||||||||/g
t tens
s/<//g
s/.*\*$/0/
s/^\*.*/0/
s/*|/*/
: mult
s/\(|*\)\*|/\1<\1*/
t mult
s/*//g
s/<//g
: back
s/||||||||||/</g
s/<\([0-9]*\)$/<0\1/
s/|||||||||/9/; s/||||||||/8/; s/|||||||/7/; s/||||||/6/; s/|||||/5/; s/||||/4/
s/|||/3/; s/||/2/; s/|/1/
s/</|/g
t back'

and @Wisecracker: the implementation of an FFT in sed is left to the interested reader. ;-))

bakunin

PS: my favourite math quote: "Base 8 is actually like base 10 - if you are missing two fingers." (Tom Lehrer, "New Math")

Last edited by bakunin; 10-03-2018 at 11:04 AM..