Full Discussion: Remove empty records
Post 302639107 by yifangt on Friday 11th of May 2012 08:17:45 AM
Remove empty records

Hello:
Is there a simple way to remove empty records from a FASTA format file?
A FASTA record consists of two parts: a header and a sequence (for non-biologists, see the Wiki for details, of course!). The header always starts with ">" followed by the name of the sequence, and it must fit on a single line. Following the header is the sequence, which can span a single row or multiple rows. Example:
Code:
>header1
ACGATAGCTCTAGCTAGCTA
>header2
GGCGCTATG
>other header name
GCGGCGGGGCGTTTAAA
ATCGAT
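
To make the record structure concrete, here is a minimal awk sketch (not part of the original question) that prints each header together with the total length of its sequence, joining sequence rows that belong to the same record. The function name fasta_lengths is a made-up helper, and the sketch assumes every line starting with ">" opens a new record.

```shell
# Hypothetical helper: print "<header><TAB><sequence length>" for each record.
# Assumption: any line starting with ">" is a header; all other lines are sequence.
fasta_lengths() {
    awk '
        /^>/ {                                   # a new record starts here
            if (name != "") print name "\t" len  # report the previous record
            name = $0; len = 0; next
        }
        { len += length($0) }                    # add this row to the current sequence
        END { if (name != "") print name "\t" len }
    ' "$@"
}
```

A record whose length is reported as 0 has a header but no sequence.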

I have a file in which some records have only a header and no sequence (i.e. empty sequences). I need to remove those records.
Input file:
Code:
>head1
ACGATAGCTCTAGCTAGCTA
>header2
GGCGCTATGGCGACTGATCAGC
CCGAAAGATGCT
>other header name
>some thing maybe long but single line for sure
GCTAGCTAGCA
>something 
>strange header 2
AGCTAGCTGAGGGAGGAGGGA
>some description  
>other description  2
CGTAGCTAGGTAGATTTA
>something not good for me

Output:
Code:
>head1
ACGATAGCTCTAGCTAGCTA
>header2
GGCGCTATGGCGACTGATCAGC
CCGAAAGATGCT
>some thing maybe long but single line for sure
GCTAGCTAGCA
>strange header 2
AGCTAGCTGAGGGAGGAGGGA
>other description  2
CGTAGCTAGGTAGATTTA

There are BioPerl scripts and other tools to do the job, but I was thinking there must be a simpler, handy way to do it with sed or awk, especially on the terminal for smaller files. A Perl one-liner works too, but I have a problem with the "$/" variable because the sequence can span a varying number of rows.
I could not figure this out by myself, so I am posting it here to get the experts' ideas. Thanks in advance!
Yifang
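
One possible approach, offered as a sketch rather than a definitive answer: have awk hold each header back and print it only once a non-blank sequence line follows it. A header that is immediately followed by another header, or by the end of the file, is then never printed. The function name remove_empty_fasta is a made-up helper, and the sketch assumes that lines containing only spaces or tabs do not count as sequence.

```shell
# Hypothetical helper: drop FASTA records that have a header but no sequence.
# Assumption: lines that are empty or hold only spaces/tabs are not sequence data.
remove_empty_fasta() {
    awk '
        /^>/    { header = $0; pending = 1; next }  # remember the header, do not print yet
        NF == 0 { next }                            # skip blank lines
        {
            if (pending) { print header; pending = 0 }  # first sequence row: emit the header
            print                                       # emit the sequence row itself
        }
    ' "$@"
}
```

It reads from the files given as arguments or from standard input, so it could be run as remove_empty_fasta input.fa > output.fa, where both file names are placeholders. Because only the most recent header is kept, this avoids changing the record separator, which is where the "$/" trouble with multi-row sequences comes from.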
 
