Hello,
Attached is my very simple C++ code to remove any DNA sequence that is a substring of another, so that only unique, non-redundant sequences remain. It is similar to the sort | uniq command, except that for DNA the reverse complement must also be considered. The program runs well with a small dataset, but when I increased the data size to ~1,000 entries (some may be 100,000 bp long), it took about 2 hours to finish.
My question is: how can I improve the performance of my code?
Memory does not seem to be the issue, as 256 GB of RAM is available.
1) What room is there for better coding techniques within my current algorithm, which is a simple "sorting---looping---comparing" approach with O(n^2) complexity?
2) What better algorithms are there? I am sure there are many.
Either of the two questions is too complicated for me, but I am wondering if anybody can help me increase the performance of the program. Thanks a lot!
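The attached code is not reproduced here, so the following is only a minimal sketch of the "sorting---looping---comparing" approach described above, not the poster's actual program. The names reverse_complement and remove_contained are hypothetical. It assumes sequences contain only uppercase A, C, G and T (any other character is left unchanged), sorts longest first so that each sequence is compared only against sequences already kept, and computes the reverse complement once per sequence rather than once per comparison.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Reverse complement of a DNA string (A<->T, C<->G).
// Assumption: uppercase input; other symbols (e.g. N) are kept unchanged.
std::string reverse_complement(const std::string& s) {
    std::string rc(s.rbegin(), s.rend());
    for (char& c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'T': c = 'A'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            default: break;
        }
    }
    return rc;
}

// Keep only sequences that are not contained, in either orientation,
// in a sequence that was already kept. Still O(n^2) comparisons.
std::vector<std::string> remove_contained(std::vector<std::string> seqs) {
    // Longest first: a sequence can only be contained in one that is
    // at least as long, so it only needs checking against 'kept'.
    std::stable_sort(seqs.begin(), seqs.end(),
                     [](const std::string& a, const std::string& b) {
                         return a.size() > b.size();
                     });

    std::vector<std::string> kept;
    for (const std::string& s : seqs) {
        const std::string rc = reverse_complement(s);  // computed once per sequence
        bool redundant = false;
        for (const std::string& k : kept) {
            if (k.find(s) != std::string::npos ||
                k.find(rc) != std::string::npos) {
                redundant = true;
                break;  // stop at the first containing sequence
            }
        }
        if (!redundant) kept.push_back(s);
    }
    return kept;
}
```

For example, calling remove_contained on {"ACGT", "CG", "AAAC", "GTTT", "TTTTT"} keeps TTTTT, ACGT and AAAC: CG is a substring of ACGT, and GTTT is the reverse complement of AAAC. The sketch is still quadratic in the number of sequences; an index over the kept sequences (for example a suffix array or a k-mer hash table) is a common way to reduce the number of comparisons, but it is not shown here.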
Hi,
I'm searching for files across many AIX servers with the rsh command, using this request:
find /dir1 -name '*.' -exec ls {} \;
and then counting them with "wc".
I would like to speed up this search because it takes too long, and replace find directly with the ls command, but "ls *. " doesn't work.
and... (3 Replies)
Hi All,
I am using the grep command to find the string "abc" in a file.
The content of the file is:
***********
abc = xyz
def= lmn
************
I used the command below to redirect the output to a temporary file:
grep abc file | sort -u | awk '{print #3}' > out_file
Then I am searching... (2 Replies)
Hi, can someone tell me in which ways I can improve disk I/O and system process performance? Kindly suggest some commands so I can try them on my test machine. Thanks, Mazhar (2 Replies)
I have a 2 GB data file.
I need to do all of this, but it is taking hours. Is there anywhere I can improve performance? Thanks a lot.
#!/usr/bin/ksh
echo TIMESTAMP="$(date +'_%y-%m-%d.%H-%M-%S')"
function showHelp {
cat << EOF >&2
syntax extreme.sh FILENAME
Specify filename to parse
EOF... (3 Replies)
Hi Friends,
I wrote the shell script below to generate a report on the alert messages received in a day. But to process around 4,500 lines (alerts), the script takes around 30 minutes.
Please help me make it faster and improve the performance of the script. I would be very... (10 Replies)
Hi All,
I have written the following script, which takes a long time to run: it searches for only 3,500 records, taken as input from one file, in a log file of approximately 12 GB.
The script reads a CSV file as input with 2 fields, transaction_id and mobile_number, and search... (6 Replies)
Hi,
I have around one lakh (100,000) records. I have used XML to create the data.
I have used these 2 Perl modules.
use XML::DOM;
use XML::LibXML;
The data will look like this, and most of it is textual entries.
<eid>19000</eid>
<einfo>This is the ..........</einfo>
......... (3 Replies)
Hi,
I wrote a script to convert dates to the format I want. It works fine, but the conversion is taking a lot of time. Can someone help me tweak this script?
#!/bin/bash
file=$1
ofile=$2
cp $file $ofile
mydates=$(grep -Po '+/+/+' $ofile) # gets 8/1/13
mydates=$(echo "$mydates" | sort |... (5 Replies)
Discussion started by: vikatakavi
clustalo
clustalo(1) USER COMMANDS clustalo(1)
NAME
clustalo - General purpose multiple sequence alignment program for proteins
SYNOPSIS
clustalo [-h]
DESCRIPTION
Clustal-Omega is a general purpose multiple sequence alignment (MSA) program for proteins. It produces high quality MSAs and is capable of
handling data-sets of hundreds of thousands of sequences in reasonable time.
In default mode, users give a file of sequences to be aligned and these are clustered to produce a guide tree and this is used to guide a
"progressive alignment" of the sequences. There are also facilities for aligning existing alignments to each other, aligning a sequence to
an alignment and for using a hidden Markov model (HMM) to help guide an alignment of new sequences that are homologous to the sequences
used to make the HMM. This latter procedure is referred to as "external profile alignment" or EPA.
Clustal-Omega uses HMMs for the alignment engine, based on the HHalign package from Johannes Soeding [1]. Guide trees are made using an
enhanced version of mBed [2] which can cluster very large numbers of sequences in O(N*log(N)) time. Multiple alignment then proceeds by
aligning larger and larger alignments using HHalign, following the clustering given by the guide tree.
In its current form Clustal-Omega can only align protein sequences but not DNA/RNA sequences. It is envisioned that DNA/RNA will become
available in a future version.
USAGE
Tool usage is available in /usr/share/doc/clustalo/README.
DEVELOPMENT
Headers and libraries are available in libclustalo-dev package.
CITING
Sievers F, Wilm A, Dineen DG, Gibson TJ, Karplus K, Li W, Lopez R, McWilliam H,
Remmert M, Soding J, Thompson JD, Higgins DG (2011). Fast, scalable generation of high-quality protein multiple sequence alignments
using Clustal Omega. Mol Syst Biol 7.
AUTHOR
Olivier Sallou (olivier.sallou (at) irisa.fr) - Man page and packaging
Conway Institute UCD Dublin (clustalw (at) ucd.ie) - clustalo
version 1.0.3 December 14, 2011 clustalo(1)