11-11-2003
Awesome, thanks a ton for both of those responses... After playing around with these things for a little while, it is making more sense to me. Thanks again for the nice answers, I really appreciate the help, and I'm having a blast learning more about this stuff.
Thanks!
Jason
10 More Discussions You Might Find Interesting
1. UNIX for Dummies Questions & Answers
How do I append a header.txt file to files that start with xxx_*.? There are 20 different files that start with xxx_*.
Thanks (1 Reply)
Discussion started by: chumba
1 Replies
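For the header question above, a minimal sketch, assuming "append a header" means putting the contents of header.txt at the top of each matching file. The function name `prepend_header` is hypothetical, and the sketch assumes `mktemp` is available on your system:

```shell
#!/bin/sh
# prepend_header HEADER PREFIX
# Put the contents of HEADER at the top of every regular file whose
# name starts with PREFIX. (Function name is illustrative only.)
prepend_header() {
    header=$1
    prefix=$2
    for f in "$prefix"*; do
        [ -f "$f" ] || continue            # glob matched nothing, or not a file
        tmp=$(mktemp) || return 1
        # Build header + original in a temp file, then write it back in
        # place so the file keeps its permissions.
        cat "$header" "$f" > "$tmp" && cat "$tmp" > "$f"
        rm -f "$tmp"
    done
}
```

Called as `prepend_header header.txt xxx_`, it would touch only the files whose names start with xxx_. Try it on copies first, since it rewrites the files in place.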
2. Shell Programming and Scripting
Hi,
I need to write a shell script (ksh) to read contents starting at a specific location from one file and append the contents at specific location in another file. Please find below the contents of the source file that I need to read the contents from,
File 1
-----# more... (5 Replies)
Discussion started by: dnicky
5 Replies
3. Shell Programming and Scripting
hi,
i have a file
file1 file2
----------- -----------------
aa bbb ccc 111 1111 1111
ddd eee fff 222 3333 4444
ggg hhh... (5 Replies)
Discussion started by: go4desperado
5 Replies
4. Shell Programming and Scripting
Hi,
I have file named Ex.txt which has the contents like this-
<FndManagers>
Mng={{PID|8819|LB}|75.000000|75.000000}
Mng={{PID|25069|ML}|25.000000|0.000000}
Mng={{PID|6034033|ML}|0.000000|25.000000}
</FndManagers>
I have a code which searches for the number 8819 and appends with the... (1 Reply)
Discussion started by: vinay123
1 Replies
5. Shell Programming and Scripting
I'm new to shell scripting and am writing a script to help me log the free memory and hd space on a server. As of now, the script just runs 'df -h' and appends the output to a file and then runs 'top' and appends the output to a log file.
What I want to do, is have the script also search the... (3 Replies)
Discussion started by: enator45
3 Replies
6. Shell Programming and Scripting
hello everybody,
I have some files in a directory. Each file contains some data.
My requirement is to add the line count of each file at the head of that file.
Any advice? (4 Replies)
Discussion started by: abhigrkist
4 Replies
7. Shell Programming and Scripting
I have 6 files named file1, file2 ... file6 and I need to add the number of lines in each file to the beginning of that file. For example,
If file 1 contains
a
b
c
d
then after adding new line file1 should contain
4
a
b
c
d
Thanks in advance. (2 Replies)
Discussion started by: akhay_ms
2 Replies
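For the line-count question above, one possible sketch. The function name `prepend_line_count` is hypothetical, and `mktemp` is assumed to be available. It uses awk's `NR` for the count, so a final line without a trailing newline is still counted:

```shell
#!/bin/sh
# prepend_line_count FILE...
# Write each file's line count as a new first line of that file.
# (Function name is illustrative only.)
prepend_line_count() {
    for f in "$@"; do
        [ -f "$f" ] || continue                  # skip anything that is not a file
        tmp=$(mktemp) || return 1
        count=$(awk 'END { print NR }' "$f")     # number of lines in the file
        # Count first, then the original contents; write back in place.
        { echo "$count"; cat "$f"; } > "$tmp" && cat "$tmp" > "$f"
        rm -f "$tmp"
    done
}
```

With the six files from the question it could be called as `prepend_line_count file1 file2 file3 file4 file5 file6`. Running it twice would add a second count line, so it is not safe to repeat on the same files.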
8. Shell Programming and Scripting
I have two files like ABC_DEF_yyyyymmdd_hhmiss_XXX.txt and ABC_DEF_yyyyymmdd_hhmiss_YYY.txt. The date part changes every time. How do I remove this date part from the file name and create a single file like ABC_DEF_XXX.txt? (8 Replies)
Discussion started by: varlax
8 Replies
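For the file-name question above, a rough sketch of just the renaming part. It assumes the stamp is an 8-digit date and a 6-digit time, each set off by underscores (the question does not show real values), and the function name `strip_stamp` is hypothetical:

```shell
#!/bin/sh
# strip_stamp NAME
# Print NAME with a "_<8 digits>_<6 digits>_" stamp collapsed to "_".
# The 8/6 digit widths are an assumption about the real file names.
strip_stamp() {
    printf '%s\n' "$1" | sed 's/_[0-9]\{8\}_[0-9]\{6\}_/_/'
}
```

A rename could then look like `mv -- "$f" "$(strip_stamp "$f")"` inside a loop over the matching files. Names that do not contain such a stamp are printed unchanged.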
9. Shell Programming and Scripting
I have a file that looks like this:
1|A
2|B
3|C
...
...
100|A
I would like to take the last line in the file and add +1 to the number so
the output looks like this
1|A (4 Replies)
Discussion started by: BeefStu
4 Replies
10. Shell Programming and Scripting
Hi there,
i've got a file with this content
$ cat file1
Matt
Mar
The other file has the same number of lines with this content:
$ cat file2
20404=767294
23450=32427
Is there a way, using sed, awk or paste, to insert the content of file1 before the "=" character? So... (3 Replies)
Discussion started by: nms
3 Replies
LEARN ABOUT DEBIAN
psi-cd-hit-2d
PSI-CD-HIT-2D.PL(1) User Commands PSI-CD-HIT-2D.PL(1)
NAME
psi-cd-hit-2d.pl - runs an algorithm similar to CD-HIT but uses BLAST to calculate similarities in db1 or db2 format
DESCRIPTION
Usage psi-cd-hit-2d [Options]
Options
-i in_dbname, required
-o out_dbname, required
-c clustering threshold (sequence identity), default 0.3
-ce clustering threshold (blast expect), default -1,
meaning the expect threshold is not used by default; with a positive value, the program clusters sequences if similarities meet
either the identity threshold or the expect threshold
-L coverage of shorter sequence ( aligned / full), default 0.0
-M coverage of longer sequence ( aligned / full), default 0.0
-R (1/0) use psi-blast profile? default 0
perform psi-blast / pdb-blast type search
-G (1/0) use global identity? default 1
sequence identity is calculated as total identical residues of local alignments / length of shorter seq
if you prefer to use -G 0, it is suggested that you also use -L, such as -L 0.8, to prevent very short matches.
-d length of description line in the .clstr file, default 30
if set to 0, it takes the fasta defline and stops at the first space
-l length_of_throw_away_sequences, default 10
-p profile search para, default
"-a 2 -d nr80 -j 3 -F F -e 0.001 -b 500 -v 500"
-bfdb profile database, default nr80
-s blast search para, default
"-F F -e 0.000001 -b 100000 -v 100000"
-be blast expect cutoff, default 0.000001
-b filename of a list of hosts, used to run this program in parallel with ssh calls; you need to provide the list of hosts
-pbs number of jobs to send each time by the PBS queuing system
you cannot use both ssh and pbs at the same time
-k (1/0) keep blast raw output file, default 1
-rs steps between saving the restart file and clustering output, default 5000
after every 5000 sequences processed, the program writes a restart file and the current clustering information
-restart restart file, reads in a restart file
if the program crashed, stopped or was terminated, you can restart it by adding the option "-restart sth.restart"
-rf steps between reformatting the blast database, default 200,000
once the program has clustered 200,000 seqs, it removes them from the seq pool and reformats the blast db to save time
-local dir of local blast db,
when run in parallel with ssh (not pbs), the program can copy blast dbs to local drives on each node to save blast db reading time, BUT
IT MAY NOT BE FASTER
-J job, job_file, execute specific jobs such as parsing blast output only
DON'T use it, it is only used by this program itself
-single file of ids that you know to be singletons,
so the program won't run them as queries
-i2 second input database
-blastn run blastn, default 0
-lo how much longer a seq in db2 can be than a seq in db1 in a cluster, default 0
meaning that a seq in db2 should be <= the seqs in db1 in a cluster
============================== by Weizhong Li, liwz@sdsc.edu ==============================
If you find cd-hit useful, please kindly cite:
"Clustering of highly homologous sequences to reduce the size of large protein database", Weizhong Li, Lukasz Jaroszewski & Adam Godzik, Bioinformatics, (2001) 17:282-283
"Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences", Weizhong Li & Adam Godzik, Bioinformatics, (2006) 22:1658-1659
psi-cd-hit-2d.pl 4.6-2012-04-25 April 2012 PSI-CD-HIT-2D.PL(1)
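As an illustration of how the required options in the man page above fit together, here is a small sketch that only prints a command line and does not run the tool. The function name `psi_cd_hit_2d_cmd` and all file names are hypothetical placeholders; the flags -i, -i2, -o and -c, and the 0.3 default, are taken from the option list above:

```shell
#!/bin/sh
# psi_cd_hit_2d_cmd DB1 DB2 OUT [THRESHOLD]
# Print (do not run) a psi-cd-hit-2d.pl command line.
# Function name and file names are placeholders for illustration.
psi_cd_hit_2d_cmd() {
    db1=$1
    db2=$2
    out=$3
    threshold=${4:-0.3}    # 0.3 is the documented default for -c
    printf 'psi-cd-hit-2d.pl -i %s -i2 %s -o %s -c %s\n' \
        "$db1" "$db2" "$out" "$threshold"
}

# Prints: psi-cd-hit-2d.pl -i db1.fasta -i2 db2.fasta -o db2_out -c 0.3
psi_cd_hit_2d_cmd db1.fasta db2.fasta db2_out
```

Printing the command first makes it easy to check the arguments before starting what may be a long BLAST-based run. Other options from the list, such as -L or -restart, would be appended in the same way.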