10-26-2019
Thanks. Would it be possible to retain the '>' at the start of each sequence header?
10 More Discussions You Might Find Interesting
1. Shell Programming and Scripting
Hopefully someone here can point me in the correct direction.
I'm working on a username migration and am trying to map my users' old usernames to the new ones.
Right now every user has a username of the form firstname.lastname, e.g. john.doe
I'm trying to create a bash or python script that will take... (3 Replies)
Discussion started by: binary-ninja
2. Shell Programming and Scripting
Hi,
I have a file of DNA sequences in FASTA format that looks like this:
>admin_1_45
atatagcaga
>admin_1_46
atatagcagaatatatat
There are many thousands of such sequences in a single file. I want to replace the accession ID "admin_1_45", and similarly those of the following sequences, to... (5 Replies)
Discussion started by: margarita
3. Shell Programming and Scripting
I have two files. File1 is shown below.
>153L:B|PDBID|CHAIN|SEQUENCE
RTDCYGNVNRIDTTGASCKTAKPEGLSYCGVSASKKIAERDLQAMDRYKTIIKKVGEKLCVEPAVIAGIISRESHAGKVL
KNGWGDRGNGFGLMQVDKRSHKPQGTWNGEVHITQGTTILINFIKTIQKKFPSWTKDQQLKGGISAYNAGAGNVRSYARM
DIGTTHDDYANDVVARAQYYKQHGY
>16VP:A|PDBID|CHAIN|SEQUENCE... (7 Replies)
Discussion started by: nelsonfrans
4. Shell Programming and Scripting
I have a fasta file as follows
>sp|O15090|FABP4_HUMAN Fatty acid-binding protein, adipocyte OS=Homo sapiens GN=FABP4 PE=1 SV=3
MCDAFVGTWKLVSSENFDDYMKEVGVGFATRKVAGMAKPNMIISVNGDVITIKSESTFKN
TEISFILGQEFDEVTADDRKVKSTITLDGGVLVHVQKWDGKSTTIKRKREDDKLVVECVM
KGVTSTRVYERA
>sp|L18484|AP2A2_RAT AP-2... (3 Replies)
Discussion started by: alexypaul
5. UNIX for Dummies Questions & Answers
Hi
How can I extract sequences from a FASTA file according to certain criteria? The beginning of my file (which contains more than 1000 sequences in total) looks like this:
>H8V34IS02I59VP
SDACNDLTIALLQIAREVRVCNPTFSFRWHPQVKDEVMRECFDCIRQGLG
YPSMRNDPILIANCMNWHGHPLEEARQWVHQACMSPCPSTKHGFQPFRMA... (6 Replies)
Discussion started by: Marion MPI
6. UNIX for Dummies Questions & Answers
Hi,
I need some help with modifying fasta headers.
I have a fasta file with thousands of contigs and I need to modify their headers with the information obtained from a second file.
File 1 contains the fasta sequences:
>contig0001 length=11115 numreads=10777
agatgtagatctct... (6 Replies)
Discussion started by: Lokaps
7. Shell Programming and Scripting
Hi,
I have a FASTA file with multiple sequences. How can I get only the unique sequences from the file?
For example:
my_file.fasta
>seq1
TCTCAAAGAAAGCTGTGCTGCATACTGTACAAAACTTTGTCTGGAGAGATGGAGAATCTCATTGACTTTACAGGTGTGGACGGTCTTCAGAGATGGCTCAAGCTAACATTCCCTGACACACCTATAGGGAAAGAGCTAAC
>seq2... (3 Replies)
Discussion started by: Ibk
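One possible approach to the unique-sequence question above is to join the wrapped lines of each record and keep only the first record seen for each distinct sequence. This is only a sketch: the function name fasta_unique is made up for illustration, and it assumes that "unique" means identical sequence text (case-sensitive), with the first header winning.

```shell
#!/bin/sh
# fasta_unique [FILE...]: print the first record for each distinct sequence.
# Hypothetical helper; reads the named files, or standard input if none given.
# Assumption: records are duplicates when their joined sequence text is identical.
fasta_unique() {
    awk '
        function emit() {
            # Print the pending record only if its sequence has not been seen.
            if (header != "" && !(seq in seen)) {
                seen[seq] = 1
                print header
                print seq
            }
        }
        /^>/ { emit(); header = $0; seq = ""; next }   # new record starts
             { seq = seq $0 }                          # join wrapped sequence lines
        END  { emit() }                                # flush the last record
    ' "$@"
}

# Small demonstration: seq2 repeats the sequence of seq1 and is dropped.
printf '>seq1\nACGT\nAC\n>seq2\nACGTAC\n>seq3\nTTTT\n' | fasta_unique
```

With the file from the post it could be run as fasta_unique my_file.fasta > unique.fasta. Note that the output writes each sequence on a single line, so any original line wrapping is lost.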
8. UNIX for Beginners Questions & Answers
I can calculate the length of every sequence in a FASTA file with the following command:
awk '/^>/{if (l!="") print l; print; l=0; next}{l+=length($0)}END{print l}' unique.fasta
But I need to calculate the length of particular FASTA sequences listed in another text file. The results to be... (14 Replies)
Discussion started by: dineshkumarsrk
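A rough sketch of how the command above might be restricted to the sequences named in a second file is given below. It is not taken from the thread: the function name fasta_lengths_for is hypothetical, and it assumes the list holds one ID per line (without the leading '>') that matches the first word of the FASTA header.

```shell
#!/bin/sh
# fasta_lengths_for IDS FASTA: print "id<TAB>length" for each listed sequence.
# Hypothetical helper. Assumptions: IDS is non-empty and has one ID per line,
# and an ID equals the first word of a header with the ">" removed.
fasta_lengths_for() {
    awk '
        function report() { if (keep) print id "\t" len }
        NR == FNR { if ($1 != "") want[$1] = 1; next }   # first file: the ID list
        /^>/ {
            report()                 # finish the previous record, if wanted
            id = substr($1, 2)       # header word without ">"
            keep = (id in want)
            len = 0
            next
        }
        keep { len += length($0) }   # add up wrapped sequence lines
        END  { report() }
    ' "$1" "$2"
}

# Small demonstration with temporary files.
tmp=$(mktemp -d)
printf '>seq1 first\nACGT\nAC\n>seq2\nGG\n>seq3\nTTTTT\n' > "$tmp/unique.fasta"
printf 'seq3\nseq1\n' > "$tmp/ids.txt"
fasta_lengths_for "$tmp/ids.txt" "$tmp/unique.fasta"
rm -rf "$tmp"
```

Results come out in the order the sequences appear in the FASTA file, not in the order of the ID list.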
9. Shell Programming and Scripting
I've been struggling with this one for quite a while and cannot seem to find a solution for this find/replace scenario. Perhaps I'm getting rusty.
I have a file that contains a number of metrics (exactly 3 fields per line) from a few appliances that are collected in parallel. To identify the... (3 Replies)
Discussion started by: verdepollo
10. UNIX for Beginners Questions & Answers
Hi,
I have to add 7 bases of a specific nucleotide at the beginning and end of every FASTA sequence in a file. For example, I have a multi-FASTA file named test.fasta, as given below:
test.fasta
>TalAA18_Xoo_CIAT_NZ_CP033194.1:_2936369-2939570:+1... (1 Reply)
Discussion started by: dineshkumarsrk
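The padding request above could be sketched along the following lines. The excerpt does not say which nucleotide is wanted, so the base and the count are passed as parameters; the function name fasta_pad is hypothetical, and the sketch assumes each sequence should be joined onto one line before the bases are added at both ends.

```shell
#!/bin/sh
# fasta_pad BASE COUNT [FILE...]: add COUNT copies of BASE to both ends of
# every sequence. Hypothetical helper; reads standard input if no file is given.
fasta_pad() {
    base=$1
    count=$2
    shift 2
    awk -v base="$base" -v n="$count" '
        function emit() {
            if (header != "") { print header; print pad seq pad }
        }
        BEGIN { for (i = 0; i < n; i++) pad = pad base }   # build the flanking string
        /^>/  { emit(); header = $0; seq = ""; next }
              { seq = seq $0 }                             # join wrapped lines
        END   { emit() }
    ' "$@"
}

# Small demonstration: seven N bases on each side of a wrapped sequence.
printf '>example\nACGT\nTTAA\n' | fasta_pad N 7
```

For the file in the post this might be called as fasta_pad N 7 test.fasta > padded.fasta, with N replaced by whichever nucleotide is actually required.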
LEARN ABOUT DEBIAN
kalign
KALIGN(1) Kalign User Manual KALIGN(1)
NAME
kalign - performs multiple alignment of biological sequences.
SYNOPSIS
kalign [infile.fasta] [outfile.fasta] [Options]
kalign [-i infile.fasta] [-o outfile.fasta] [Options]
kalign [< infile.fasta] [> outfile.fasta] [Options]
DESCRIPTION
Kalign is a command line tool to perform multiple alignment of biological sequences. It employs the Muth-Manber string-matching algorithm
to improve both the accuracy and speed of the alignment. It uses a global, progressive alignment approach, enriched by employing an
approximate string-matching algorithm to calculate sequence distances and by incorporating local matches into the otherwise global
alignment.
OPTIONS
-s -gpo -gapopen -gap_open x
Gap open penalty.
-e -gpe -gap_ext -gapextension x
Gap extension penalty.
-t -tgpe -terminal_gap_extension_penalty x
Terminal gap penalties.
-m -bonus -matrix_bonus x
A constant added to the substitution matrix.
-c -sort <input, tree, gaps>
The order in which the sequences appear in the output alignment.
-g -feature
Selects feature mode and specifies which features are to be used: e.g. all, maxplp, STRUCT, PFAM-A...
-same_feature_score
Score for aligning same features.
-diff_feature_score
Penalty for aligning different features.
-d -distance <wu, pair>
Distance method.
-b -tree -guide-tree <nj, upgma>
Guide tree method.
-z -zcutoff
Parameter used in the Wu-Manber-based distance calculation.
-i -in -input
Name of the input file.
-o -out -output
Name of the output file.
-a -gap_inc
Increases gap penalties depending on the number of existing gaps.
-f -format <fasta, msf, aln, clu, macsim>
The output format.
-q -quiet
Print nothing to STDERR. Read nothing from STDIN.
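EXAMPLES
The following sketch is an illustration only and is not part of the upstream manual. It wraps the -i, -o and -f options documented above in a small shell function; the name align_fasta and the default format of fasta are assumptions made for the example.

```shell
#!/bin/sh
# align_fasta IN OUT [FORMAT]: hypothetical wrapper around kalign.
# Uses only the -i (input), -o (output) and -f (format) options listed above.
# FORMAT is one of fasta, msf, aln, clu, macsim; it defaults to fasta here.
align_fasta() {
    kalign -i "$1" -o "$2" -f "${3:-fasta}"
}
```

For instance, align_fasta infile.fasta outfile.msf msf would run kalign -i infile.fasta -o outfile.msf -f msf.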
REFERENCES
o Timo Lassmann and Erik L.L. Sonnhammer (2005) Kalign - an accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics
6:298
o Timo Lassmann, Oliver Frings and Erik L. L. Sonnhammer (2009) Kalign2: high-performance multiple alignment of protein and nucleotide
sequences allowing external features. Nucleic Acids Research 37(3):858-865.
AUTHORS
Timo Lassmann <timolassmann@gmail.com>
Upstream author of Kalign.
Charles Plessy <plessy@debian.org>
Wrote the manpage.
COPYRIGHT
Copyright (C) 2004, 2005, 2006, 2007, 2008 Timo Lassmann
Kalign is free software. You can redistribute it and/or modify it under the terms of the GNU General Public License as published by the
Free Software Foundation.
This manual page was written by Charles Plessy <plessy@debian.org> for the Debian(TM) system (but may be used by others). Permission is
granted to copy, distribute and/or modify this document under the same terms as kalign itself.
On Debian systems, the complete text of the GNU General Public License version 2 can be found in /usr/share/common-licenses/GPL-2.
kalign 2.04 February 25, 2009 KALIGN(1)