Is it possible to rename FASTA headers based on their positions as specified in another file?
I have 5 sequences in a FASTA file, gene1.fasta, as follows:
I need to rename the headers in gene1.fasta based on the sequence positions specified in list.txt, which looks like this:
The expected output should look like this:
Thanks in advance.
Last edited by dineshkumarsrk; 11-13-2019 at 03:07 AM.
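The post does not show the contents of gene1.fasta or list.txt, so this is only a minimal Python sketch under an assumed layout: each line of list.txt holds a 1-based record position and the new header, separated by whitespace (for example `3 geneC`). The function names `read_positions` and `rename_by_position` are hypothetical.

```python
from typing import Dict, Iterable, List

def read_positions(lines: Iterable[str]) -> Dict[int, str]:
    """Parse 'position new_name' pairs (hypothetical list.txt layout)."""
    mapping: Dict[int, str] = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            mapping[int(fields[0])] = fields[1]
    return mapping

def rename_by_position(fasta_lines: Iterable[str], mapping: Dict[int, str]) -> List[str]:
    """Replace the header of the N-th record (1-based) with mapping[N]."""
    out: List[str] = []
    record = 0
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            record += 1
            # Keep the original header when no new name is listed for this position.
            new_name = mapping.get(record)
            out.append(">" + new_name if new_name is not None else line)
        else:
            out.append(line)
    return out
```

With real files, `rename_by_position(open("gene1.fasta"), read_positions(open("list.txt")))` returns the renamed lines; records whose position is not listed keep their original header.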
Hi,
I have an unusual situation. I have 4 files with different headers (the number of headers is variable).
I need to create a merged file whose header combines the headers from all files (common columns should appear only once).
For example:
File 1
H1|H2|H3|H4
11|12|13|14
21|22|23|23... (1 Reply)
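The preview is cut off, so the exact merge rule is unknown. A minimal Python sketch, assuming "merged" means stacking the rows of every file under the union of all headers (in order of first appearance) and leaving cells empty where a file lacks a column. `merge_tables` is a hypothetical name, and each table is a list of lines without trailing newlines.

```python
from typing import List

def merge_tables(tables: List[List[str]], sep: str = "|") -> List[str]:
    """Stack rows from several delimited tables under the union of their headers."""
    # Collect column names in order of first appearance; shared columns appear once.
    columns: List[str] = []
    for table in tables:
        for name in table[0].split(sep):
            if name not in columns:
                columns.append(name)
    merged = [sep.join(columns)]
    for table in tables:
        header = table[0].split(sep)
        for row in table[1:]:
            record = dict(zip(header, row.split(sep)))
            # Columns this file does not have are left empty (assumed behavior).
            merged.append(sep.join(record.get(col, "") for col in columns))
    return merged
```

Reading each file with `open(path).read().splitlines()` gives the list-of-lines input this expects.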
Hi Guys,
While writing a shell script, I got stuck at this point.
I need to extract the word at a specified position in a file, compare it with something, and replace the extracted word with another word.
E.g.: I like Orange very much.
I need to replace... (19 Replies)
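The comparison in this thread is truncated, so a simple equality check is assumed. A minimal Python sketch: `replace_word_at` (a hypothetical name) swaps the word at a 1-based position only if it matches an expected value. The replacement word "Mango" below is purely illustrative.

```python
def replace_word_at(line: str, position: int, expected: str, replacement: str) -> str:
    """Replace the word at a 1-based position if it equals `expected`.

    Words are whitespace-separated; re-joining with single spaces
    collapses any runs of whitespace in the original line.
    """
    words = line.split()
    index = position - 1
    if 0 <= index < len(words) and words[index] == expected:
        words[index] = replacement
    return " ".join(words)
```

For example, `replace_word_at("I like Orange very much.", 3, "Orange", "Mango")` gives `"I like Mango very much."`, while a line whose third word is not "Orange" comes back unchanged.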
Hi,
I am new to Unix. I want to delete the characters at two given positions in a line, for example the 23rd and 45th. I tried sed but couldn't get it to work.
Example: the file contains two lines:
12345 98765 "12345" 876
12345 98765 "64578" 876
I want to delete the " characters at positions 13 and 19... (4 Replies)
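In the sample lines the `"` characters do sit at positions 13 and 19 (counting from 1). A minimal Python sketch that drops characters by position; `delete_positions` is a hypothetical helper name.

```python
from typing import Iterable

def delete_positions(line: str, positions: Iterable[int]) -> str:
    """Remove the characters at the given 1-based positions."""
    drop = {p - 1 for p in positions}  # convert to 0-based indexes
    return "".join(ch for i, ch in enumerate(line) if i not in drop)
```

All positions refer to the original line, so removing position 13 does not shift what counts as position 19; `delete_positions('12345 98765 "12345" 876', [13, 19])` yields `12345 98765 12345 876`.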
I have a file with thousands of sequences that looks like this:
I need to replace the headers using a second file:
As a result, I should end up with the following file:
I am looking for an AWK script that I can easily plug into my current pipeline.
Any help will be greatly appreciated! (6 Replies)
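The file layouts are not shown, so this minimal Python sketch assumes the second file has two whitespace-separated columns: the old header (without `>`) and the new one. `load_mapping` and `replace_headers` are hypothetical names.

```python
from typing import Dict, Iterable, List

def load_mapping(lines: Iterable[str]) -> Dict[str, str]:
    """Read 'old_header new_header' pairs (assumed two-column format)."""
    mapping: Dict[str, str] = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping

def replace_headers(fasta_lines: Iterable[str], mapping: Dict[str, str]) -> List[str]:
    """Swap each '>old' header for '>new'; unmatched headers are kept as-is."""
    out: List[str] = []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            new = mapping.get(line[1:].strip())
            out.append(">" + new if new is not None else line)
        else:
            out.append(line)
    return out
```

The same two-step idea (load the mapping first, then stream the FASTA) is what an awk solution typically does with the `NR==FNR` idiom, which is true only while awk reads the first file.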
Hi, I have file1 containing many long sequences, each preceded by a unique header line. file2 is a three-column list: header name, start position, end position. I'd like to extract from file1 the sequence regions specified in file2.
Based on a post elsewhere, I found this code:
awk... (2 Replies)
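The awk code in the preview is cut off. As an alternative, here is a minimal Python sketch, assuming the start and end in file2 are 1-based and inclusive; the output header format `>name:start-end` and the function names are hypothetical choices.

```python
from typing import Dict, Iterable, List, Optional, Tuple

def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each header (text after '>') to its concatenated sequence."""
    seqs: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            current = line[1:]
            seqs[current] = []
        elif current is not None and line:
            seqs[current].append(line)
    return {name: "".join(parts) for name, parts in seqs.items()}

def extract_regions(seqs: Dict[str, str],
                    regions: Iterable[Tuple[str, int, int]]) -> List[str]:
    """Return FASTA records for 1-based, inclusive (name, start, end) regions."""
    out: List[str] = []
    for name, start, end in regions:
        if name in seqs:  # silently skip names missing from file1
            out.append(f">{name}:{start}-{end}")
            out.append(seqs[name][start - 1:end])
    return out
```

To build the region list from file2, something like `[(f[0], int(f[1]), int(f[2])) for f in (l.split() for l in open("file2")) if len(f) == 3]` would work under the same assumptions.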
Hi,
I can't find the right option to extract data from a fixed-width file.
Sample data:
abcd1234xgyhsyshijfkfk
hujk9876 io xgla
loki8787eljuwoejroiweo
dkfj9098 dja
Search for "xg" at positions 8-9 and print the entire row.
Output:
... (4 Replies)
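A minimal Python sketch of a fixed-width filter; `rows_with_value` is a hypothetical name and takes 1-based, inclusive positions. In the sample data, "xg" occupies characters 9-10 of the first row when counting from 1 (offsets 8-9 when counting from 0), so the "position 8-9" in the post is probably 0-based.

```python
from typing import Iterable, List

def rows_with_value(lines: Iterable[str], start: int, end: int, value: str) -> List[str]:
    """Keep lines whose characters at 1-based positions start..end equal value."""
    return [line for line in lines if line[start - 1:end] == value]
```

With 1-based positions 9-10, only `abcd1234xgyhsyshijfkfk` matches. The awk equivalent is `awk 'substr($0, 9, 2) == "xg"' file`, since awk's `substr` is also 1-based.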
OS: Linux 2.6x
Shell: Korn
In a single file, how can I identify all the unique values at a specific character position and length in each record,
and simultaneously split the records of the file by each of these values and write them to separate files?
Let's say:
a) I want to... (4 Replies)
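A minimal Python sketch of both steps: grouping by the substring at a 1-based position and length (the group keys are the unique values), then writing one file per value. The output naming `<prefix><value>.txt` is a hypothetical choice, and values containing spaces or `/` would need sanitizing before use in a filename.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

def split_by_field(lines: Iterable[str], start: int, length: int) -> Dict[str, List[str]]:
    """Group records by the substring at 1-based `start` with `length` characters."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        key = line[start - 1:start - 1 + length]
        groups[key].append(line)
    return dict(groups)

def write_groups(groups: Dict[str, List[str]], prefix: str) -> None:
    """Write each group to <prefix><value>.txt (hypothetical naming scheme)."""
    for value, records in groups.items():
        with open(f"{prefix}{value}.txt", "w") as fh:
            fh.writelines(record + "\n" for record in records)
```

`sorted(split_by_field(records, 3, 2))` lists the unique values; passing the same dictionary to `write_groups` does the split in a single pass over the input.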
I have two files. File1 is shown below.
>153L:B|PDBID|CHAIN|SEQUENCE
RTDCYGNVNRIDTTGASCKTAKPEGLSYCGVSASKKIAERDLQAMDRYKTIIKKVGEKLCVEPAVIAGIISRESHAGKVL
KNGWGDRGNGFGLMQVDKRSHKPQGTWNGEVHITQGTTILINFIKTIQKKFPSWTKDQQLKGGISAYNAGAGNVRSYARM
DIGTTHDDYANDVVARAQYYKQHGY
>16VP:A|PDBID|CHAIN|SEQUENCE... (7 Replies)
Hi,
I have a fixed-width .dat file with multiple lines. I want to search for '02' at positions 45-46 and, in the lines where it is found, replace the value at position 359 with a blank. As I am new to Unix, I am not able to figure out how to do this. Can you please help me to achieve... (9 Replies)
Discussion started by: Pradhikshan
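The fixed-width request above is concrete enough to sketch. A minimal Python version, assuming 1-based positions as in the post and lines read without their trailing newline; the function name and its default arguments are illustrative.

```python
def blank_if_flag(line: str, flag: str = "02", flag_start: int = 45, target: int = 359) -> str:
    """If `flag` sits at 1-based position flag_start, blank out position `target`."""
    begin = flag_start - 1
    # Only touch lines that carry the flag and are long enough to have the target column.
    if line[begin:begin + len(flag)] == flag and len(line) >= target:
        return line[:target - 1] + " " + line[target:]
    return line
```

Lines without '02' at 45-46, or shorter than 359 characters, pass through unchanged, so the record width is preserved.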
kalign
KALIGN(1) Kalign User Manual KALIGN(1)
NAME
kalign - performs multiple alignment of biological sequences.
SYNOPSIS
kalign [infile.fasta] [outfile.fasta] [Options]
kalign [-i infile.fasta] [-o outfile.fasta] [Options]
kalign [< infile.fasta] [> outfile.fasta] [Options]
DESCRIPTION
Kalign is a command-line tool for multiple alignment of biological sequences. It employs the Muth-Manber string-matching algorithm
to improve both the accuracy and speed of the alignment. It uses a global, progressive alignment approach, enriched by employing an
approximate string-matching algorithm to calculate sequence distances and by incorporating local matches into the otherwise global
alignment.
OPTIONS
-s -gpo -gapopen -gap_open x
Gap open penalty.
-e -gpe -gap_ext -gapextension x
Gap extension penalty.
-t -tgpe -terminal_gap_extension_penalty x
Terminal gap penalties.
-m -bonus -matrix_bonus x
A constant added to the substitution matrix.
-c -sort <input, tree, gaps>
The order in which the sequences appear in the output alignment.
-g -feature
Selects feature mode and specifies which features are to be used: e.g. all, maxplp, STRUCT, PFAM-A...
-same_feature_score
Score for aligning same features.
-diff_feature_score
Penalty for aligning different features.
-d -distance <wu, pair>
Distance method.
-b -tree -guide-tree <nj, upgma>
Guide tree method.
-z -zcutoff
Parameter used in the Wu-Manber-based distance calculation.
-i -in -input
Name of the input file.
-o -out -output
Name of the output file.
-a -gap_inc
Increases gap penalties depending on the number of existing gaps.
-f -format <fasta, msf, aln, clu, macsim>
The output format.
-q -quiet
Print nothing to STDERR. Read nothing from STDIN.
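A thin Python wrapper can make the option list above concrete. This sketch uses only options documented on this page (`-i`, `-o`, `-f`, `-s`, `-q`); the helper names are hypothetical, and it has not been checked against every kalign build, so confirm the flags with your installed version.

```python
import subprocess
from typing import List, Optional

def kalign_command(infile: str, outfile: str, fmt: str = "fasta",
                   gap_open: Optional[float] = None, quiet: bool = True) -> List[str]:
    """Build a kalign argument list from options listed in this manual page."""
    cmd = ["kalign", "-i", infile, "-o", outfile, "-f", fmt]
    if gap_open is not None:
        cmd += ["-s", str(gap_open)]  # gap open penalty
    if quiet:
        cmd.append("-q")  # print nothing to STDERR
    return cmd

def run_kalign(infile: str, outfile: str, **options) -> None:
    """Run kalign (must be installed and on PATH); raises on a non-zero exit."""
    subprocess.run(kalign_command(infile, outfile, **options), check=True)
```

Building the argument list separately from running it keeps the command easy to inspect or log; `run_kalign("in.fasta", "out.aln", fmt="clu")` would write a ClustalW-style alignment if the options behave as described above.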
REFERENCES
o Timo Lassmann and Erik L. L. Sonnhammer (2005) Kalign - an accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics 6:298.
o Timo Lassmann, Oliver Frings and Erik L. L. Sonnhammer (2009) Kalign2: high-performance multiple alignment of protein and nucleotide sequences allowing external features. Nucleic Acids Research 3:858-865.
AUTHORS
Timo Lassmann <timolassmann@gmail.com>
Upstream author of Kalign.
Charles Plessy <plessy@debian.org>
Wrote the manpage.
COPYRIGHT
Copyright (C) 2004, 2005, 2006, 2007, 2008 Timo Lassmann
Kalign is free software. You can redistribute it and/or modify it under the terms of the GNU General Public License as published by the
Free Software Foundation.
This manual page was written by Charles Plessy <plessy@debian.org> for the Debian(TM) system (but may be used by others). Permission is
granted to copy, distribute and/or modify this document under the same terms as kalign itself.
On Debian systems, the complete text of the GNU General Public License version 2 can be found in /usr/share/common-licenses/GPL-2.
kalign 2.04 February 25, 2009 KALIGN(1)