Post 302881966 by Mauve on Friday 3rd of January 2014 09:19:33 AM
Append file name to fasta file headers in Linux

How do we append each file's name to the FASTA header lines in multiple FASTA files in Linux?

Last edited by Mauve; 01-05-2014 at 02:36 PM.
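
A minimal sketch of one way to do this, assuming the headers are the lines beginning with '>' and that the file name (without directory or extension) should be joined to the end of each header with an underscore. The function name append_name_to_headers, the "_" separator, and the ".renamed" output suffix are all assumptions made for illustration; the original post does not specify them.

```shell
# Append each file's name to every FASTA header line (lines starting with '>').
# Hypothetical helper: writes FILE.renamed next to each input file and leaves
# the original untouched.
append_name_to_headers() {
    for f in "$@"; do
        [ -f "$f" ] || continue          # skip unmatched globs and directories
        name=$(basename "$f")            # drop any leading directory
        name=${name%.*}                  # drop the last extension, e.g. .fasta
        # Header lines get "_<name>" appended; sequence lines pass through.
        awk -v name="$name" '/^>/ { print $0 "_" name; next } { print }' \
            "$f" > "$f.renamed"
    done
}

# Example call (assumes the files end in .fasta):
# append_name_to_headers ./*.fasta
```

With this sketch, a header >seq1 in sampleA.fasta comes out as >seq1_sampleA in sampleA.fasta.renamed. Writing to a separate file keeps the originals intact until the result has been checked. File names containing backslashes may be altered, because awk -v interprets escape sequences in the assigned value.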
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Merging of files with different headers to make combined headers file

Hi, I have a typical situation. I have 4 files with different headers (the number of headers is variable). I need to make a merged file that has the headers combined from all files (common columns should appear only once). For example - File 1 H1|H2|H3|H4 11|12|13|14 21|22|23|23... (1 Reply)
Discussion started by: marut_ashu
1 Reply

2. Shell Programming and Scripting

parse fasta file to tabular file

Hello, A bioperl problem I thought could be done with awk: convert the fasta format (Note: the length of each row is not the same for each entry as they were combined from different files!) to tabular format. input.fasta: >YAL069W-1.334 Putative promoter sequence... (6 Replies)
Discussion started by: yifangt
6 Replies

3. UNIX for Dummies Questions & Answers

How to change sequence name in along fasta file?

Hi I have an alignment file (.fasta) with ~80 sequences. They look like this- >JV101.contig00066(+):25302-42404|sequence_index=0|block_index=4|species=JV101|JV101_4_0 GAGGTTAATTATCGATAACGTTTAATTAAAGTGTTTAGGTGTCATAATTT TAAATGACGATTTCTCATTACCATACACCTAAATTATCATCAATCTGAAT... (2 Replies)
Discussion started by: baika
2 Replies

4. Shell Programming and Scripting

Extract sequence from fasta file

Hi, I want to match the sequence id (a sub-string of the line starting with '>') and extract the information up to the next '>' line. Please help. input > fefrwefrwef X900 AGAGGGAATTGG AGGGGCCTGGAG GGTTCTCTTC > fefrwefrwef X932 AGAGGGAATTGG AGGAGGTGGAG GGTTCTCTTC > fefrwefrwef X937... (2 Replies)
Discussion started by: ritakadm
2 Replies

5. Shell Programming and Scripting

Extract sequences from a FASTA file based on another file

I have two files. File1 is shown below. >153L:B|PDBID|CHAIN|SEQUENCE RTDCYGNVNRIDTTGASCKTAKPEGLSYCGVSASKKIAERDLQAMDRYKTIIKKVGEKLCVEPAVIAGIISRESHAGKVL KNGWGDRGNGFGLMQVDKRSHKPQGTWNGEVHITQGTTILINFIKTIQKKFPSWTKDQQLKGGISAYNAGAGNVRSYARM DIGTTHDDYANDVVARAQYYKQHGY >16VP:A|PDBID|CHAIN|SEQUENCE... (7 Replies)
Discussion started by: nelsonfrans
7 Replies

6. UNIX for Advanced & Expert Users

Cannot find logical file format for BSD file headers.

Hi. Unix rookie here. Been looking for a few days for reference documents on how BSD UNIX lays the logical file format onto a disk. Goal is to view/edit with hex editor for data repair. Lots of docs are available for how to use Unix commands (like xxd), but I want to learn the map of how Unix... (4 Replies)
Discussion started by: Chris_top_he_r
4 Replies

7. UNIX for Dummies Questions & Answers

Round up -FASTA file

I have the following script: awk 'FNR==NR{s+=$3;next;} { print $1 , $2, 100*$3/s }' and the following file: >P39PT-1224 Freq 900 cccctacgacggcattggtaatggctcagctgctccggatcccgcaagccatcttggatatgagggttcgtcggcctcttcagccaagg-cccccagcagaacatccagctgatcg >P39PT-784 Freq 2... (2 Replies)
Discussion started by: Xterra
2 Replies

8. UNIX for Dummies Questions & Answers

Selectively extracting entries from FASTA file

I would like to extract all entries containing the following patterns: ccccta & ccccccccc from the following infile: >P39PT-1224_Freq_900 cccctacgacggcattggtaatggctcccgcaagccatctctcttcagccaagg >P39PT-784_Freq_2 cccctacgacggcattggtaatggcacccgcaagccatctctcttccccccccc >P39PT-678_Freq_5... (4 Replies)
Discussion started by: Xterra
4 Replies

9. UNIX for Beginners Questions & Answers

How to append two fasta files?

I have two fasta files as shown below, File:1 >Contig_1:90600-91187 AAGGCCATCAAGGACGTGGATGAGGTCGTCAAGGGCAAGGAACAGGAATTGATGACGGTC >Contig_98:35323-35886 GACGAAGCGCTCGCCAAGGCCGAAGAAGAAGGCCTGGATCTGGTCGAAATCCAGCCGCAG >Contig_24:26615-28387... (11 Replies)
Discussion started by: dineshkumarsrk
11 Replies

10. UNIX for Beginners Questions & Answers

Is it possible to rename fasta headers based on its position specified in another file?

I have 5 sequences in a fasta file namely gene1.fasta as follows, gene1.fasta >1256 ATGTAGC >GEP TAGAG >GTY578 ATGCATA >67_iga ATGCTGA >90_ld ATGCTG I need to rename the gene1.fasta file based on the sequence position specified in list.txt as follows, list.txt position1=org5... (5 Replies)
Discussion started by: dineshkumarsrk
5 Replies
KALIGN(1)                         Kalign User Manual                         KALIGN(1)

NAME
       kalign - performs multiple alignment of biological sequences.

SYNOPSIS
       kalign [infile.fasta] [outfile.fasta] [Options]
       kalign [-i infile.fasta] [-o outfile.fasta] [Options]
       kalign [< infile.fasta] [> outfile.fasta] [Options]

DESCRIPTION
       Kalign is a command line tool to perform multiple alignment of
       biological sequences. It employs the Muth-Manber string-matching
       algorithm, to improve both the accuracy and speed of the alignment.
       It uses global, progressive alignment approach, enriched by employing
       an approximate string-matching algorithm to calculate sequence
       distances and by incorporating local matches into the otherwise
       global alignment.

OPTIONS
       -s -gpo -gapopen -gap_open x
           Gap open penalty.

       -e -gpe -gap_ext -gapextension x
           Gap extension penalty.

       -t -tgpe -terminal_gap_extension_penalty x
           Terminal gap penalties.

       -m -bonus -matrix_bonus x
           A constant added to the substitution matrix.

       -c -sort <input, tree, gaps.>
           The order in which the sequences appear in the output alignment.

       -g -feature
           Selects feature mode and specifies which features are to be used:
           e.g. all, maxplp, STRUCT, PFAM-A...

       -same_feature_score
           Score for aligning same features.

       -diff_feature_score
           Penalty for aligning different features.

       -d -distance <wu, pair>
           Distance method.

       -b -tree -guide-tree <nj, upgma>
           Guide tree method.

       -z -zcutoff
           Parameter used in the wu-manber based distance calculation.

       -i -in -input
           Name of the input file.

       -o -out -output
           Name of the output file.

       -a -gap_inc
           Increases gap penalties depending on the number of existing gaps.

       -f -format <fasta, msf, aln, clu, macsim>
           The output format.

       -q -quiet
           Print nothing to STDERR. Read nothing from STDIN.

REFERENCES
       o Timo Lassmann and Erik L.L. Sonnhammer (2005) Kalign - an accurate
         and fast multiple sequence alignment algorithm. BMC Bioinformatics
         6:298

       o Timo Lassmann, Oliver Frings and Erik L. L. Sonnhammer (2009)
         Kalign2: high-performance multiple alignment of protein and
         nucleotide sequences allowing external features. Nucleic Acid
         Research 3:858-865.

AUTHORS
       Timo Lassmann <timolassmann@gmail.com>
           Upstream author of Kalign.

       Charles Plessy <plessy@debian.org>
           Wrote the manpage.

COPYRIGHT
       Copyright (C) 2004, 2005, 2006, 2007, 2008 Timo Lassmann

       Kalign is free software. You can redistribute it and/or modify it
       under the terms of the GNU General Public License as published by the
       Free Software Foundation.

       This manual page was written by Charles Plessy <plessy@debian.org>
       for the Debian(TM) system (but may be used by others). Permission is
       granted to copy, distribute and/or modify this document under the
       same terms as kalign itself.

       On Debian systems, the complete text of the GNU General Public
       License version 2 can be found in /usr/share/common-licenses/GPL-2.

kalign 2.04                         February 25, 2009                        KALIGN(1)