Post #7 probably did not work for you because the file samples contain excess trailing spaces that need to be removed. Those spaces are not going to be present in the actual FASTA files; see the note underneath...
--
Here they are without the spaces:
File1:
File2:
Another thing is that the output order gets mixed up, because the order in which awk iterates over an array is undefined. That could easily be fixed if need be.
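Both points (trailing spaces and array order) can be handled in a few lines of awk. The following is only a minimal sketch, not the script from post #7. The function name ordered_fasta and the choice to merge sequence lines that share the same header are assumptions made for illustration. It strips trailing blanks from every line and keeps a second, numerically indexed array so that records come out in the order their headers were first seen.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): strip trailing blanks and
# print FASTA records in first-seen order instead of awk's array order.
ordered_fasta() {
  awk '
    { sub(/[ \t\r]+$/, "") }              # drop trailing spaces, tabs, CRs
    /^>/ {
      id = $0
      if (!seen[id]++) order[++n] = id    # remember first appearance only
      next
    }
    { seq[id] = seq[id] $0 }              # assumption: merge lines per header
    END {
      # Walk the numeric index rather than "for (k in seq)", whose
      # iteration order is not defined.
      for (i = 1; i <= n; i++) {
        print order[i]
        print seq[order[i]]
      }
    }
  ' "$@"
}
```

It can be called as ordered_fasta File1 File2. Walking order[1..n] in the END block is what keeps the output in input order.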
Last edited by Scrutinizer; 06-13-2019 at 06:12 PM..
Hi, I have two files where 1 contains data and the other contains strings eg
file 1
-0.00000 0.00000 0.00000
0.00000 0.00000 0.80000
0.50000 0.50000 0.60000
0.50000 0.50000 0.20000
-0.00000 0.00000 0.40000
file 2
F F F
F F F
T T T
T T T
T T T
How do I append file2 to file 1 to... (1 Reply)
I would like to extract the sequences larger than 10 bases but shorter than 18 along with the identifier from a FASTA file that looks like this:
> Seq I
ACGACTAGACGATAGACGATAGA
> Seq 2
ACGATGACGTAGCAGT
> Seq 3
ACGATACGAT
I know I can extract the IDs alone with the following code
grep... (3 Replies)
I have a fasta file that looks like this:
>Noname
ACCAAAATAATTCATGATATACTCAGATCCATCTGAGGGTTTCACCACTTGTAGAGCTAT
CAGAAGAATGTCAATCAACTGTCCGAGAAAAAAGAATCCCAGG
>Noname
ACTATAAACCCTATTTCTCTTTCTAAAAATTGAAATATTAAAGAAACTAGCACTAGCCTG
ACCTTTAGCCAGACTTCTCACTCTTAATGCTGCGGACAAACAGA
...
I want to... (2 Replies)
I tried to write a script (not working) to append the first value from mylist to a file called myfirstResult and to another called mysecondResult
awk ' {print $1} >> myfirsResult ' < mylist
awk ' {print $1} >> mysecondResult ' < mylist
$ cat mylist
A 02/16/2012
B 02/19/2012
C... (3 Replies)
Hey,
I've been trying to break a massive FASTA-formatted file into files containing each gene separately. Could anyone help me? I've tried to use the following code but I've received errors every time:
for i in *.rtf.out
do
awk '/^>/{f=++d".fasta"} {print > $i.out}' $i
done (1 Reply)
Hi All,
I have to append 2 lines at the end of a text file. If those 2 lines are already there, do not append them; otherwise append the 2 lines to the text file.
Eg: I have a text file, file.txt
This text file might look like this,
/home/kp/make.jsp
/home/pk/model.jsp
I have to append... (1 Reply)
Hi friends,
My requirement is as follows: I have a zip file named, for example, test_ABC_UH_ccde2a_awdeaea_20150422.zip
Within it there are subdirectories; each directory again contains .zip files, and those contain files like mama20150422.gz and so on.
I am in need of a bash script so that it unzips... (0 Replies)
Hi,
Could someone help me prepend a string to all the filenames inside a directory, excluding .zip files and subdirectories?
Eg.
file1: test1.log
file2: test2.log
file3: test.zip
After running the script
file1: string_test1.log
file2: string_test2.log
file3:... (4 Replies)
AMPLICONNOISE(1) AmpliconNoise Documentation AMPLICONNOISE(1)
NAME
AmpliconNoise - remove noise from high throughput nucleotide sequence data
VERSION
This documentation refers to version 1.22
SYNOPSIS
See /usr/share/doc/ampliconnoise/Doc.pdf.gz for details of how to run.
DESCRIPTION
The following tools are included. Most of them have an MPI equivalent, for example SeqNoise has an equivalent SeqNoiseM which can be used
with mpirun.
FastaUnique - dereplicates fasta file
-in string input file name
Options:
FCluster
-in string distance input file name
-out string output file stub
Options:
-r resolution
-a average linkage
-w use weights
-i read identifiers
-s scale dist.
NDist - pairwise Needleman-Wunsch sequence distance matrix from a fasta file
-in string fasta file name
Options:
-i output identifiers
Perseus - slays monsters
-sin string seq file name
Options:
-tin string reference sequence file
-a output alignments
-d use imbalance
-rin string lookup file name
PyroDist - pairwise distance matrix from flowgrams
-in string flow file name
-out stub out file stub
Options:
-ni no index in dat file
-rin string lookup file name
PyroNoise - clusters flowgrams without alignments
-din string flow file name
-out string cluster input file name
-lin string list file
Options:
-v verbose
-c double initial cut-off
-ni no index in dat file
-s double precision
-rin file lookup file name
SeqDist - pairwise distance matrix from a fasta file
-in string fasta file name
Options:
-i output identifiers
-rin string lookup file name
SeqNoise - clusters sequences
-in string sequence file name
-din string distance matrix file name
-out string cluster input file name
-lin string list file
Options:
-min mapping file
-v verbose
-c double initial cut-off
-s double precision
-rin string lookup file name
SplitClusterEven
-din string dat filename
-min string map filename
-tin string tree filename
-s split size
-m min size
AUTHOR
All software by Chris Quince (quince@civil.gla.ac.uk). This manpage by Tim Booth (tbooth@ceh.ac.uk).
LICENCE AND COPYRIGHT
Copyright (c) 2009 (quince@civil.gla.ac.uk). All rights reserved.
Released under the Lesser GPL.
Permission is granted for anyone to copy, use, or modify these programs and documents for purposes of research or education, provided this
copyright notice is retained, and note is made of any changes that have been made.
perl v5.12.4 2011-04-28 AMPLICONNOISE(1)