Select distinct sequences from fasta file and list
Hi
How can I extract sequences from a FASTA file according to a certain criterion? The beginning of my file (which contains more than 1000 sequences in total) looks like this:
I want to extract the sequences containing the motif FDCIR. Can it be done with grep, or do I need a Perl script?
As a next step: how could I extract sequences that fulfil two or more criteria?
Looking forward to your suggestions.
Cheers, Marion.
Last edited by jim mcnamara; 09-24-2014 at 04:55 PM..
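One possible approach, as a minimal sketch rather than a definitive answer: plain grep prints only the matching line, so it drops the header and misses a motif that is split across two wrapped sequence lines. The awk function below joins each record's sequence lines first and then keeps the record only if every pattern matches. The file name sample.fasta, its contents, and the function name fasta_filter are made up for illustration, and the patterns are assumed to contain no spaces or backslashes.

```shell
# Build a small sample file (hypothetical data, not from the original post).
cat > sample.fasta <<'EOF'
>seq1 example protein
MKTAYFDC
IRLLQ
>seq2 example protein
MKKLLPTAAG
>seq3 example protein
GGFDCIRAAWW
EOF

# Print every record whose full (unwrapped) sequence matches ALL given patterns.
# Usage: fasta_filter FILE PATTERN [PATTERN...]
fasta_filter() {
    file=$1; shift
    awk -v pats="$*" '
        function flush(   i, n, p) {
            if (hdr == "") return             # nothing read yet
            n = split(pats, p, " ")
            for (i = 1; i <= n; i++)
                if (seq !~ p[i]) return       # one criterion failed: skip record
            print hdr; print seq
        }
        /^>/ { flush(); hdr = $0; seq = ""; next }
        { seq = seq $0 }                      # join wrapped sequence lines
        END { flush() }
    ' "$file"
}

fasta_filter sample.fasta FDCIR          # one criterion: contains the motif
fasta_filter sample.fasta FDCIR '^MK'    # two criteria: motif AND starts with MK
```

With the sample data, the first call prints seq1 and seq3 (the motif in seq1 spans a line break), and the second prints only seq1. Each pattern is an awk regular expression, so further criteria can be added as extra arguments. The output sequences are written on a single line.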
Hi, buddies out there.
I have a text file (only one column) which I created using the vi editor. The file contains duplicate rows and I would like to select the distinct rows. How can I do this with a Unix command?
file content =
apple
apple
orange
watermelon
apple
orange
Can it be done... (7 Replies)
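A short sketch of two common ways to do this, using the sample rows from the post. The file name fruits.txt is an assumption for illustration.

```shell
# Recreate the sample file from the post.
printf '%s\n' apple apple orange watermelon apple orange > fruits.txt

# Keep the first occurrence of each line, preserving the input order.
awk '!seen[$0]++' fruits.txt

# Alternative: sorted output with one copy of each line.
sort -u fruits.txt
```

The awk form is useful when the original order matters; sort -u is enough when it does not.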
Hi ,
I have a similar problem.
Can anyone please help me with a shell or Perl script?
I have a flat file like this
fruit country
apple germany
apple india
banana pakistan
banana saudi
mango india
I want to get an output like
fruit country
apple ... (7 Replies)
Hi, I have the following file:
LOG:015608::ERR:2310:map_spsrec:Invalid parameter
LOG:015608::ERR:2471:map_dgdrec:Invalid parameter
LOG:015608::ERR:2487:map_nnmrec:Invalid number
LOG:015608::ERR:2310:map_nmrec:Invalid number
LOG:015608::ERR:2438:map_nmrec:Invalid number
As a delimiter I... (2 Replies)
Hi,
I have a file of DNA sequences in FASTA format that looks like this:
>admin_1_45
atatagcaga
>admin_1_46
atatagcagaatatatat
with many thousands of such sequences in a single file. I want to replace the accession ID "admin_1_45", and similarly those of the following sequences, to... (5 Replies)
I have two files. File1 is shown below.
>153L:B|PDBID|CHAIN|SEQUENCE
RTDCYGNVNRIDTTGASCKTAKPEGLSYCGVSASKKIAERDLQAMDRYKTIIKKVGEKLCVEPAVIAGIISRESHAGKVL
KNGWGDRGNGFGLMQVDKRSHKPQGTWNGEVHITQGTTILINFIKTIQKKFPSWTKDQQLKGGISAYNAGAGNVRSYARM
DIGTTHDDYANDVVARAQYYKQHGY
>16VP:A|PDBID|CHAIN|SEQUENCE... (7 Replies)
I have a fasta file as follows
>sp|O15090|FABP4_HUMAN Fatty acid-binding protein, adipocyte OS=Homo sapiens GN=FABP4 PE=1 SV=3
MCDAFVGTWKLVSSENFDDYMKEVGVGFATRKVAGMAKPNMIISVNGDVITIKSESTFKN
TEISFILGQEFDEVTADDRKVKSTITLDGGVLVHVQKWDGKSTTIKRKREDDKLVVECVM
KGVTSTRVYERA
>sp|L18484|AP2A2_RAT AP-2... (3 Replies)
Hi,
I have a FASTA file with multiple sequences. How can I get only the unique sequences from the file?
For example
my_file.fasta
>seq1
TCTCAAAGAAAGCTGTGCTGCATACTGTACAAAACTTTGTCTGGAGAGATGGAGAATCTCATTGACTTTACAGGTGTGGACGGTCTTCAGAGATGGCTCAAGCTAACATTCCCTGACACACCTATAGGGAAAGAGCTAAC
>seq2... (3 Replies)
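A minimal sketch of one way to keep unique sequences, assuming "unique" means an identical sequence string (case-sensitive) and that the first header seen for each sequence is the one to keep. The file dup.fasta, its contents, and the function name fasta_unique are hypothetical.

```shell
# Small sample file (hypothetical data): seq1 and seq2 hold the same sequence,
# but seq1 is wrapped over two lines.
cat > dup.fasta <<'EOF'
>seq1
ACGTAC
GT
>seq2
ACGTACGT
>seq3
TTTT
EOF

# Keep only the first record for each distinct sequence.
# Usage: fasta_unique FILE
fasta_unique() {
    awk '
        function flush() {
            if (hdr != "" && !(seq in seen)) {
                seen[seq] = 1                 # remember this sequence
                print hdr; print seq
            }
        }
        /^>/ { flush(); hdr = $0; seq = ""; next }
        { seq = seq $0 }                      # join wrapped sequence lines
        END { flush() }
    ' "$1"
}

fasta_unique dup.fasta
```

Here seq2 is dropped because its sequence equals the joined sequence of seq1. All distinct sequences are held in memory, which is usually fine for files of a few thousand records.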
I can calculate the length of every sequence in a FASTA file with the following command:
awk '/^>/{if (l!="") print l; print; l=0; next}{l+=length($0)}END{print l}' unique.fasta
But I need to calculate the lengths of particular FASTA sequences listed in another txt file. The results to be... (14 Replies)
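A rough sketch of how the lengths could be restricted to listed sequences. It assumes the list file holds one ID per line and that a record's ID is the first word of its header without the leading ">". The list file ids.txt, the sample data, and the function name fasta_lengths are hypothetical; only unique.fasta comes from the command above.

```shell
# Sample data (hypothetical).
cat > unique.fasta <<'EOF'
>seq1
ACGTACGT
ACG
>seq2
TTTT
>seq3
GGGGGG
EOF
printf '%s\n' seq3 seq1 > ids.txt

# Print "ID<TAB>length" for each record whose ID appears in the list file.
# Usage: fasta_lengths ID_LIST FASTA
fasta_lengths() {
    awk '
        NR == FNR { want[$1]; next }          # first file: load the wanted IDs
        /^>/ {
            if (keep) print id "\t" len       # finish the previous wanted record
            id = substr($1, 2); len = 0
            keep = (id in want)
            next
        }
        keep { len += length($0) }            # sum wrapped sequence lines
        END { if (keep) print id "\t" len }
    ' "$1" "$2"
}

fasta_lengths ids.txt unique.fasta
```

Results come out in the order the records appear in the FASTA file, not in the order of the list. The NR == FNR idiom assumes the list file is not empty.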
I have a fasta file as follows
>sp|Q8WWQ8|STAB2_HUMAN Stabilin-2 OS=Homo sapiens OX=9606 GN=STAB2 PE=1 SV=3
MMLQHLVIFCLGLVVQNFCSPAETTGQARRCDRKSLLTIRTECRSCALNLGVKCPDGYTM
ITSGSVGVRDCRYTFEVRTYSLSLPGCRHICRKDYLQPRCCPGRWGPDCIECPGGAGSPC
NGRGSCAEGMEGNGTCSCQEGFGGTACETCADDNLFGPSCSSVCNCVHGVCNSGLDGDGT... (3 Replies)
Hi,
I have to add 7 bases of a specific nucleotide at the beginning and end of every sequence in a FASTA file. For example, I have a multi-FASTA file named test.fasta, as given below
test.fasta
>TalAA18_Xoo_CIAT_NZ_CP033194.1:_2936369-2939570:+1... (1 Reply)
Discussion started by: dineshkumarsrk
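A possible sketch for the padding task. The post does not say which nucleotide is wanted, so the base is passed as a parameter, and C is used below purely as a placeholder. The sample records and the function name fasta_pad are hypothetical.

```shell
# Small sample file (hypothetical data, not the file from the post).
cat > test.fasta <<'EOF'
>read1
ACGT
ACGT
>read2
TTGG
EOF

# Add COUNT copies of BASE to both ends of every sequence (COUNT defaults to 7).
# Usage: fasta_pad FILE BASE [COUNT]
fasta_pad() {
    awk -v base="$2" -v n="${3:-7}" '
        BEGIN { for (i = 0; i < n; i++) pad = pad base }
        function flush() { if (hdr != "") { print hdr; print pad seq pad } }
        /^>/ { flush(); hdr = $0; seq = ""; next }
        { seq = seq $0 }                      # join wrapped lines before padding
        END { flush() }
    ' "$1"
}

fasta_pad test.fasta C
```

Wrapped records are joined first so that the padding goes on the ends of the whole sequence rather than on every line. The output therefore has each sequence on a single line.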
srf2fastq
srf2fastq(1) Staden io_lib srf2fastq(1)
NAME
srf2fastq - Converts SRF files to Sanger fastq format
SYNOPSIS
srf2fastq [options] srf_archive ...
DESCRIPTION
srf2fastq extracts sequences and qualities from one or more SRF archives and writes them in Sanger fastq format to stdout.
Note that Illumina also have a fastq format (used in the GERALD directories) which differs slightly in the use of log-odds scores for the
quality values. The format described here is using the traditional Phred style of quality encoding.
OPTIONS
-c Outputs calibrated confidence values using the ZTR CNF1 chunk type for a single quality per base. Without this use the original
Illumina _prb.txt files consisting of four quality values per base, stored in the ZTR CNF4 chunks.
-C Masks out sequences tagged as bad quality.
-s root
Generates files on disk with filenames starting root, one file per non-explicit element in the SRF/ZTR region (REGN) chunk.
Typically this results in two files for paired end runs. The filename suffixes come from the names listed in the SRF region chunks.
This option conflicts with the -S parameter.
-S Splits sequences into regions, but sequentially lists each sequence region to stdout instead of splitting to separate files on disk.
This option conflicts with the -s parameter.
-n When using -s the filename suffixes are simply numbered (starting with 1) instead of using the names listed in the SRF region
chunks.
-a Appends region index to the sequence names. Ie generate "name/1" and "name/2" for a paired read.
-e Include any explicit sequence (ZTR region chunk of type 'E') in the sequence output. The explicit sequence is also included in the
quality line too. Currently this is utilised by ABI SOLiD to store the last base of the primer.
-r region list
Reverse complements the sequence and reverses the quality values for all regions in the region list. This is a comma separated list
of integer values enumerating the regions, starting from 1. Note that this option only works when either -s or -S are specified.
EXAMPLES
To extract only the good quality sequences from all srf files in the current directory using calibrated confidence values (if available).
srf2fastq -c -C *.srf > runX.fastq
To extract a paired end run into two separate files with sequences named name/1 and name/2.
srf2fastq -s runX -a -n runX.srf
To extract a paired end run as a single file, alternating forward and reverse sequences, with the second read being reverse complemented.
srf2fastq -S -r 2 runX.srf > runX.fastq
AUTHOR
James Bonfield, Steven Leonard - Wellcome Trust Sanger Institute
December 10 srf2fastq(1)