How to count the length of fasta sequences? Post 303033626 by RavinderSingh13 on Tuesday 9th of April 2019 09:24:20 AM
Quote:
Originally Posted by dineshkumarsrk
Thank you Singh,
Your command prints the lengths of all the sequences. However, I need to print the lengths of only a few sequences, as listed in the id.txt file. If I have not understood your command properly, please let me know where to include the id.txt file in it.
Oh OK, I was under the impression that you wanted to print the lengths of all the sequence strings in a single Input_file. Could you please try the following now?
Code:
awk '
FNR==NR { a[$0]; next }                                    # id.txt: remember the wanted IDs
/^>/    { val = substr($0, 2); found = (val in a); next }  # header: strip ">" and check the ID
found   { print val, length($0) }                          # sequence line of a wanted ID
' id.txt Input_file

Output will be as follows.
Code:
seq1 6
seq2 7
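
One caveat: the command above prints one length per sequence line, so a sequence that is wrapped over several lines would be reported in pieces. In case your file looks like that, a minimal sketch that sums the lengths per ID could look as follows. The function name fasta_lengths is only a placeholder for illustration, and the sketch assumes that id.txt is not empty and that each of its lines holds exactly the text that follows ">" in the header.

```shell
# Hypothetical helper (the name is illustrative): print "ID total_length"
# for every ID listed in the first file, summing over wrapped sequence lines.
fasta_lengths() {
    awk '
    FNR == NR { ids[$0]; next }          # first file: remember the wanted IDs
    /^>/ {
        if (found) print id, len         # flush the previous wanted record
        id = substr($0, 2)               # header text without the ">"
        found = (id in ids)              # is this record wanted?
        len = 0
        next
    }
    found { len += length($0) }          # add up each sequence line
    END   { if (found) print id, len }   # flush the last record
    ' "$1" "$2"
}
```

It would be called as fasta_lengths id.txt Input_file and prints one line per listed ID, in the order the records appear in Input_file.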

Thanks,
R. Singh

Grinder::KmerCollection(3pm)				User Contributed Perl Documentation			      Grinder::KmerCollection(3pm)

NAME
       Grinder::KmerCollection - A collection of kmers from sequences

SYNOPSIS
         my $col = Grinder::KmerCollection->new( -k => 10, -file => 'seqs.fa' );

DESCRIPTION
       Manage a collection of kmers found in various sequences. Store information about what sequence a kmer was found
       in and its starting position on the sequence.

AUTHOR
       Florent Angly <florent.angly@gmail.com>

APPENDIX
       The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _

   new
        Title   : new
        Usage   : my $col = Grinder::KmerCollection->new( -k => 10, -file => 'seqs.fa', -revcom => 1 );
        Function: Build a new kmer collection
        Args    : -k       set the kmer length (default: 10 bp)
                  -revcom  count kmers before and after reverse-complementing sequences (default: 0)
                  -seqs    count kmers in the provided arrayref of sequences (Bio::Seq objects)
                  -ids     if specified, index the sequences provided to -seq using the IDs in this
                           arrayref instead of using the sequences $seq->id() method
                  -file    count kmers in the provided file of sequences
                  -weights if specified, assign the abundance of each sequence from the values in this arrayref
        Returns : Grinder::KmerCollection object

   k
        Usage   : $col->k;
        Function: Get the length of the kmers
        Args    : None
        Returns : Positive integer

   weights
        Usage   : $col->weights({'seq1' => 3, 'seq10' => 0.45});
        Function: Get or set the weight of each sequence. Each sequence is given a weight of 1 by default.
        Args    : hashref where the keys are sequence IDs and the values are the weight of the
                  corresponding (e.g. their relative abundance)
        Returns : Grinder::KmerCollection object

   collection_by_kmer
        Usage   : $col->collection_by_kmer;
        Function: Get the collection of kmers, indexed by kmer
        Args    : None
        Returns : A hashref of hashref of arrayref:
                     hash->{kmer}->{ID of sequences with this kmer}->[starts of kmer on sequence]

   collection_by_seq
        Usage   : $col->collection_by_seq;
        Function: Get the collection of kmers, indexed by sequence ID
        Args    : None
        Returns : A hashref of hashref of arrayref:
                     hash->{ID of sequences with this kmer}->{kmer}->[starts of kmer on sequence]

   add_file
        Usage   : $col->add_file('seqs.fa');
        Function: Process the kmers in the given file of sequences.
        Args    : filename
        Returns : Grinder::KmerCollection object

   add_seqs
        Usage   : $col->add_seqs([$seq1, $seq2]);
        Function: Process the kmers in the given sequences.
        Args    : * arrayref of Bio::Seq objects
                  * arrayref of IDs to use for the indexing of the sequences
        Returns : Grinder::KmerCollection object

   filter_rare
        Usage   : $col->filter_rare( 2 );
        Function: Remove kmers occurring at less than the (weighted) abundance specified
        Args    : integer
        Returns : Grinder::KmerCollection object

   filter_shared
        Usage   : $col->filter_shared( 2 );
        Function: Remove kmers occurring in less than the number of sequences specified
        Args    : integer
        Returns : Grinder::KmerCollection object

   counts
        Usage   : $col->counts
        Function: Calculate the total count of each kmer. Counts are affected by the weights you gave to the sequences.
        Args    : * restrict sequences to search to specified sequence ID (optional)
                  * starting position from which counting should start (optional)
                  * 0 to report counts (default), 1 to report frequencies (normalize to 1)
        Returns : * arrayref of the different kmers
                  * arrayref of the corresponding total counts

   sources
        Usage   : $col->sources()
        Function: Return the sources of a kmer and their (weighted) abundance.
        Args    : * kmer to get the sources of
                  * sources to exclude from the results (optional)
                  * 0 to report counts (default), 1 to report frequencies (normalize to 1)
        Returns : * arrayref of the different sources
                  * arrayref of the corresponding total counts
                  If the kmer requested does not exist, the array will be empty.

   kmers
        Usage   : $col->kmers('seq1');
        Function: This is the inverse of sources(). Return the kmers found in a sequence (given its ID)
                  and their (weighted) abundance.
        Args    : * sequence ID to get the kmers of
                  * 0 to report counts (default), 1 to report frequencies (normalize to 1)
        Returns : * arrayref of sequence IDs
                  * arrayref of the corresponding total counts
                  If the sequence ID requested does not exist, the arrays will be empty.

   positions
        Usage   : $col->positions()
        Function: Return the positions of the given kmer on a given sequence. An error is reported if the kmer
                  requested does not exist
        Args    : * desired kmer
                  * desired sequence with this kmer
        Returns : Arrayref of the different positions. The arrays will be empty if the desired combination of kmer
                  and sequence was not found.

perl v5.14.2                                                2012-01-17                                Grinder::KmerCollection(3pm)
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.