Sponsored Content
Top Forums Shell Programming and Scripting Extract sequence from fasta file Post 302864055 by ctsgnb on Tuesday 15th of October 2013 06:51:29 PM
Old 10-15-2013
Very simple one:
Code:
awk '/X937/' RS='> ' input

...using variable:
Code:
awk -v P="$s" '$0~P' RS='> ' input

... adding leading RS:
Code:
awk -v P="$s" '!c++{printf RS}$0~P' RS='> ' input

Code:
$ s=X937
$ awk -v P="$s" '$0~P' RS='> ' input
fefrwefrwef X937
AGAGGGAATTGG
AGGAGACTTTAG
GGTTCTCTTC


Last edited by ctsgnb; 10-15-2013 at 08:00 PM..
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

How to extract a sequence of n lines from a file

Hi I want to be able to extract a sequence of n lines from a file. ideas, commands and suggestions would be highly appreciated. Thanks (4 Replies)
Discussion started by: 0ktalmagik
4 Replies

2. Shell Programming and Scripting

Extract Pattern Sequence

Dear Collegues I have to extract Some pattern from raw text file using perl The input will be raw text. Pattern to get - Sequence of Capital Letter Words ( e.g. he is working in Center for Perl Studies. He will come tomorrow...) from thos I have to extract sequences like "Center for Perl... (5 Replies)
Discussion started by: jaganadh
5 Replies

3. Shell Programming and Scripting

Extract sequence blocks

Hi, I have an one-line file consisting of a sequence of 660 letters. I would like to extract 9-letter blocks iteratively: ASDFGHJKLQWERTYUIOPZXCVBNM first block: ASDFGHJKL 1nd block: SDFGHJKLQ What I have so far only gives me the first block, can anyone please explain why? cat... (7 Replies)
Discussion started by: solli
7 Replies

4. Shell Programming and Scripting

Parsing a fasta sequence with start and end coordinates

Hi.. I have a seperate chromosome sequences and i wanted to parse some regions of chromosome based on start site and end site.. how can i achieve this? For Example Chr 1 is in following format I need regions from 2 - 10 should give me AATTCCAAA and in a similar way 15- 25 should give... (8 Replies)
Discussion started by: empyrean
8 Replies

5. UNIX for Dummies Questions & Answers

How to change sequence name in along fasta file?

Hi I have an alignment file (.fasta) with ~80 sequences. They look like this- >JV101.contig00066(+):25302-42404|sequence_index=0|block_index=4|species=JV101|JV101_4_0 GAGGTTAATTATCGATAACGTTTAATTAAAGTGTTTAGGTGTCATAATTT TAAATGACGATTTCTCATTACCATACACCTAAATTATCATCAATCTGAAT... (2 Replies)
Discussion started by: baika
2 Replies

6. UNIX for Dummies Questions & Answers

Change sequence names in fasta file

I have fasta files with multiple sequences in each. I need to change the sequence name headers from: >accD:_59176-60699 ATGGAAAAGTGGAGGATTTATTCGTTTCAGAAGGAGTTCGAACGCA >atpA_(reverse_strand):_showing_revcomp_of_10525-12048 ATGGTAACCATTCAAGCCGACGAAATTAGTAATCTTATCCGGGAAC... (2 Replies)
Discussion started by: tyrianthinae
2 Replies

7. Shell Programming and Scripting

Extract sequences from a FASTA file based on another file

I have two files. File1 is shown below. >153L:B|PDBID|CHAIN|SEQUENCE RTDCYGNVNRIDTTGASCKTAKPEGLSYCGVSASKKIAERDLQAMDRYKTIIKKVGEKLCVEPAVIAGIISRESHAGKVL KNGWGDRGNGFGLMQVDKRSHKPQGTWNGEVHITQGTTILINFIKTIQKKFPSWTKDQQLKGGISAYNAGAGNVRSYARM DIGTTHDDYANDVVARAQYYKQHGY >16VP:A|PDBID|CHAIN|SEQUENCE... (7 Replies)
Discussion started by: nelsonfrans
7 Replies

8. Shell Programming and Scripting

Count and search by sequence in multiple fasta file

Hello, I have 10 fasta files with sequenced reads information with read sizes from 15 - 35 . I have combined the reads and collapsed in to unique reads and filtered for sizes 18 - 26 bp long unique reads. Now i wanted to count each unique read appearance in all the fasta files and make a table... (5 Replies)
Discussion started by: empyrean
5 Replies

9. Shell Programming and Scripting

Extract distinc sequence of letters

Hallo, I need to extract distinct sequence of letters for example from 136 to 193 Files are quite big, so I would prefer not to use "fold -w1" Thank you very much Input file look like this: 1 cttttacctt catgtgtttt tgcagatatt tgttcataat aacatcttct ttttaagtta 61 ttaaaatctt... (4 Replies)
Discussion started by: kamcamonty
4 Replies

10. UNIX for Beginners Questions & Answers

How to find a specific sequence pattern in a fasta file?

I have to mine the following sequence pattern from a large fasta file namely gene.fasta (contains multiple fasta sequences) along with the flanking sequences of 5 bases at starting position and ending position, AAGCZ-N16-AAGCZ Z represents A, C or G (Except T) N16 represents any of the four... (3 Replies)
Discussion started by: dineshkumarsrk
3 Replies
Locale::Codes::LangFam(3pm)				 Perl Programmers Reference Guide			       Locale::Codes::LangFam(3pm)

NAME
Locale::Codes::LangFam - standard codes for language extension identification SYNOPSIS
use Locale::Codes::LangFam; $lext = code2langfam('apa'); # $lext gets 'Apache languages' $code = langfam2code('Apache languages'); # $code gets 'apa' @codes = all_langfam_codes(); @names = all_langfam_names(); DESCRIPTION
The "Locale::Codes::LangFam" module provides access to standard codes used for identifying language families, such as those as defined in ISO 639-5. Most of the routines take an optional additional argument which specifies the code set to use. If not specified, the default ISO 639-5 language family codes will be used. SUPPORTED CODE SETS
There are several different code sets you can use for identifying language families. A code set may be specified using either a name, or a constant that is automatically exported by this module. For example, the two are equivalent: $lext = code2langfam('apa','alpha'); $lext = code2langfam('apa',LOCALE_LANGFAM_ALPHA); The codesets currently supported are: alpha This is the set of three-letter (lowercase) codes from ISO 639-5 such as 'apa' for Apache languages. This is the default code set. ROUTINES
code2langfam ( CODE [,CODESET] ) langfam2code ( NAME [,CODESET] ) langfam_code2code ( CODE ,CODESET ,CODESET2 ) all_langfam_codes ( [CODESET] ) all_langfam_names ( [CODESET] ) Locale::Codes::LangFam::rename_langfam ( CODE ,NEW_NAME [,CODESET] ) Locale::Codes::LangFam::add_langfam ( CODE ,NAME [,CODESET] ) Locale::Codes::LangFam::delete_langfam ( CODE [,CODESET] ) Locale::Codes::LangFam::add_langfam_alias ( NAME ,NEW_NAME ) Locale::Codes::LangFam::delete_langfam_alias ( NAME ) Locale::Codes::LangFam::rename_langfam_code ( CODE ,NEW_CODE [,CODESET] ) Locale::Codes::LangFam::add_langfam_code_alias ( CODE ,NEW_CODE [,CODESET] ) Locale::Codes::LangFam::delete_langfam_code_alias ( CODE [,CODESET] ) These routines are all documented in the Locale::Codes::API man page. SEE ALSO
Locale::Codes The Locale-Codes distribution. Locale::Codes::API The list of functions supported by this module. http://www.loc.gov/standards/iso639-5/id.php ISO 639-5 . AUTHOR
See Locale::Codes for full author history. Currently maintained by Sullivan Beck (sbeck@cpan.org). COPYRIGHT
Copyright (c) 2011-2013 Sullivan Beck This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. perl v5.18.2 2013-11-04 Locale::Codes::LangFam(3pm)
All times are GMT -4. The time now is 05:05 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy