Extracting words from file Post: 302537749

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

extracting some words

i run a command that submits a word to WordNET which stores the search results in a document which looks like this... i searched "car" in this instance and id like to extract auto, automobile, machine, and store it in a file with the , , stripped away just the words. WordNET's results' template...

2. Shell Programming and Scripting

Extracting Text Between Two Words

Hi all! Im trying to extract a portion of text from a KML and put it into a new file. Im trying to get all of the points out of it, ignoring everything else so I need only the text between <Placement> and </Placement>. Is there a way to make it extract all instances of these points and not just...

3. Shell Programming and Scripting

Extracting part of line between two words

Hi, I have a file few hundred MB's with text like one below in single line. 20091117 abc xyg 20091117 def ghi 20091118 ppp ttt 20091118 zzz zzz xxx I need to extract part of line from 1st occurence of pattern 20091117 till first occurence of another pattern 20091118. I tried...

4. Shell Programming and Scripting

words extracting

Hi, Pls assist. dn: uid=test,ou=test,dc=com description: password sunIdentityServerDeviceStatus: Active uid: test objectClass: sunIdentityServerDevice objectClass: iplanet-am-user-service objectClass: top objectClass: iPlanetPreferences sunIdentityServerDeviceType: blabla cn: default...

5. UNIX for Dummies Questions & Answers

Extracting only words from a log file

hello: i have a file and i am trying to extract only unique words from that file. i used the command: cat messages.1 | tr " " "\n" | sort | uniq -c but using this command outputs everything unique in the file be it words, numbers, like all the characters..i need a command which will only...

6. Shell Programming and Scripting

Help with extracting words from fixed length files

I am very new to scripting and need to write a script that will extract the account number from a line that begins with HDR. For example, the file is as follows HDR2010072600300405505100726 00300405505 LBJ FREEWAY DALLAS TELEGRAPH ...

7. Shell Programming and Scripting

Splitting Concatenated Words in Input File with Words from a Master File

Hello, I have a complex problem. I have a file in which words have been joined together: Theboy ranslowly I want to be able to correctly split the words using a lookup file in which all the words occur: the boy ran slowly slow put child ly The lookup file which is meant for look up...

8. Shell Programming and Scripting

grep - Extracting multiple key words from stdout

Hello. From command line, the command zypper info nxclient return a bloc of data : linux local # zypper info nxclient Loading repository data... Reading installed packages... Information for package nxclient: Repository: zypper_local Name: nxclient Version: 3.5.0-7 Arch: x86_64...

9. Shell Programming and Scripting

Extracting Words from Text

Hi there, Unix Gurus Back in September last year you helped me find a way to extract the words in brackets in a textfile to a new one. In that case my textfile was made up of sentences containing an only bracketed word per sentence/line: 1. If the boss's son had been , someone would...

10. Shell Programming and Scripting

Extracting words and lines based on keywords

Hello! I'm trying to process a text file and am stuck at 2 extractions. Hoping someone can help me here: 1. Given a line in a text file and given a keyword, how can I extract the word preceeding the keyword using a shell command/script? For example: Given a keyword "world" in the line: ...

LEARN ABOUT DEBIAN

bio::tools::seqwords

Bio::Tools::SeqWords(3pm)				User Contributed Perl Documentation				 Bio::Tools::SeqWords(3pm)

NAME

       Bio::Tools::SeqWords - Object holding n-mer statistics for a sequence

SYNOPSIS

	 # Create the SeqWords object, e.g.:

	 my $inputstream = Bio::SeqIO->new(-file => "seqfile",
						-format => 'Fasta');
	 my $seqobj = $inputstream->next_seq();
	 my $seq_word = Bio::Tools::SeqWords->new(-seq => $seqobj);

	 # Or:
	 my $seqobj = Bio::PrimarySeq->new(-seq => "agggtttccc",
					   -alphabet => 'dna',
					   -id => 'test');
	 my $seq_word  =  Bio::Tools::SeqWords->new(-seq => $seqobj);

	 # obtain a hash of word counts, e.g.:
	 my $hash_ref = $seq_word->count_words($word_length);

	 # display hash table, eg:
	 my %hash = %$hash_ref;
	 foreach my $key (sort keys %hash)
	 {
	   print "\n$key\t$hash{$key}";
	 }

	 # Or:

	 my $hash_ref =
	    Bio::Tools::SeqWords->count_words($seqobj,$word_length);

DESCRIPTION

       Bio::Tools::SeqWords is a featherweight object for the calculation of n-mer word occurrences in a single sequence.  It is envisaged that
       the object will be useful for construction of scripts which use n-mer word tables as the raw material for statistical calculations; for
       instance, hexamer frequency for the calculation of coding potential, or the calculation of periodicity in repetitive DNA.  Triplet
       frequency is already handled by Bio::Tools::SeqStats (author: Peter Schattner).

       There are a few possible applications for protein, e.g. hypothesised amino acid 7-mers in heat shock proteins, or proteins with multiple
       simple motifs.  Sometimes these protein periodicities are best seen when the amino acid alphabet is truncated, e.g. Shulman alphabet.
       Since there are quite a few of these shortened alphabets, this module does not specify any particular alphabet.

       See Synopsis above for object creation code.

   Rationale
       Take a sequence object and create an object for the purposes of holding n-mer word statistics about that sequence. The sequence can be
       nucleic acid or protein.

       In count_words() the words are counted in a non-overlapping manner, i.e. in the style of a codon table, but with any word length.

       In count_overlap_words() the words are counted in an overlapping manner.

       For counts on opposite strand (DNA/RNA), a reverse complement method should be performed, and then the count repeated.
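
The opposite-strand procedure described above can be illustrated outside BioPerl. What follows is a minimal Python sketch, not the module's implementation: the helper names `reverse_complement` and `count_words` are hypothetical, the sequence is assumed to use only the unambiguous DNA letters A, C, G and T, and a trailing fragment shorter than the word length is assumed to be ignored.

```python
from collections import Counter

# Assumption: unambiguous DNA alphabet only (no IUPAC ambiguity codes).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_words(seq: str, word_length: int) -> Counter:
    """Count non-overlapping words, reading from the first base."""
    last_start = len(seq) - word_length
    return Counter(
        seq[i:i + word_length]
        for i in range(0, last_start + 1, word_length)
    )


forward = "ACCGTCCGT"
reverse = reverse_complement(forward)

# Count once per strand; the two tables are kept separate here.
forward_counts = count_words(forward, 4)
reverse_counts = count_words(reverse, 4)

print(reverse)
print(dict(forward_counts))
print(dict(reverse_counts))
```

Keeping the two tables separate leaves it to the caller to decide whether strand-specific or pooled counts are wanted; pooling them is a matter of adding the two `Counter` objects.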

FEEDBACK

   Mailing Lists
       User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to one
       of the Bioperl mailing lists.  Your participation is much appreciated.

	 bioperl-l@bioperl.org			- General discussion
	 http://bioperl.org/wiki/Mailing_lists	- About the mailing lists

   Support
       Please direct usage questions or support issues to the mailing list:

       bioperl-l@bioperl.org

       rather than to the module maintainer directly. Many experienced and responsive experts will be able to look at the problem and quickly address
       it. Please include a thorough description of the problem with code and data examples if at all possible.

   Reporting Bugs
       Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution.  Bug reports can be submitted via the
       web:

	 https://redmine.open-bio.org/projects/bioperl/

AUTHOR

       Derek Gatherer, in the loosest sense of the word 'author'.  The general shape of the module is lifted directly from the SeqStats module of
       Peter Schattner. The central subroutine to count the words is adapted from original code provided by Dave Shivak, in response to a query on
       the bioperl mailing list.  At least 2 other people provided alternative means (equally good but not used in the end) of performing the same
       calculation.  Thanks to all for your assistance.

CONTRIBUTORS

       Jason Stajich, jason-at-bioperl.org

APPENDIX

       The rest of the documentation details each of the object methods.  Internal methods are usually preceded with a _

   count_words
	Title	: count_words
	Usage	: $word_count = $seq_word->count_words($word_length)
		       or
		  $word_count = Bio::Tools::SeqWords->count_words($seqobj,$word_length);
	Function: Counts non-overlapping words within a string; any alphabet can
		  be used
	Example : a sequence ACCGTCCGT, counted at word length 4, will give the hash
		  {ACCG => 1, TCCG => 1}
	Returns : Reference to a hash in which keys are words (any length) of the
		  alphabet used and values are number of occurrences of the word
		  in the sequence.
	Args	: Word length as scalar and, if required, a reference to a
		  sequence object

		  Throws an exception if word length is not a positive integer
		  or if word length is longer than the sequence.
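
The non-overlapping count and its two error conditions can be sketched in a few lines of Python. This is an illustration of the behaviour documented above, not the Perl code of the module; the function name simply mirrors the method, `ValueError` stands in for the exception the module throws, and dropping the final incomplete word is an assumption that matches the documented example.

```python
from collections import Counter


def count_words(seq: str, word_length: int) -> dict:
    """Count non-overlapping words of a fixed length in a sequence string."""
    # Mirror the two documented failure cases.
    if not isinstance(word_length, int) or word_length <= 0:
        raise ValueError("word length must be a positive integer")
    if word_length > len(seq):
        raise ValueError("word length is longer than the sequence")

    counts = Counter()
    # Step through the string one whole word at a time; a trailing
    # fragment shorter than word_length is never reached.
    for start in range(0, len(seq) - word_length + 1, word_length):
        counts[seq[start:start + word_length]] += 1
    return dict(counts)


print(count_words("ACCGTCCGT", 4))  # {'ACCG': 1, 'TCCG': 1}
```

The ninth base of the example sequence (the final T) does not fill a complete 4-mer, which is why only two words appear in the result.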

   count_overlap_words
	Title	: count_overlap_words
	Usage	: $word_count = $word_obj->count_overlap_words($word_length);
	Function: Counts overlapping words within a string; any alphabet can be used
	Example : A sequence ACCAACCA, counted at word length 4, will give the hash
		       {ACCA=>2, CCAA=>1, CAAC=>1, AACC=>1}
	Returns : Reference to a hash in which keys are words (any length) of the
		  alphabet used and values are number of occurrences of the word in
		  the sequence.
	Args	: Word length as scalar

		  Throws an exception if word length is not a positive integer
		  or if word length is longer than the sequence.
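
For comparison, overlapping counting slides the window one position at a time instead of one word at a time. The Python sketch below illustrates that idea under the same caveats: it is not the module's code, the function name only mirrors the method, and `ValueError` is a stand-in for the module's exception.

```python
from collections import Counter


def count_overlap_words(seq: str, word_length: int) -> dict:
    """Count every word of a fixed length, advancing one position per step."""
    if not isinstance(word_length, int) or word_length <= 0:
        raise ValueError("word length must be a positive integer")
    if word_length > len(seq):
        raise ValueError("word length is longer than the sequence")

    # A sequence of length n contains n - word_length + 1 overlapping words.
    windows = (
        seq[i:i + word_length]
        for i in range(len(seq) - word_length + 1)
    )
    return dict(Counter(windows))


counts = count_overlap_words("ACCAACCA", 4)
for word in sorted(counts):
    print(word, counts[word])
```

With the documented example, the five windows are ACCA, CCAA, CAAC, AACC and ACCA, so ACCA is counted twice and the other three words once each.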

perl v5.14.2							    2012-03-02						 Bio::Tools::SeqWords(3pm)
