Text analysis Post: 302508963

Text::Ngram(3pm)					User Contributed Perl Documentation					  Text::Ngram(3pm)

NAME

       Text::Ngram - Ngram analysis of text

SYNOPSIS

	 use Text::Ngram qw(ngram_counts add_to_counts);
	 my $text   = "abcdefghijklmnop";
	 my $hash_r = ngram_counts($text, 3); # Window size = 3
	 # $hash_r => { abc => 1, bcd => 1, ... }

	 add_to_counts($more_text, 3, $hash_r);

DESCRIPTION

       n-Gram analysis is a field in textual analysis which uses sliding window character sequences in order to aid topic analysis, language
       determination and so on. The n-gram spectrum of a document can be used to compare and filter documents in multiple languages, prepare word
       prediction networks, and perform spelling correction.
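
       As a rough illustration of comparing documents by their n-gram spectra, the sketch below scores two texts with cosine similarity over
       trigram counts. Both helpers (sketch_trigrams and sketch_cosine) are hypothetical names written in plain Perl for this example; they
       are not part of Text::Ngram, and cosine similarity is only one of several measures that could be used here.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::Ngram: lowercased character
# trigram counts computed with a plain sliding window.
sub sketch_trigrams {
    my ($text) = @_;
    my $lc = lc $text;
    my %counts;
    $counts{ substr($lc, $_, 3) }++ for 0 .. length($lc) - 3;
    return \%counts;
}

# Cosine similarity between two count hashes: close to 1 for matching
# spectra, 0 when the two hashes share no n-gram.
sub sketch_cosine {
    my ($p, $q) = @_;
    my ($dot, $norm_p, $norm_q) = (0, 0, 0);
    $norm_p += $_ ** 2 for values %$p;
    $norm_q += $_ ** 2 for values %$q;
    for my $gram (keys %$p) {
        $dot += $p->{$gram} * $q->{$gram} if exists $q->{$gram};
    }
    return 0 unless $norm_p && $norm_q;   # an empty spectrum matches nothing
    return $dot / sqrt($norm_p * $norm_q);
}

my $doc_one   = sketch_trigrams("the cat sat on the mat");
my $doc_two   = sketch_trigrams("the cat sat on the rug");
my $doc_three = sketch_trigrams("zzz qqq xxx");
printf "similar:   %.3f\n", sketch_cosine($doc_one, $doc_two);
printf "unrelated: %.3f\n", sketch_cosine($doc_one, $doc_three);
```

       The same scoring function should work on any pair of hash references that map n-grams to counts, provided both were built with the
       same window size.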

       The neat thing about n-grams, though, is that they're really easy to determine. For n=3, for instance, we compute the n-gram counts like
       so:

	   the cat sat on the mat
	   ---			   $counts{"the"}++;
	    --- 		   $counts{"he "}++;
	     ---		   $counts{"e c"}++;
	      ...
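
       The sliding window shown above can be sketched in a few lines of plain Perl. The function name naive_ngram_counts is hypothetical and
       is not part of this module; it skips the lowercasing, punctuation and break handling that ngram_counts performs, so it only
       illustrates the counting step.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::Ngram): count every window of
# $n consecutive characters in $text, exactly as given.
sub naive_ngram_counts {
    my ($text, $n) = @_;
    my %counts;
    # The last window starts at length - n; a text shorter than $n
    # produces an empty range and therefore no n-grams.
    for my $start (0 .. length($text) - $n) {
        $counts{ substr($text, $start, $n) }++;
    }
    return \%counts;
}

my $counts = naive_ngram_counts("the cat sat on the mat", 3);
printf "%-6s %d\n", "'$_'", $counts->{$_} for sort keys %$counts;
```

       A text of length L yields L - n + 1 windows, so the 22-character sentence above contributes 20 trigram occurrences in total.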

       This module provides an efficient XS-based implementation of n-gram spectrum analysis.

       There are two functions which can be imported:

   ngram_counts
       This first function returns a hash reference with the n-gram histogram of the text for the given window size. The default window size is 5.

	   $href = ngram_counts(\%config, $text, $window_size);

       As of version 0.14, the %config may instead be passed in as named arguments:

	   $href = ngram_counts($text, $window_size, %config);

       The only necessary parameter is $text.

       The possible values for %config are:

       flankbreaks

       If set to 1 (default), breaks are flanked by spaces; if set to 0, they're not. Breaks are punctuation and other non-alphabetic characters,
       which, unless you use "punctuation => 1" in your configuration, do not make it into the returned hash.

       Here's an example, supposing you're using the default value for punctuation (0):

	 my $text = "Hello, world";
	 my $hash = ngram_counts($text, 5);

       That produces the following ngrams:

	 {
	   'Hello' => 1,
	   'ello ' => 1,
	   ' worl' => 1,
	   'world' => 1,
	 }

       On the other hand, this:

	 my $text = "Hello, world";
	 my $hash = ngram_counts({flankbreaks => 0}, $text, 5);

       Produces the following ngrams:

	 {
	   'Hello' => 1,
	   ' worl' => 1,
	   'world' => 1,
	 }
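
       The difference between the two outputs can be reproduced with a small plain-Perl emulation. This is a sketch inferred only from the
       two examples above, not the module's XS implementation: the name sketch_break_counts is hypothetical, case is preserved to match the
       example output, and any run of characters that is neither a word character nor whitespace is assumed to be a break.

```perl
use strict;
use warnings;

# Hypothetical emulation, not Text::Ngram's code: count n-grams that do
# not cross a break, optionally flanking each break with spaces.
sub sketch_break_counts {
    my ($text, $n, $flankbreaks) = @_;
    my $break = $flankbreaks ? " \0 " : "\0";      # "\0" marks a break
    (my $buffer = $text) =~ s/[^\w\s]+/$break/g;   # punctuation runs become breaks
    $buffer =~ s/\s+/ /g;                          # collapse runs of whitespace
    my %counts;
    # Windows are taken inside each segment, so no n-gram spans a break.
    for my $segment (split /\0/, $buffer) {
        $counts{ substr($segment, $_, $n) }++ for 0 .. length($segment) - $n;
    }
    return \%counts;
}

for my $flank (1, 0) {
    my $counts = sketch_break_counts("Hello, world", 5, $flank);
    print "flankbreaks => $flank: ", join("|", sort keys %$counts), "\n";
}
```

       With flanking enabled, the space added before the break lets the window "ello " exist; without it, the segment "Hello" is exactly
       one window long.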

       lowercase

       If set to 0, casing is preserved. If set to 1, all letters are lowercased before counting ngrams. Default is 1.

	   # Get all ngrams of size 4 preserving case
	   $href_p = ngram_counts( {lowercase => 0}, $text, 4 );

       punctuation

       If set to 0 (default), punctuation is removed before calculating the ngrams.  Set to 1 to preserve it.

	   # Get all ngrams of size 2 preserving punctuation
	   $href_p = ngram_counts( {punctuation => 1}, $text, 2 );

       spaces

       If set to 0 (default is 1), no ngrams containing spaces will be returned.

	  # Get all ngrams of size 3 that do not contain spaces
	  $href = ngram_counts( {spaces => 0}, $text, 3);

       If you're going to request both types of ngrams, then the best way to avoid calculating the same thing twice is probably this:

	   $href_with_spaces = ngram_counts($text, $window);    # $window is optional
	   $href_no_spaces = { %$href_with_spaces };            # shallow copy, so the original hash is left intact
	   for (keys %$href_no_spaces) { delete $href_no_spaces->{$_} if / / }

   add_to_counts
       This incrementally adds to the supplied hash; if $window is zero or undefined, then the window size is computed from the hash keys.

	   add_to_counts($more_text, $window, $href);
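
       The idea of inferring the window size from the existing keys can be sketched in plain Perl as follows. The function
       sketch_add_to_counts is a hypothetical stand-in, not the module's implementation; it assumes every key already in the hash has the
       same length and it skips all text cleaning.

```perl
use strict;
use warnings;

# Hypothetical stand-in for add_to_counts (not Text::Ngram's code):
# add the n-grams of $text to an existing hash of counts.
sub sketch_add_to_counts {
    my ($text, $n, $counts) = @_;
    # When $n is zero or undefined, take the length of any existing key.
    my ($any_key) = keys %$counts;
    $n ||= defined $any_key ? length $any_key : 0;
    die "cannot infer the window size from an empty hash\n" unless $n;
    $counts->{ substr($text, $_, $n) }++ for 0 .. length($text) - $n;
    return $counts;
}

my $counts = {};
sketch_add_to_counts("abcd", 3, $counts);       # adds abc, bcd
sketch_add_to_counts("bcde", undef, $counts);   # window inferred as 3: adds bcd, cde
print "$_ => $counts->{$_}\n" for sort keys %$counts;
```

       Because the hash is updated in place, the same reference can be passed to repeated calls as more text arrives.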

TO DO

       o     Look further into the tests. Sort them and add more.

SEE ALSO

       Cavnar, W. B. (1993). N-gram-based text filtering for TREC-2. In D. Harman (Ed.), Proceedings of TREC-2: Text Retrieval Conference 2.
       Washington, DC: National Bureau of Standards.

       Shannon, C. E. (1951). Prediction and entropy of printed English. The Bell System Technical Journal, 30, 50-64.

       Ullmann, J. R. (1977). Binary n-gram technique for automatic correction of substitution, deletion, insertion and reversal errors in words.
       Computer Journal, 20, 141-147.

AUTHOR

       Maintained by Alberto Simoes, "ambs@cpan.org".

       Previously maintained by Jose Castro, "cog@cpan.org".  Originally created by Simon Cozens, "simon@cpan.org".

COPYRIGHT AND LICENSE

       Copyright 2006 by Alberto Simoes

       Copyright 2004 by Jose Castro

       Copyright 2003 by Simon Cozens

       This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

perl v5.14.2							    2012-01-25							  Text::Ngram(3pm)