Using perl or awk to create ngrams
Post 302863213 by gimley on Sunday 13th of October 2013 04:55:06 AM
I guess I did the analysis manually and hence slipped up.
I agree that traditional ngrams work the way you have defined, but I am interested in contextual ngrams, in which the frequency of occurrence of a given string is determined by its immediate context.
Since the analysis is at a micro level rather than a macro level, such ngrams can be used to predict whether a given string complies with the training data and, with a few additional tweaks, even to suggest a valid structure.
I hope I have made the idea clear, and why the analysis in terms of context-driven ngrams is slightly different.
Many thanks for your response.
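
One possible reading of the contextual ngram idea, sketched in Python rather than perl or awk to keep it dependency-free. This is only an illustration under stated assumptions: the function names `contextual_counts` and `complies`, the `^`/`$` boundary markers, and the choice of one character of context on each side are all hypothetical and not taken from the thread.

```python
from collections import Counter, defaultdict


def contextual_counts(words, n=2):
    """Count each n-gram separately for every (left, right) neighbour pair."""
    table = defaultdict(Counter)
    for word in words:
        padded = f"^{word}$"  # assumed boundary markers, not from the thread
        for i in range(1, len(padded) - n):
            gram = padded[i:i + n]
            context = (padded[i - 1], padded[i + n])
            table[gram][context] += 1
    return table


def complies(word, table, n=2):
    """True if every n-gram of `word` was seen in training in the same context.

    Words shorter than n have no n-grams and are accepted vacuously.
    """
    padded = f"^{word}$"
    for i in range(1, len(padded) - n):
        gram = padded[i:i + n]
        context = (padded[i - 1], padded[i + n])
        if table.get(gram, {}).get(context, 0) == 0:
            return False
    return True


training = ["cat", "car", "bat"]
table = contextual_counts(training)
print(complies("cat", table))  # a training word
print(complies("rat", table))  # "ra" never occurred between "^" and "t"
```

A plain ngram table would only record that "at" occurred twice; keying on the neighbours keeps "at after c" and "at after b" apart, which is the micro-level distinction described above.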
 

Text::Ngram(3pm)					User Contributed Perl Documentation					  Text::Ngram(3pm)

NAME
       Text::Ngram - Ngram analysis of text

SYNOPSIS
           use Text::Ngram qw(ngram_counts add_to_counts);
           my $text   = "abcdefghijklmnop";
           my $hash_r = ngram_counts($text, 3); # Window size = 3
           # $hash_r => { abc => 1, bcd => 1, ... }

           add_to_counts($more_text, 3, $hash_r);

DESCRIPTION
       n-Gram analysis is a field in textual analysis which uses sliding
       window character sequences in order to aid topic analysis, language
       determination and so on. The n-gram spectrum of a document can be used
       to compare and filter documents in multiple languages, prepare word
       prediction networks, and perform spelling correction.

       The neat thing about n-grams, though, is that they're really easy to
       determine. For n=3, for instance, we compute the n-gram counts like so:

           the cat sat on the mat
           ---                     $counts{"the"}++;
            ---                    $counts{"he "}++;
             ---                   $counts{"e c"}++;
           ...

       This module provides an efficient XS-based implementation of n-gram
       spectrum analysis.

       There are two functions which can be imported:

   ngram_counts
       This first function returns a hash reference with the n-gram histogram
       of the text for the given window size. The default window size is 5.

           $href = ngram_counts(\%config, $text, $window_size);

       As of version 0.14, the %config may instead be passed in as named
       arguments:

           $href = ngram_counts($text, $window_size, %config);

       The only necessary parameter is $text.

       The possible values for %config are:

       flankbreaks

       If set to 1 (default), breaks are flanked by spaces; if set to 0,
       they're not. Breaks are punctuation and other non-alphabetic
       characters, which, unless you use "punctuation => 0" in your
       configuration, do not make it into the returned hash.

       Here's an example, supposing you're using the default value for
       punctuation (1):

           my $text = "Hello, world";
           my $hash = ngram_counts($text, 5);

       That produces the following ngrams:

           {
             'Hello' => 1,
             'ello ' => 1,
             ' worl' => 1,
             'world' => 1,
           }

       On the other hand, this:

           my $text = "Hello, world";
           my $hash = ngram_counts({flankbreaks => 0}, $text, 5);

       produces the following ngrams:

           {
             'Hello' => 1,
             ' worl' => 1,
             'world' => 1,
           }

       lowercase

       If set to 0, casing is preserved. If set to 1, all letters are
       lowercased before counting ngrams. Default is 1.

           # Get all ngrams of size 4 preserving case
           $href_p = ngram_counts( {lowercase => 0}, $text, 4 );

       punctuation

       If set to 0 (default), punctuation is removed before calculating the
       ngrams. Set to 1 to preserve it.

           # Get all ngrams of size 2 preserving punctuation
           $href_p = ngram_counts( {punctuation => 1}, $text, 2 );

       spaces

       If set to 0 (default is 1), no ngrams containing spaces will be
       returned.

           # Get all ngrams of size 3 that do not contain spaces
           $href = ngram_counts( {spaces => 0}, $text, 3);

       If you're going to request both types of ngrams, then the best way to
       avoid calculating the same thing twice is probably this:

           $href_with_spaces = ngram_counts($text[, $window]);
           # Copy the hash so the original counts are left untouched
           $href_no_spaces = { %$href_with_spaces };
           for (keys %$href_no_spaces) { delete $href_no_spaces->{$_} if / / }

   add_to_counts
       This incrementally adds to the supplied hash; if $window is zero or
       undefined, then the window size is computed from the hash keys.

           add_to_counts($more_text, $window, $href)

TO DO
       o   Look further into the tests. Sort them and add more.

SEE ALSO
       Cavnar, W. B. (1993). N-gram-based text filtering for TREC-2. In D.
       Harman (Ed.), Proceedings of TREC-2: Text Retrieval Conference 2.
       Washington, DC: National Bureau of Standards.

       Shannon, C. E. (1951). Prediction and entropy of printed English. The
       Bell System Technical Journal, 30. 50-64.

       Ullmann, J. R. (1977). Binary n-gram technique for automatic correction
       of substitution, deletion, insertion and reversal errors in words.
       Computer Journal, 20. 141-147.

AUTHOR
       Maintained by Alberto Simoes, "ambs@cpan.org".

       Previously maintained by Jose Castro, "cog@cpan.org".

       Originally created by Simon Cozens, "simon@cpan.org".

COPYRIGHT AND LICENSE
       Copyright 2006 by Alberto Simoes

       Copyright 2004 by Jose Castro

       Copyright 2003 by Simon Cozens

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.

perl v5.14.2                      2012-01-25                  Text::Ngram(3pm)
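
The sliding-window counting described under DESCRIPTION can be sketched in a few lines. What follows is a plain Python illustration of the idea, not the module's XS implementation. The name `char_ngram_counts` is hypothetical, `add_to_counts` only mirrors the documented name, and the sketch deliberately skips the lowercase, punctuation and flankbreaks handling, so its results will differ from the module's on text containing capitals or punctuation.

```python
from collections import Counter


def char_ngram_counts(text, n=5, spaces=True):
    """Count every window of n consecutive characters in `text`."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    if not spaces:
        # Mimic the documented "spaces => 0" idea: drop n-grams with a space.
        counts = Counter({g: c for g, c in counts.items() if " " not in g})
    return counts


def add_to_counts(more_text, n, counts):
    """Add the n-grams of `more_text` to an existing Counter, in place."""
    counts.update(char_ngram_counts(more_text, n))
    return counts


counts = char_ngram_counts("the cat sat on the mat", 3)
print(counts["the"], counts["he "], counts["e c"])  # 2 2 1

no_spaces = char_ngram_counts("the cat sat on the mat", 3, spaces=False)
print(sorted(no_spaces))  # ['cat', 'mat', 'sat', 'the']
```

Unlike the documented add_to_counts, this sketch always needs an explicit window size; it does not infer one from the existing keys.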