Sponsored Content
Top Forums Shell Programming and Scripting Linguistic project: extract co-occurrences from text corpus Post 302661037 by Scrutinizer on Sunday 24th of June 2012 01:43:03 PM
Old 06-24-2012
If you use "dog|dogs" as a field separator, then any resulting field would be adjacent to the word dog or dogs, if the number of fields is >1
This User Gave Thanks to Scrutinizer For This Post:
 

6 More Discussions You Might Find Interesting

1. Programming

c program to extract text between two delimiters from some text file

needa c program to extract text between two delimiters from some text file. and then storing them in to diffrent variables ? text file like 0: abc.txt ========= aaaaaa|11111111|sssssssssss|333333|ddddddddd|34343454564|asass aaaaaa|11111111|sssssssssss|333333|ddddddddd|34343454564|asass... (7 Replies)
Discussion started by: kukretiabhi13
7 Replies

2. Shell Programming and Scripting

Text Substitution Project

History: large open source PHP project, school management program. Comprises about 200 scripts. Had another developer for awhile, and he wanted a version in German, so he edited all the scripts and replaced text that would show up in the browser with variables (i.e. instead of "Click Here",... (7 Replies)
Discussion started by: dougp23
7 Replies

3. Shell Programming and Scripting

Creating Frequency of words from a file by accessing a corpus

Hello, I have a large file of syllables /strings in Urdu. Each word is on a separate line. Example in English: be at for if being attract I need to identify the frequency of each of these strings from a large corpus (which I cannot attach unfortunately because of size limitations) and... (7 Replies)
Discussion started by: gimley
7 Replies

4. Shell Programming and Scripting

Grepping verbal forms from a large corpus

I want to extract verbal forms from a large corpus of English. I have identified a certain number of patterns. Each pattern has the following structure SPACE word_CATEGORY where word refers to the verbal form and CATEGORY refers to the class of the verb The categories are identified as per the... (4 Replies)
Discussion started by: gimley
4 Replies

5. Shell Programming and Scripting

Remove duplicate occurrences of text pattern

Hi folks! I have a file which contains a 1000 lines. On each line i have multiple occurrences ( 26 to be exact ) of pattern folder#/folder#. # is depicting the line number in the file some text here folder1/folder1 some text here folder1/folder1 some text here folder1/folder1 some text... (7 Replies)
Discussion started by: martinsmith
7 Replies

6. Shell Programming and Scripting

Alignment tool to join text files in 2 directories to create a parallel corpus

I have two directories called English and Hindi. Each directory contains the same number of files with the only difference being that in the case of the English Directory the tag is .english and in the Hindi one the tag is .Hindi The file may contain either a single text or more than one text... (7 Replies)
Discussion started by: gimley
7 Replies
Text::Context::EitherSide(3pm)				User Contributed Perl Documentation			    Text::Context::EitherSide(3pm)

NAME
Text::Context::EitherSide - Get n words either side of search keywords SYNOPSIS
use Text::Context::EitherSide; my $text = "The quick brown fox jumped over the lazy dog"; my $context = Text::Context::EitherSide->new($text); $context->as_string("fox") # "... quick brown fox jumped over ..." $context->as_string("fox", "jumped") # "... quick brown fox jumped over the ..." my $context = Text::Context::EitherSide->new($text, context => 1); # 1 word on either side $context->as_string("fox", "jumped", "dog"); # "... brown fox jumped over ... lazy dog", Or, if you don't believe in all this OO rubbish: use Text::Context::EitherSide qw(get_context); get_context(1, $text, "fox", "jumped", "dog") # "... brown fox jumped over ... lazy dog" DESCRIPTION
Suppose you have a large piece of text - typically, say, a web page or a mail message. And now suppose you've done some kind of full-text search on that text for a bunch of keywords, and you want to display the context in which you found the keywords inside the body of the text. A simple-minded way to do that would be just to get the two words either side of each keyword. But hey, don't be too simple minded, because you've got to make sure that the list doesn't overlap. If you have the quick brown fox jumped over the lazy dog and you extract two words either side of "fox", "jumped" and "dog", you really don't want to end up with quick brown fox jumped over brown fox jumped over the the lazy dog so you need a small amount of smarts. This module has a small amount of smarts. EXPORTABLE
get_context This is primarily an object-oriented module. If you don't care about that, just import the "get_context" subroutine, and call it like so: get_context($num_of_words, $text, @words_to_find) and you'll get back a string with ellipses as in the synopsis. That's all that most people need to know. But if you want to do clever stuff... METHODS
new my $c = Text::Context::EitherSite->new($text [, context=> $n]); Create a new object storing some text to be searched, plus optionally some information about how many words on either side you want. (If you don't like the default of 2.) context $c->context(5); Allows you to get and set the number of the words on either side. as_sparse_list $c->as_sparse_list(@keywords) Returns the keywords, plus n words on either side, as a sparse list; the original text is split into an array of words, and non-contextual elements are replaced with "undef"s. (That's not actually how it works, but conceptually, it's the same.) as_list $c->as_list(@keywords) The same as "as_sparse_list", but single or multiple "undef"s are collapsed into a single ellipsis: (undef, "foo", undef, undef, undef, "bar") becomes ("...", "foo", "...", "bar") as_string $c->as_string(@keywords) Takes the "as_list" output above and joins them all together into a string. This is what most people want from "Text::Context::EitherSide". EXPORT "get_context" is available as a shortcut for Text::Context::EitherSide->new($text, context => $n)->as_string(@words); but needs to be explicitly imported. Nothing is exported by default. SEE ALSO
Text::Context is an even smarter way of extracting a contextual string. AUTHOR
Current maintainer: Tony Bowden Original author: Simon Cozens BUGS and QUERIES Please direct all correspondence regarding this module to: bug-Text-Context-EitherSide@rt.cpan.org COPYRIGHT AND LICENSE
Copyright 2002-2005 by Kasei Limited, http://www.kasei.com/ You may use and redistribute this module under the terms of the Artistic License 2.0. http://www.perlfoundation.org/artistic_license_2_0 perl v5.10.0 2009-05-04 Text::Context::EitherSide(3pm)
All times are GMT -4. The time now is 01:58 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy