Linguistic project: extract co-occurrences from text corpus
Post 302661043 by figaro on Sunday 24th of June 2012 01:49:00 PM
Are you saying that if a phrase such as "big dog" appears three or more times in a given piece of text, the script should return the number of occurrences, with the user supplying the search word ("dog" in your example)?
You mention the period (".") as the delimiter. Do you ultimately want to extend this to other punctuation as well, such as ! ? ; and , ?
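To make the question concrete, here is a rough Python sketch of one possible reading. It is only my interpretation, not a confirmed requirement: I assume a "co-occurrence" means the word immediately before the search word within the same punctuation-delimited segment, and the names `cooccurrences`, `min_count` and `delimiters` are made up for illustration.

```python
import re
from collections import Counter

def cooccurrences(text, target, min_count=3, delimiters=".!?;,"):
    """Count words that directly precede `target` inside one segment.

    Assumption: a pair only counts when both words sit in the same
    segment, where segments are split at any character in `delimiters`.
    """
    # Build a character class from the delimiters and split on it.
    segments = re.split("[" + re.escape(delimiters) + "]", text)
    counts = Counter()
    wanted = target.lower()
    for segment in segments:
        words = segment.lower().split()
        # Pair each word with its successor; keep pairs ending in the target.
        for left, right in zip(words, words[1:]):
            if right == wanted:
                counts[(left, right)] += 1
    # Report only pairs seen at least `min_count` times.
    return {pair: n for pair, n in counts.items() if n >= min_count}

sample = "The big dog barked. A big dog ran! Big dog, small dog; the big dog slept."
print(cooccurrences(sample, "dog"))
```

If you instead want every word that shares a sentence with the search word, the inner loop would have to count all words of a segment that contains the target, so please say which of the two you mean.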

FILTERS(6)                          Games Manual                          FILTERS(6)

NAME
       ken, b1ff, censor, chef, cockney, eleet, fanboy, fudd, jethro, jibberish, jive, kenny, kraut, ky00te, nethack, newspeak, nyc, pirate, rasterman, scottish, spammer, scramble, studly, uniencode, upside-down - assorted text filters

SYNOPSIS
       $SHELL | chef

       newspeak < thesis.tex > newthesis.tex

       eleet | wall # b1ff works well too

       b1ff | ircII | censor

DESCRIPTION
       All of these programs are filters to do all sorts of strange things to text. No personal, racial, religious or societal slurs are intended. For amusement only.

       All the filters read input from stdin, change it, and write the filtered text to stdout. Some filters also support reading from files and writing to stdout.

       b1ff   The B1FF filter

       cockney
              Cockney English

       chef   convert English on stdin to Mock Swedish on stdout

       eleet  K3wl hacker slang

       fanboy Speak like a fanboy. Filters out extraneous words and focuses on the words fans use. By default, it will speak like a fan of git/Linus/linux development. To change this, pass as parameters the words that the fanboy typically uses. Alternatively, pass the name of a topic that typically has fanboys to use a predefined word list.

       fudd   Elmer Fudd

       jethro Hillbilly text filter

       jive   Jive English

       jibberish
              Runs text through a random selection of the rest of the filters, to make really weird output.

       ken    English into Cockney, featuring (dubious) rhyming slang for a lot of computer terminology.

       kraut  Generates text with a bad German accent.

       kenny  Generates text as spoken by Kenny on South Park.

       ky00te This program places a very cute (and familiar to FurryMuck fans) accent on any text file.

       nethackify
              Wiped out text like can be found in nethack.

       newspeak
              A-la-1984

       censor CDA-ize text

       nyc    Brooklyn English

       pirate Talk like a pirate.

       rasterman
              Makes text look like it came from the keyboard of Carsten Haitzler.

       scottish
              Fake scottish (dwarven) accent filter, inspired by the character "Durkon" from Order of the Stick.

       spammer
              Turns honest text into something that is liable to be flagged as spam.

       scramble
              Scramble the "inner" letters of each word in the input into a random order. The resulting text is still strangely readable.

       studly Studly caps.

       uniencode
              Use glorious unicode to the fullest possible extent. As seen previously in many man pages.

       upside-down
              Flips text upside down. Stand on your head and squint to read the output.

SEE ALSO
       /usr/share/doc/filters/SAMPLES
              Lists samples of the output of all the filters.

       Other filters:

       pig    From the bsdgames package, pig converts text to pig latin.

       dog --oog
              From the dog package, dog can also function as a filter, converting text to OOG-speak.

AUTHORS
       The eleet, upside-down, chef, b1ff, and censor filters were written by Joey Hess <joey@kitenet.net>. Daniel V Klein <dvk@lonewolf.com> wrote the cockney, jive, and nyc filters. jibberish is by Raul Miller <rdm@test.legislate.com>, jethro is by Duane Paulson <ci922@cleveland.freenet.edu>, rasterman is by Zachary Beane, ken is by Stephen K Mulrine <skm@eqsn.net>, newspeak is by Jamie Zawinski <jwz@jwz.org>, studly is by Nick Phillips <nwp@lemon-computing.com>, Gurkan Sengun <gurkan@linuks.mine.nu> wrote nethackify, Dougal Campbell <dougal@gunters.org> wrote pirate, kraut is by John Sparks, scottish by Adam Borowski, Kenny is by Christian Garbs and Alan Eldridge, and scramble by Andrew J. Buehler.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.