06-24-2012
I'm sorry I wasn't really clear in my first post. In more concrete words, I'm trying to see with what word the word of my choice is most commonly associated with - on its left, that is to say: word wordofmychoice - within a corpus.
The textfile looks like this:
Are you one of those people who prefer larger dogs? Do you know someone who has told you that they prefer larger dogs because small dogs are yappy and snappy? Whether you are a large-dog person or a small-dog person, one thing we all would agree on is that a larger percentage of small dogs tend to have a different type of temperament than medium and large dogs. Small dogs have earned the reputation of being yappy, snappy, jealous, protective, wary of strangers and not the greatest child companions.
Let's say I'm interested in the word "dogs". The output would be:
larger dogs
small dogs
large dogs
But I want to count how many times each association appears:
larger dogs 2
small dogs 3
large dogs 1
And, I only want to keep (print in a new file) associations appearing at least 3 times. Therefore, the final result (in a new textfile) I want to obtain would be:
small dogs 3
That's it basically. If possible, now, but this is not a priority, I would like to make sure no association contain any punctuation in the middle, to avoid getting what I would call false results. For instance, let's say I'm looking for "small" and its associations (with one word on the left) in the previous text:
"dogs. Small"
This is what I want to avoid. But once again, that's not a priority.
Thanks for your answers guys, I hope it was a bit clearer
6 More Discussions You Might Find Interesting
1. Programming
needa c program to extract text between two delimiters from some text file.
and then storing them in to diffrent variables ?
text file like 0:
abc.txt
=========
aaaaaa|11111111|sssssssssss|333333|ddddddddd|34343454564|asass
aaaaaa|11111111|sssssssssss|333333|ddddddddd|34343454564|asass... (7 Replies)
Discussion started by: kukretiabhi13
7 Replies
2. Shell Programming and Scripting
History: large open source PHP project, school management program. Comprises about 200 scripts. Had another developer for awhile, and he wanted a version in German, so he edited all the scripts and replaced text that would show up in the browser with variables (i.e. instead of "Click Here",... (7 Replies)
Discussion started by: dougp23
7 Replies
3. Shell Programming and Scripting
Hello,
I have a large file of syllables /strings in Urdu. Each word is on a separate line.
Example in English:
be
at
for
if
being
attract
I need to identify the frequency of each of these strings from a large corpus (which I cannot attach unfortunately because of size limitations) and... (7 Replies)
Discussion started by: gimley
7 Replies
4. Shell Programming and Scripting
I want to extract verbal forms from a large corpus of English. I have identified a certain number of patterns. Each pattern has the following structure
SPACE word_CATEGORY
where word refers to the verbal form and CATEGORY refers to the class of the verb
The categories are identified as per the... (4 Replies)
Discussion started by: gimley
4 Replies
5. Shell Programming and Scripting
Hi folks!
I have a file which contains a 1000 lines. On each line i have multiple occurrences ( 26 to be exact ) of pattern folder#/folder#.
# is depicting the line number in the file
some text here folder1/folder1 some text here folder1/folder1 some text here folder1/folder1 some text... (7 Replies)
Discussion started by: martinsmith
7 Replies
6. Shell Programming and Scripting
I have two directories called English and Hindi. Each directory contains the same number of files with the only difference being that in the case of the English Directory the tag is
.english
and in the Hindi one the tag is
.Hindi
The file may contain either a single text or more than one text... (7 Replies)
Discussion started by: gimley
7 Replies