Linguistic project: extract co-occurrences from text corpus
Post 302661043 by figaro on Sunday 24th of June 2012 01:49:00 PM
Are you saying that, for a search word provided by the user ("dog" in your example), the script should return the number of occurrences of a combination such as "big dog" if it appears 3 times or more in a given piece of text?
You speak of the period (".") as the delimiter, but do you ultimately want to extend this to other punctuation as well, such as ! ? ; and ,?

STRTOK(3)						   BSD Library Functions Manual 						 STRTOK(3)

NAME
strtok, strtok_r -- string tokens

LIBRARY
Standard C Library (libc, -lc)

SYNOPSIS
#include <string.h>

char *
strtok(char * restrict str, const char * restrict sep);

char *
strtok_r(char *str, const char *sep, char **lasts);

DESCRIPTION
The strtok() function is used to isolate sequential tokens in a nul-terminated string, str. These tokens are separated in the string by at least one of the characters in sep. The first time that strtok() is called, str should be specified; subsequent calls, wishing to obtain further tokens from the same string, should pass a null pointer instead. The separator string, sep, must be supplied each time, and may change between calls.

The strtok() function returns a pointer to the beginning of each subsequent token in the string, after replacing the separator character itself with a NUL character. Separator characters at the beginning of the string or at the continuation point are skipped so that zero length tokens are not returned. When no more tokens remain, a null pointer is returned.

The strtok_r() function implements the functionality of strtok() but is passed an additional argument, lasts, which points to a user-provided pointer which is used by strtok_r() to store state which needs to be kept between calls to scan the same string; unlike strtok(), it is not necessary to limit tokenizing to a single string at a time when using strtok_r().

EXAMPLES
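The property that strtok_r() need not be limited to a single string at a time can be illustrated with nested tokenizing. The sketch below is an illustration only and not part of the original manual page: the function name count_words_per_sentence, its parameters, and the two separator sets (".!?;" for sentences, " ," for words) are assumptions chosen for the example. Each scan keeps its own state pointer, so the inner word scan does not disturb the outer sentence scan.

```c
#define _POSIX_C_SOURCE 200809L	/* request the strtok_r() declaration */
#include <string.h>

/*
 * Hypothetical helper: split text into sentences on ".!?;" and each
 * sentence into words on spaces and commas. The word count of each
 * sentence is stored in counts[] (at most max entries) and the number
 * of sentences recorded is returned. The buffer is modified in place,
 * because strtok_r() overwrites separators with NUL characters.
 */
size_t
count_words_per_sentence(char *text, size_t *counts, size_t max)
{
	char *sentence, *word;
	char *outer_state, *inner_state;	/* one saved position per scan */
	size_t n = 0;

	for (sentence = strtok_r(text, ".!?;", &outer_state);
	    sentence != NULL && n < max;
	    sentence = strtok_r(NULL, ".!?;", &outer_state)) {
		size_t words = 0;

		/* The inner scan uses its own state, so outer_state is untouched. */
		for (word = strtok_r(sentence, " ,", &inner_state);
		    word != NULL;
		    word = strtok_r(NULL, " ,", &inner_state))
			words++;

		counts[n++] = words;
	}
	return n;
}
```

Called on a writable buffer holding "The big dog barked. A cat, a cow! Done?", this sketch records three sentences with 4, 4 and 1 words. Writing the same loops with plain strtok() would not work, since the inner calls would overwrite the single hidden position that the outer loop depends on.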
The following will construct an array of pointers to each individual word in the string s:

      #define MAXTOKENS       128

      char s[512], *p, *tokens[MAXTOKENS];
      char *last;
      int i = 0;

      snprintf(s, sizeof(s), "cat dog horse cow");

      /* Advance i only when a token is stored, so tokens[i] stays in bounds. */
      for ((p = strtok_r(s, " ", &last)); p;
          (p = strtok_r(NULL, " ", &last))) {
              if (i < MAXTOKENS - 1)
                      tokens[i++] = p;
      }
      tokens[i] = NULL;

That is, tokens[0] will point to "cat", tokens[1] will point to "dog", tokens[2] will point to "horse", and tokens[3] will point to "cow".

SEE ALSO
index(3), memchr(3), rindex(3), strchr(3), strcspn(3), strpbrk(3), strrchr(3), strsep(3), strspn(3), strstr(3)

STANDARDS
The strtok() function conforms to ANSI X3.159-1989 (``ANSI C89''). The strtok_r() function conforms to IEEE Std 1003.1c-1995 (``POSIX.1'').

BUGS
The System V strtok(), if handed a string containing only delimiter characters, will not alter the next starting point, so that a call to strtok() with a different (or empty) delimiter string may return a non-NULL value. Since this implementation always alters the next starting point, such a sequence of calls would always return NULL.

BSD                             August 11, 2002                            BSD