Linguistic project: extract co-occurrences from text corpus
Posted in Shell Programming and Scripting by standingtree914 on 06-26-2012 at 07:30 AM
You might use this pipeline to count how often each word occurs in a text file:
Code:
tr -cs 'A-Za-z' '\n' < 'file name.txt' | sort | uniq -c | sort -nr | more
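
Since the thread is about co-occurrences rather than single-word counts, here is a minimal sketch along the same lines, assuming "co-occurrence" means two adjacent words (a bigram) and reading the same input file as above:
Code:
tr -cs 'A-Za-z' '\n' < 'file name.txt' \
  | awk 'NF { if (prev != "") print prev, $1; prev = $1 }' \
  | sort | uniq -c | sort -nr | head -20

The tr step puts one word per line, the awk step pairs each word with the one before it, and the familiar sort | uniq -c | sort -nr tail counts and ranks the pairs (head -20 shows the 20 most frequent).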


 

6 More Discussions You Might Find Interesting

1. Programming

c program to extract text between two delimiters from some text file

Need a C program to extract text between two delimiters from a text file and then store the pieces in different variables. The text file (abc.txt) looks like: aaaaaa|11111111|sssssssssss|333333|ddddddddd|34343454564|asass aaaaaa|11111111|sssssssssss|333333|ddddddddd|34343454564|asass... (a shell sketch of the same split follows this entry) (7 Replies)
Discussion started by: kukretiabhi13
7 Replies
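
Not a C answer, but for comparison the same split can be sketched in the shell, assuming the fields are pipe-delimited as in the sample and abc.txt is the file name:
Code:
# read each pipe-delimited line of abc.txt into separate shell variables
while IFS='|' read -r f1 f2 f3 f4 f5 f6 f7; do
    printf 'first=%s second=%s last=%s\n' "$f1" "$f2" "$f7"
done < abc.txt

The variable names f1..f7 are placeholders; in C the usual tools for the same job are strtok() or strchr() over each line.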

2. Shell Programming and Scripting

Text Substitution Project

History: a large open source PHP project, a school management program comprising about 200 scripts. It had another developer for a while who wanted a German version, so he edited all the scripts and replaced text that would show up in the browser with variables (i.e. instead of "Click Here",... (7 Replies)
Discussion started by: dougp23
7 Replies

3. Shell Programming and Scripting

Creating Frequency of words from a file by accessing a corpus

Hello, I have a large file of syllables/strings in Urdu, with each word on a separate line. Example in English: be, at, for, if, being, attract. I need to identify the frequency of each of these strings in a large corpus (which I unfortunately cannot attach because of size limitations) and... (a sketch follows this entry) (7 Replies)
Discussion started by: gimley
7 Replies
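
A rough sketch of one way to do this, assuming words.txt holds the strings (one per line) and corpus.txt is the corpus; both names are placeholders:
Code:
# tokenize the corpus into one word per line, then count only the listed strings
tr -s '[:space:]' '\n' < corpus.txt > corpus.tokens
awk 'NR == FNR { want[$1]; next }      # first file: the word list
     $1 in want { count[$1]++ }        # second file: corpus tokens
     END { for (w in want) print count[w] + 0, w }' words.txt corpus.tokens |
sort -nr

Splitting on whitespace rather than on letter classes keeps non-ASCII (Urdu) strings intact.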

4. Shell Programming and Scripting

Grepping verbal forms from a large corpus

I want to extract verbal forms from a large corpus of English. I have identified a certain number of patterns, each with the structure SPACE word_CATEGORY, where word is the verbal form and CATEGORY is the class of the verb. The categories are identified as per the... (a grep sketch follows this entry) (4 Replies)
Discussion started by: gimley
4 Replies
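
A hedged sketch of the kind of pattern involved, assuming tokens look like " form_TAG" (a space, the verbal form, an underscore, the category tag), with VB, VBD and VBZ standing in for the real category names and corpus.txt for the corpus file:
Code:
# pull out "form_TAG" tokens for the chosen categories, then rank them by frequency
grep -oE ' [[:alpha:]]+_(VB|VBD|VBZ)' corpus.txt | sed 's/^ //' | sort | uniq -c | sort -nr

grep -o (print only the matched part) is not in POSIX but is available in GNU and BSD grep.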

5. Shell Programming and Scripting

Remove duplicate occurrences of text pattern

Hi folks! I have a file containing 1000 lines. On each line I have multiple occurrences (26 to be exact) of the pattern folder#/folder#, where # is the line number in the file, e.g.: some text here folder1/folder1 some text here folder1/folder1 some text here folder1/folder1 some text... (an awk sketch follows this entry) (7 Replies)
Discussion started by: martinsmith
7 Replies
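
A sketch of one way to do the per-line clean-up in awk, assuming the tokens are whitespace-separated, the pattern is literally folderN/folderN, and input.txt is a placeholder name:
Code:
# on each line, keep the first copy of each folderN/folderN token and drop the repeats;
# other words stay in place (runs of blanks collapse to single spaces)
awk '{
    split("", seen)                     # reset the per-line "already kept" set
    out = ""
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^folder[0-9]+\/folder[0-9]+$/ && ($i in seen)) continue
        seen[$i]
        out = out (out == "" ? "" : " ") $i
    }
    print out
}' input.txt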

6. Shell Programming and Scripting

Alignment tool to join text files in 2 directories to create a parallel corpus

I have two directories called English and Hindi. Each directory contains the same number of files; the only difference is that files in the English directory carry the tag .english and those in the Hindi one carry .Hindi. A file may contain either a single text or more than one text... (a pairing-loop sketch follows this entry) (7 Replies)
Discussion started by: gimley
7 Replies
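
A sketch of the pairing loop, assuming the directories are English/ and Hindi/, matching files share a basename, the extensions are .english and .Hindi as described, and aligned/ is a made-up output directory:
Code:
mkdir -p aligned
for e in English/*.english; do
    base=$(basename "$e" .english)
    h="Hindi/$base.Hindi"
    [ -f "$h" ] || { echo "no match for $e" >&2; continue; }
    # line N of the English file ends up next to line N of the Hindi file, tab-separated
    paste "$e" "$h" > "aligned/$base.tsv"
done

paste only lines the two sides up correctly when they have the same number of lines, i.e. the texts are already sentence-aligned.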
UNIQ(1)                   BSD General Commands Manual                   UNIQ(1)

NAME
     uniq -- report or filter out repeated lines in a file

SYNOPSIS
     uniq [-c | -d | -u] [-i] [-f num] [-s chars] [input_file [output_file]]

DESCRIPTION
     The uniq utility reads the specified input_file comparing adjacent lines,
     and writes a copy of each unique input line to the output_file.  If
     input_file is a single dash ('-') or absent, the standard input is read.
     If output_file is absent, standard output is used for output.  The second
     and succeeding copies of identical adjacent input lines are not written.
     Repeated lines in the input will not be detected if they are not
     adjacent, so it may be necessary to sort the files first.

     The following options are available:

     -c      Precede each output line with the count of the number of times
             the line occurred in the input, followed by a single space.

     -d      Only output lines that are repeated in the input.

     -f num  Ignore the first num fields in each input line when doing
             comparisons.  A field is a string of non-blank characters
             separated from adjacent fields by blanks.  Field numbers are one
             based, i.e. the first field is field one.

     -s chars
             Ignore the first chars characters in each input line when doing
             comparisons.  If specified in conjunction with the -f option, the
             first chars characters after the first num fields will be
             ignored.  Character numbers are one based, i.e. the first
             character is character one.

     -u      Only output lines that are not repeated in the input.

     -i      Case insensitive comparison of lines.

DIAGNOSTICS
     The uniq utility exits 0 on success, and >0 if an error occurs.

COMPATIBILITY
     The historic +number and -number options have been deprecated but are
     still supported in this implementation.

SEE ALSO
     sort(1)

STANDARDS
     The uniq utility is expected to be IEEE Std 1003.2 (``POSIX.2'')
     compatible.

HISTORY
     A uniq command appeared in Version 3 AT&T UNIX.

BSD                               June 6, 1993                               BSD
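
A few quick illustrations of the options described above, with words.txt standing in for any text file:
Code:
sort words.txt | uniq -c        # count how many times each line occurs
sort words.txt | uniq -d        # show only the lines that occur more than once
sort -f words.txt | uniq -ci    # the same count, ignoring case

uniq only spots duplicates on adjacent lines, which is why each example sorts first.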