GETWORDFREQ(1) User Contributed Perl Documentation GETWORDFREQ(1)NAME
getWordFreq - print word freq information from language model
SYNOPSIS
getWordFreq [option]... -m slm-file -l lexicon
DESCRIPTION
getWordFreq prints out the word string and its freq of all words in a language model.
OPTIONS -s corpus-size Specify the training corpus's size. The default corpus-size is 300000000 if not given.
-v Be verbose, output other information after word and freq for each line.
-e Give format for ervin.
-m slm-file
Specify language model file.
-l lexicon
Specify the lexicon file. A default lexicon could be found at /usr/share/sunpinyin-slm/dict.utf8.
AUTHOR
Originally written by Phill.Zhang <phill.zhang@sun.com>. Currently maintained by Kov.Chai <tchaikov@gmail.com>.
SEE ALSO slmthread(1).
perl v5.14.2 2012-06-09 GETWORDFREQ(1)
Check Out this Related Man Page
SLMSEG(1) User Contributed Perl Documentation SLMSEG(1)NAME
slmseg - maximum matching segment Chinese text.
SYNOPSIS
slmseg -d dict_file [option]... [corpus_file]...
DESCRIPTION
slmseg is a tool for segmenting Chinese text into words using maximum matching algorithm. slmseg segments corpus_file, or standard input if
no filename is specified, and write the segmented result to standard output.
OPTIONS -d dict_file
Use dict_file as lexicon. A default lexicon can be found at /usr/share/sunpinyin-slm/dict.utf8.
-f,--format (text|bin)
Output Format, can be 'text' or 'bin'. default 'bin'. Normally, in text mode, word text are output, while in binary mode, binary short
integer of the word-ids are written to stdout.
-s, --stok STOK_ID
Sentence token id. Default 10. It will be written to output in binary mode after every sentence.
-i, --show-id
Show Id info. Under text output format mode, attach id after known words. If under binary mode, print id(s) in text.
-m, --model language-model-file Speficy the language model file. This file is always generated by slmthread.
NOTES
Under binary mode, consecutive id of 0 are merged into one 0. Under text mode, no space are inserted between unknown-words.
AUTHOR
Originally written by Phill.Zhang <phill.zhang@sun.com>. Currently maintained by Kov.Chai <tchaikov@gmail.com>.
SEE ALSO mmseg(1), ids2ngram (1).
perl v5.14.2 2012-06-09 SLMSEG(1)
I have a huge matrix file containing some 1.5 million rows and 6000 columns. The matrix looks something like this:
1 2 3
4 5 6
7 8 9
3 4 5
I want to add all the numbers in the columns of this matrix and display the result to my stdout. This means that the numbers in the first column are:
... (2 Replies)
Hi,
Just trying to get to grips with sed and awk for some reporting for work and I need some assistance:
I have a file that lists policy names on the first line and then on the second line whether the policy is active or not.
Policy Name: Policy1
Active: yes
Policy... (8 Replies)
version info :
vi availabe with RHEL 5.4
I have a text file with 10,000 lines. I want to copy lines from 5000th line to 7000th and redirect to a file. Any idea how I can do this?
Note:
The above scenario is just an example. In my actual requirement, the file has 14 million lines and I want... (9 Replies)
Hi everyone,
I know the following questions are noobish questions but I am asking them because I am confused about the basics of history behind UNIX and LINUX.
Ok onto business, my questions are-:
Was/Is UNIX ever an open source operating system ?
If UNIX was... (21 Replies)
Hello,
I couldn't find an actual introduction thread, so I decided to just put this here.
I go by d0wngrade online. I have been programming in multiple languages for about 15+ years. I started with standard web design languages like HTML and CSS, but I then advanced from design to development... (2 Replies)
Hi guys...
The first active code line in AudioScope.sh is set -u .
This causes a complete exit if a variable is used/found but has not been allocated at the start of the program.
However, apart from writing code to do the task, is there a switch to to check which variables have been... (17 Replies)
Hi.
In thread https://www.unix.com/shell-programming-and-scripting/267833-grouping-counting.html rovf and I had a mini-discussion on grep and awk.
Here is a demo script that compares the awk and grep approaches for this single problem:
#!/usr/bin/env bash
# @(#) s2 Demonstrate group... (1 Reply)
Hello,
I have to fish out some specific columns from a file based on the header value. I have the list of columns I need in a different file. I thought I could read in the list of headers I need,
# file with header names of required columns in required order
headers_file=$2
# read contents... (11 Replies)
For those interested in installing dash shell on OSX Lion to help test POSIX compliancy of shell scripts, it is quite easy. I did it like this:
If you don't have gcc on your system:
0. Download and install the Command Line Tools for Xcode package from Sign In - Apple *
1. Download the dash... (2 Replies)
Hello and thanks in advance for any help anyone can offer me
I'm trying to learn the find command and thought I was understanding it... Apparently I was wrong. I was doing compound searches and I started getting weird results with the -size test. I was trying to do a search on a 1G file owned by... (14 Replies)
I have data of an excel files as given below,
file1
org1_1 1 1 2.5 100
org1_2 1 2 5.5 98
org1_3 1 3 7.2 88
file2
org2_1 1 1 2.5 100
org2_2 1 2 5.5 56
org2_3 1 3 7.2 70
I have multiple excel files as above shown.
I have to copy column 1, column 4 and paste into a new excel file as... (26 Replies)
Dear All,
Taking a break from Vue.js coding for the site, SEO and YT videos; and hopefully addressing some well deserved criticism from some here that I have been too focused on the visual aspects of the forums versus the substance and the community....
While the "current generation... (9 Replies)
Hi all...
Well guys and gals, I jumped in at the deep end and found things that PERL cannot do by default.
Many tricky terminal escape codes are not catered for so I had to create workarounds.
One thing I searched for was this:
Passing perl variable to shell command
AND, @Neo this was... (15 Replies)