SLMSEG(1) User Contributed Perl Documentation SLMSEG(1)
NAME
slmseg - segment Chinese text using maximum matching
SYNOPSIS
slmseg -d dict_file [option]... [corpus_file]...
DESCRIPTION
slmseg is a tool for segmenting Chinese text into words using the maximum matching algorithm. slmseg segments corpus_file, or standard input if
no filename is specified, and writes the segmented result to standard output.
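The page does not say which variant of maximum matching slmseg implements, so the following is only a minimal sketch of the general technique, assuming a forward (left to right) greedy match. The function name `max_match_segment` and the toy lexicon are invented for illustration and are not part of slmseg.

```python
def max_match_segment(text, lexicon, max_word_len=None):
    """Segment text by forward maximum matching against a set of known words."""
    if max_word_len is None:
        max_word_len = max((len(w) for w in lexicon), default=1)
    words = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first and shrink until a lexicon word matches.
        for length in range(min(max_word_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + length]
            if candidate in lexicon:
                break
        else:
            # No lexicon entry starts here: emit a single unknown character.
            candidate = text[pos]
        words.append(candidate)
        pos += len(candidate)
    return words


# Toy lexicon, invented for this example (not the sunpinyin dictionary).
lexicon = {"研究", "研究生", "生命", "起源", "的"}
print(max_match_segment("研究生命的起源", lexicon))
```

The greedy choice takes "研究生" at the start of the sample sentence, which leaves "命" without a lexicon match. This shows the usual trade-off of maximum matching: it is simple and fast, but it commits to the longest word even when a shorter one would give a better overall segmentation.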
OPTIONS
-d dict_file
Use dict_file as the lexicon. A default lexicon can be found at /usr/share/sunpinyin-slm/dict.utf8.
-f, --format (text|bin)
Output format, either 'text' or 'bin'. The default is 'bin'. In text mode, the words themselves are written; in binary mode, the word ids
are written to stdout as binary short integers.
-s, --stok STOK_ID
Sentence token id. The default is 10. In binary mode, it is written to the output after every sentence.
-i, --show-id
Show id information. In text mode, append the id after each known word. In binary mode, print the id(s) as text.
-m, --model language-model-file
Specify the language model file. This file is always generated by slmthread.
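To make the binary output format more concrete, here is a hedged sketch of how a stream of word ids might be read back and split into sentences at the sentence token. The id width is an assumption: the page only says "short integer", so the code defaults to unsigned 16-bit values in native byte order and takes the format as a parameter in case a given build writes a different width. The function names are hypothetical.

```python
import struct


def decode_word_ids(data, fmt="H"):
    """Unpack a byte string of fixed-width word ids into a list of ints.

    fmt is a struct format character. "H" (unsigned 16-bit, native byte
    order) is an ASSUMPTION based on the phrase "short integer".
    """
    width = struct.calcsize("=" + fmt)
    usable = len(data) - len(data) % width  # ignore a trailing partial id
    return [v[0] for v in struct.iter_unpack("=" + fmt, data[:usable])]


def split_sentences(ids, stok=10):
    """Group ids into sentences, treating stok as the end-of-sentence marker."""
    sentences, current = [], []
    for word_id in ids:
        if word_id == stok:
            sentences.append(current)
            current = []
        else:
            current.append(word_id)
    if current:
        sentences.append(current)
    return sentences


# Synthetic sample: two sentences, each terminated by the default token 10.
sample = struct.pack("=6H", 101, 102, 10, 0, 205, 10)
ids = decode_word_ids(sample)
print(split_sentences(ids))
```

With real data, the bytes would come from a file opened in "rb" mode that holds output produced with -f bin, and stok should match the value passed to -s.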
NOTES
In binary mode, consecutive ids of 0 are merged into a single 0. In text mode, no spaces are inserted between unknown words.
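The merging rule for binary mode can be sketched as follows. This is an illustration of the rule as stated, not slmseg's own code, and it assumes that id 0 stands for a word that is not in the lexicon, which the note implies but does not state outright. The names `UNKNOWN_ID` and `merge_unknown_runs` are hypothetical.

```python
from itertools import groupby

UNKNOWN_ID = 0  # assumption: id 0 marks a word missing from the lexicon


def merge_unknown_runs(ids):
    """Collapse each run of consecutive UNKNOWN_ID values into a single one."""
    merged = []
    for value, run in groupby(ids):
        if value == UNKNOWN_ID:
            merged.append(value)  # keep one 0 for the whole run
        else:
            merged.extend(run)    # other ids are kept as they are, repeats included
    return merged


print(merge_unknown_runs([7, 0, 0, 0, 12, 12, 0, 5]))
```

Only runs of 0 are collapsed; a repeated known id such as 12 above is left untouched.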
AUTHOR
Originally written by Phill.Zhang <phill.zhang@sun.com>. Currently maintained by Kov.Chai <tchaikov@gmail.com>.
SEE ALSO
mmseg(1), ids2ngram(1).
perl v5.14.2 2012-06-09 SLMSEG(1)
CDTOA(1) General Commands Manual CDTOA(1)
NAME
cdtoa - convert a binary format dictionary back to text format
SYNOPSIS
cdtoa [-n] [-s] [-z] [-e] [-E] infilename
[-h cixingfile ] [ usagefreqfile ]
DEFAULT PATH
/usr/local/bin/cWnn4/cdtoa
DESCRIPTION
cdtoa converts a binary format dictionary to text format and writes the result to standard output (stdout).
infilename is the name of the input binary format dictionary.
The output may be redirected to a file with the ">" operator. For example,
cdtoa dict.dic > dict.u
Here "dict.u" is the output text format dictionary and "dict.dic" is the input binary format dictionary.
usagefreqfile may name more than one usage frequency file for a particular user. This usage frequency information is reflected in the
text format dictionary that is created.
OPTIONS
-s Order the entries in the text dictionary by Pinyin or Zhuyin.
-n Attach sequence numbers to the output.
-z Convert the binary format back to text format in Zhuyin. (Note: the default is Pinyin.)
-e If the Hanzi in the text dictionary contain characters such as space and tab, compact them into a special format. (Default)
-E If the Hanzi in the text dictionary contain characters such as space and tab, do NOT compact them into a special format.
-h cixingfile
Specify the Cixing definition file.
NOTE
1. The parts in [ ] are optional and may be omitted.
2. The Pinyin and Zhuyin dictionaries have the same format.
3. By default, the text dictionary is produced in Pinyin.
13 May 1992 CDTOA(1)