slmseg(1) [debian man page]

SLMSEG(1)						User Contributed Perl Documentation						 SLMSEG(1)

NAME
slmseg - segment Chinese text using the maximum matching algorithm

SYNOPSIS
slmseg -d dict_file [option]... [corpus_file]...

DESCRIPTION
slmseg is a tool for segmenting Chinese text into words using the maximum matching algorithm. slmseg segments each corpus_file, or standard input if no file name is given, and writes the segmented result to standard output.

OPTIONS
-d dict_file
    Use dict_file as the lexicon. A default lexicon can be found at /usr/share/sunpinyin-slm/dict.utf8.
-f, --format (text|bin)
    Output format, either 'text' or 'bin'. The default is 'bin'. In text mode, the word text is written; in binary mode, the word IDs are written to stdout as binary short integers.
-s, --stok STOK_ID
    Sentence token ID. The default is 10. In binary mode, it is written to the output after every sentence.
-i, --show-id
    Show word IDs. In text output mode, the ID is appended after each known word. In binary mode, the ID(s) are printed as text.
-m, --model language-model-file
    Specify the language model file. This file is always generated by slmthread.

NOTES
In binary mode, consecutive IDs of 0 are merged into a single 0. In text mode, no spaces are inserted between unknown words.

AUTHOR
Originally written by Phill.Zhang <phill.zhang@sun.com>. Currently maintained by Kov.Chai <tchaikov@gmail.com>.

SEE ALSO
mmseg(1), ids2ngram(1).

perl v5.14.2    2012-06-09    SLMSEG(1)
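The DESCRIPTION above names the maximum matching algorithm without spelling it out. The following is a minimal sketch of greedy forward maximum matching in Python, assuming the lexicon is an in-memory dict that maps each word to a numeric ID and that unmatched characters become unknown words with ID 0. The function name max_match and this lexicon representation are hypothetical; the on-disk format of dict.utf8, the matching direction slmseg actually uses, and how the -m language model changes the result are not described here and may differ.

```python
from typing import Dict, List, Tuple


def max_match(text: str, lexicon: Dict[str, int]) -> List[Tuple[str, int]]:
    """Greedy forward maximum matching (illustrative sketch only).

    Returns (word, word_id) pairs. Characters not covered by any
    lexicon entry are emitted one at a time with id 0 (unknown word).
    """
    # Longest word in the lexicon bounds how far ahead we look.
    max_len = max((len(w) for w in lexicon), default=1)
    result: List[Tuple[str, int]] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a lexicon hit.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in lexicon:
                result.append((candidate, lexicon[candidate]))
                i += length
                break
        else:
            # No lexicon entry starts here: emit one character as unknown.
            result.append((text[i], 0))
            i += 1
    return result
```

For example, with the lexicon {"我们": 1, "学习": 2, "中文": 3}, max_match("我们学习中文吧", ...) returns [("我们", 1), ("学习", 2), ("中文", 3), ("吧", 0)]: "吧" is not in the lexicon, so it comes out as an unknown word with ID 0. Because the choice at each position is greedy, this simple form can mis-segment strings where a shorter match would lead to a better overall split.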

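The output rules in slmseg's OPTIONS and NOTES sections (sentence token after every sentence, runs of ID 0 merged in binary mode, no spaces between unknown words in text mode) can be illustrated with a small Python sketch. It works on plain Python integers and strings; the integer width and byte order of the real binary output are not specified here, so decoding raw bytes is left out. The function names to_id_stream and render_text are hypothetical, and render_text assumes known words are otherwise separated by single spaces.

```python
from typing import Iterable, List, Tuple


def to_id_stream(sentences: Iterable[List[int]], stok: int = 10) -> List[int]:
    """Flatten per-sentence word IDs the way the NOTES describe binary mode."""
    stream: List[int] = []
    for ids in sentences:
        for wid in ids:
            # Merge a run of unknown-word IDs (0) into a single 0.
            if wid == 0 and stream and stream[-1] == 0:
                continue
            stream.append(wid)
        # The sentence token (-s/--stok, default 10) follows every sentence.
        stream.append(stok)
    return stream


def render_text(pairs: List[Tuple[str, int]]) -> str:
    """Join (word, id) pairs for text mode; unknown words (id 0) are glued."""
    tokens: List[str] = []
    prev_unknown = False
    for word, wid in pairs:
        if wid == 0 and prev_unknown:
            # No space between consecutive unknown words.
            tokens[-1] += word
        else:
            tokens.append(word)
        prev_unknown = wid == 0
    return " ".join(tokens)
```

For example, to_id_stream([[5, 0, 0, 7], [0, 0]]) yields [5, 0, 7, 10, 0, 10], and render_text([("我们", 1), ("吧", 0), ("啊", 0), ("学习", 2)]) yields "我们 吧啊 学习".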
CDTOA(1)						      General Commands Manual							  CDTOA(1)

NAME
cdtoa - convert a dictionary from binary format back to text format

SYNOPSIS
cdtoa [-n] [-s] [-z] [-e] [-E] infilename [-h cixingfile] [usagefreqfile]

DEFAULT PATH
/usr/local/bin/cWnn4/cdtoa

DESCRIPTION
cdtoa converts a dictionary from binary format to text format and writes the result to standard output (stdout). infilename is the name of the input binary-format dictionary. The output can be redirected to a file with the shell's ">" operator, for example:

    cdtoa dict.dic > dict.u

Here "dict.dic" is the input binary-format dictionary and "dict.u" is the output text-format dictionary. usagefreqfile may name more than one usage frequency file (for a particular user); the usage frequency information in these files is reflected in the generated text dictionary.

OPTIONS
-s  Sort the entries in the text dictionary by Pinyin or Zhuyin.
-n  Attach sequence numbers to the output.
-z  Write the text dictionary in Zhuyin instead of Pinyin (the default is Pinyin).
-e  If the Hanzi in the text dictionary contain characters such as spaces and tabs, compact them into a special format (default).
-E  If the Hanzi in the text dictionary contain characters such as spaces and tabs, do not compact them into a special format.
-h cixingfile
    Specify the Cixing (part-of-speech) definition file.

NOTE
1. Parts in [ ] are optional and may be omitted.
2. Pinyin and Zhuyin dictionaries have the same format.
3. By default, the text dictionary is output in Pinyin.

13 May 1992    CDTOA(1)