Maybe I did not explain myself clearly. What I need is to remove all glosses that have two or more words and keep only the single-word glosses.
This implies a two-stage operation. In Stage 1, I currently use a regex to find the entries that contain such single-word glosses and store those entries in a separate file. But the same entry may also contain glosses of more than one word, for example:
Code:
अकर्त्तव्य a That which is not proper to be done; improper.
In Stage 2, I write a second regex to identify each gloss delimited by
Code:
, ; .
that contains more than one word, and remove it, resulting in
Code:
अकर्त्तव्य a improper.
It works, but the two-stage operation is long and tedious, and I was wondering whether an Awk or Perl script could do the whole job in one step.
Thanks a lot
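The post asks for Awk or Perl; the sketch below does the same job in a single pass, but in Python rather than either of those. It assumes (this is not stated in the post) that the headword and the part of speech are the first two whitespace-separated fields, and that glosses are separated by "," or ";" and end with "."; the function name keep_single_word_glosses() is hypothetical.

```python
import re

# Hypothetical entry layout, inferred from the sample entry only:
#   HEADWORD POS gloss[, gloss][; gloss]...[.]
GLOSS_DELIMITERS = re.compile(r"[,;.]")

def keep_single_word_glosses(line: str) -> str:
    """Remove every multi-word gloss from one entry, keeping one-word glosses."""
    parts = line.split(maxsplit=2)
    if len(parts) < 3:
        return line.rstrip()  # no gloss text to filter
    headword, pos, glosses = parts
    # Split the gloss text on , ; . and trim the surrounding whitespace.
    pieces = [p.strip() for p in GLOSS_DELIMITERS.split(glosses)]
    # Keep a gloss only if it is non-empty and contains no internal whitespace.
    singles = [p for p in pieces if p and len(p.split()) == 1]
    if not singles:
        return f"{headword} {pos}"  # every gloss had two or more words
    return f"{headword} {pos} " + "; ".join(singles) + "."
```

Because the original delimiters are discarded, the surviving glosses are rejoined with "; ". To filter a whole file, call the function on each line, e.g. for line in sys.stdin: print(keep_single_word_glosses(line)). Abbreviations containing periods (such as "e.g.") would be split apart by this simple delimiter rule.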
Is it possible to grep a dictionary for all words containing the string "con" in any letter case ("con", "Con", "CON", etc.)?
For instance, "magic command 'con' dictionary" would print words such as Confluence, contended, inconceivable, etc.
I really need this! Thank you!
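With grep itself, the -i option makes the match case-insensitive, so "grep -i con dictionary" prints every line that contains "con", "Con", "CON", and so on. If the word list is not one word per line, a minimal Python sketch (the function name words_containing() is hypothetical) splits the text into whitespace-separated words and compares them case-insensitively:

```python
from typing import List

def words_containing(text: str, needle: str) -> List[str]:
    """Return the words in text that contain needle, ignoring letter case."""
    # casefold() is a Unicode-aware, more aggressive lower() for caseless matching.
    target = needle.casefold()
    return [word for word in text.split() if target in word.casefold()]
```

Usage, assuming the word list is in a file named "dictionary": with open("dictionary", encoding="utf-8") as f: print("\n".join(words_containing(f.read(), "con"))).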
Hi,
How can I search for multiple words in a single line using grep?
E.g.:
Jack and Jill went up the hill
Jack and Jill were best friends
Humpty and Dumpty were good friends too
----------
I want to extract the 2nd statement (assuming there are several statements with...
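The post is cut off, so which words must appear together is not stated; the sketch below assumes, purely for illustration, that the wanted words are "Jack" and "friends", which selects only the 2nd statement. In the shell, chaining two greps ("grep Jack file | grep friends") keeps lines containing both strings. The Python version (function name lines_with_all_words() is hypothetical) matches whole whitespace-separated tokens instead of substrings:

```python
from typing import List

def lines_with_all_words(lines: List[str], words: List[str]) -> List[str]:
    """Return the lines that contain every word in words as a whole token."""
    wanted = set(words)
    # A line matches when the wanted words are a subset of its tokens.
    return [line for line in lines if wanted <= set(line.split())]

sample = [
    "Jack and Jill went up the hill",
    "Jack and Jill were best friends",
    "Humpty and Dumpty were good friends too",
]
print(lines_with_all_words(sample, ["Jack", "friends"]))
```

Since tokens are split on whitespace only, a word followed by punctuation (e.g. "friends.") would not match; stripping punctuation first would handle that case.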
Hi Guys,
Can someone help me with a way to extract the text between two words on a single line?
For example, if the file has the content below, I want to extract all text between b and f, including b and f. Apparently sed can do this, but it works line by line, and I guess it cannot read word by word.
...
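The file content mentioned in the post did not survive, so the input below ("a b c d e f g") is hypothetical. This Python sketch (function name text_between() is also hypothetical) splits a line into words and returns everything from the first start word through the first end word after it, inclusive:

```python
from typing import Optional

def text_between(line: str, start: str, end: str) -> Optional[str]:
    """Return the words from start through end (inclusive) on one line, or None."""
    tokens = line.split()
    try:
        i = tokens.index(start)   # first occurrence of the start word
        j = tokens.index(end, i)  # first end word at or after the start word
    except ValueError:
        return None               # one of the two words is missing
    return " ".join(tokens[i : j + 1])
```

Working on words rather than characters means that runs of spaces inside the extracted text are collapsed to single spaces.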
Hi,
I have written the following Python snippet to store the words that start with a capital letter in a dictionary, with each word as a key and the number of its appearances as the value.
#!/usr/bin/env python
import sys
import re
hash = {}  # initialize an empty dictionary
for line in...
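The snippet above is cut off in the thread. One way to do the same count is with re.findall() and collections.Counter; this is a sketch, not the poster's code, and it assumes a "capitalized word" is an ASCII capital letter followed by letters (the pattern name CAPITALIZED and the function count_capitalized() are hypothetical):

```python
import re
from collections import Counter

# Assumed definition: an ASCII capital letter followed by any ASCII letters.
CAPITALIZED = re.compile(r"\b[A-Z][A-Za-z]*\b")

def count_capitalized(lines) -> dict:
    """Map each capitalized word to the number of times it appears."""
    counts = Counter()
    for line in lines:
        counts.update(CAPITALIZED.findall(line))
    return dict(counts)
```

To read from standard input as the original script does, pass sys.stdin: count_capitalized(sys.stdin). Avoiding the name "hash" for the dictionary also keeps the built-in hash() function visible.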
Hello,
I have a database of name variants with the following structure:
variant=variant=variant
A single line can hold as many as thirty to forty variants.
Since the database is quite large (at present around 60,000 lines), duplicate sets of variants creep in. For example:
John=Johann=Jon
and...
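The post is truncated before it says exactly what counts as a duplicate. Assuming a duplicate is the same set of variants in any order (e.g. a hypothetical line Jon=John=Johann duplicating John=Johann=Jon), a Python sketch can use a frozenset of each line's variants as the key and keep the first occurrence; the function name dedupe_variant_sets() is hypothetical:

```python
from typing import Iterable, List

def dedupe_variant_sets(lines: Iterable[str]) -> List[str]:
    """Keep the first line for each distinct set of '='-separated variants."""
    seen = set()
    unique = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Order-independent key: the set of variants on the line.
        key = frozenset(v.strip() for v in line.split("="))
        if key not in seen:
            seen.add(key)
            unique.append(line)
    return unique
```

A set lookup keeps the whole pass linear in the number of lines, which matters little at 60,000 lines but avoids pairwise comparisons.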
Dear All,
I have a set of CSV (comma-separated) files, and each column contains several pieces of information separated by spaces. I want to count them but have not been successful...
Example data
desired outcome
I have tried a few things, including the link below.
for C in $FILES
do...
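The example data and desired outcome are missing from the post, so "count them" is ambiguous. Assuming the goal is the number of space-separated items in each column of each row, Python's csv module handles the comma splitting; the function name count_items() is hypothetical:

```python
import csv
import io
from typing import List

def count_items(csv_text: str) -> List[List[int]]:
    """For each CSV row, count the space-separated items in every column."""
    rows = csv.reader(io.StringIO(csv_text))
    # str.split() with no argument ignores leading, trailing and repeated spaces.
    return [[len(field.split()) for field in row] for row in rows]
```

For real files, open each one with newline="" (as the csv documentation recommends) and pass f.read(), or give the open file directly to csv.reader.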
Dear all,
I am editing a trilingual dictionary for open source, which has the following data structure:
English headwords<Tab>Devanagari headwords<Tab>Perso-Arabic headwords
as in the example below:
to mark, to number अंगणु (اَنگَڻُ)
The English headword entry sometimes has more than one word,...
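The question itself is truncated, but the record layout is stated. A Python sketch for splitting one record into its three fields, assuming real tab characters between fields and commas between multiple English headwords (as in "to mark, to number"); the function name parse_record() is hypothetical:

```python
from typing import List, Tuple

def parse_record(line: str) -> Tuple[List[str], str, str]:
    """Split one record into (English headwords, Devanagari, Perso-Arabic)."""
    # Raises ValueError if the line does not have exactly three tab-separated fields.
    english, devanagari, perso_arabic = line.rstrip("\n").split("\t")
    # Several English headwords share one field, separated by commas.
    headwords = [h.strip() for h in english.split(",") if h.strip()]
    return headwords, devanagari.strip(), perso_arabic.strip()
```

The ValueError on malformed lines is deliberate here: it flags records whose tabs were lost, as in the example line where the separators appear as spaces.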
Hello,
I am building a dictionary for the open-source community. The data structure is as follows:
HEADWORD=PARTOFSPEECH=ENGLISH MEANING
as shown in the example below:
अ=m=Prefix signifying negation.
अँहँ=ind=Interjection expressing disapprobation.
अं=int=An interjection...
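Given the stated HEADWORD=PARTOFSPEECH=ENGLISH MEANING layout, a Python sketch can split each line on "=" at most twice, so that any "=" inside the meaning is left intact; the names Entry and parse_entry() are hypothetical:

```python
from typing import NamedTuple

class Entry(NamedTuple):
    headword: str
    pos: str
    meaning: str

def parse_entry(line: str) -> Entry:
    """Split HEADWORD=PARTOFSPEECH=ENGLISH MEANING into its three fields."""
    # maxsplit=2 stops after the second '=', so the meaning may contain '='.
    headword, pos, meaning = line.rstrip("\n").split("=", 2)
    return Entry(headword, pos, meaning)
```

A line with fewer than two "=" signs raises ValueError on unpacking, which is a convenient way to catch malformed entries.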
Discussion started by: gimley
wnnatod(1)    User Commands    wnnatod(1)
NAME
wnnatod - Convert an EUC text dictionary to a binary dictionary
SYNOPSIS
/usr/bin/wnnatod [-s num] [-R] [-S] [-U] [-r] [-N] [-n] [-P filename] [-p filename] [-I] [-e] [-h filename] binary_dictionary_filename
DESCRIPTION
wnnatod reads a Japanese EUC text dictionary from the standard input, converts it to a binary dictionary and writes it to the specified
binary_dictionary_filename.
OPTIONS
The following options are available.
-s num Specifies the amount of memory to allocate (in words). num should be slightly larger than the number of words in the dictionary.
Normally you do not need to specify this option. The default is 70,000. If wnnatod fails with a memory-shortage error, retry
the command with the -s option.
-R Converts the EUC text dictionary to a reverse-searchable binary dictionary (default).
-S Converts the EUC text dictionary to a fixed-format dictionary.
-U Converts the EUC text dictionary to an editable dictionary.
-r Reverses the order of Kana and Kanji when converting the EUC text dictionary.
-N Sets the dictionary password to "*".
-n Sets the frequency password to "*".
-P filename Specifies the file name of the dictionary password.
-p filename Specifies the file name of the frequency password.
-I Creates a system dictionary.
-e Registers an entry's reading (Hiragana) as the word in the binary dictionary if the reading and the word are the same (that is,
the word consists only of Hiragana). With this option, you cannot convert a text dictionary to a reverse-searchable
binary dictionary.
-h filename Specifies the file name that contains part of speech information.
ATTRIBUTES
See attributes(5) for descriptions of the following attributes:
+-----------------------------+-----------------------------+
| ATTRIBUTE TYPE              | ATTRIBUTE VALUE             |
+-----------------------------+-----------------------------+
| Availability                | SUNWjwncu                   |
+-----------------------------+-----------------------------+
SEE ALSO
wnndictutil(1), wnndtoa(1), wnnotow(1), wnntouch(1)
SunOS 5.10    2 Mar 1998    wnnatod(1)