Sponsored Content
Top Forums Shell Programming and Scripting Identifying single words in a dictionary database Post 302981159 by gimley on Wednesday 7th of September 2016 09:29:15 PM
Old 09-07-2016
I have Awk/Gawk and Perl.

May be I did not explain myself clearly. What I need is to remove all glosses which have two or more words and retain only the single words.

This implies a two stage operation. In Stage 1 at present I use a regex to identify such unique words within each string and store the string in a separate file. But then it can so happen that within the string there could also be glosses containing more than one word.
Code:
अकर्त्तव्य a That which is not proper to be done; improper.

In stage 2 I write a second regex to identify the gloss delimited by
Code:
, ; .

resulting in
Code:
अकर्त्तव्य a improper.

and which contains more than one word.
It works but the two stage operation is long and tedious and I was wondering if an Awk or Perl script could do the trick.
Thanks a lot
 

9 More Discussions You Might Find Interesting

1. UNIX for Advanced & Expert Users

dictionary words in vim

how can i get the dictionary words in vim using keyboard keys? and how can i get the current directory filename? (1 Reply)
Discussion started by: lakshmananindia
1 Replies

2. Shell Programming and Scripting

Grep all words with "con" in them from a dictionary?

Is it possible to grep all words with the string "con" "Con" "CON" etc. etc. from a dictionary? for instance "magic command 'con' dictionary" will spit out words such as Confluence, contended, inconceivable etc etc. I really need this! Thank you! (14 Replies)
Discussion started by: guptaxpn
14 Replies

3. Shell Programming and Scripting

grep multiple words in a single line

Hi.. How to search for multiple words in a single line using grep?. Eg: Jack and Jill went up the hill Jack and Jill were best friends Humpty and Dumpty were good friends too ---------- I want to extract the 2nd statement(assuming there are several statements with... (11 Replies)
Discussion started by: anduzzi
11 Replies

4. UNIX for Dummies Questions & Answers

extract text between two words on a single line

Hi Guys, Can someone help me with a way to extract text between two words on a single line. For example if the file has below content I want to extract all text between b and f inclusive of b and f. Aparently sed does this but does it line by line and I guess it cannot read word by word. ... (11 Replies)
Discussion started by: krishnaux
11 Replies

5. Shell Programming and Scripting

Counting all words that start with a capital letter in a string using python dictionary

Hi, I have written the following python snippet to store the capital letter starting words into a dictionary as key and no of its appearances as a value in this dictionary against the key. #!/usr/bin/env python import sys import re hash = {} # initialize an empty dictinonary for line in... (1 Reply)
Discussion started by: royalibrahim
1 Replies

6. Shell Programming and Scripting

Identifying dupes within a database and creating unique sub-sets

Hello, I have a database of name variants with the following structure: variant=variant=variant The number of variants can be as many as thirty to forty. Since the database is quite large (at present around 60,000 lines) duplicate sets of variants creep in. Thus John=Johann=Jon and... (2 Replies)
Discussion started by: gimley
2 Replies

7. UNIX for Dummies Questions & Answers

Count words in a single column

Dear All, I have set of CSV files (comma separated) and each column have some information in them separated by space. Now I want to count them but have not been successful... Example data desired outcome I have tried few things including the link below. for C in $FILES do... (8 Replies)
Discussion started by: A-V
8 Replies

8. Shell Programming and Scripting

Reducing multiple entries in a tri-lingual dictionary to single entries

Dear all, I am editing a tri-lingual dictionary for open source which has the following data structure English headwords <Tab>Devanagari Headwords<Tab>PersoArabic headwords as in the example below to mark, to number अंगणु (اَنگَڻُ) The English headword entry has at times more than one word,... (2 Replies)
Discussion started by: gimley
2 Replies

9. Shell Programming and Scripting

Regex to identify unique words in a dictionary database

Hello, I have a dictionary which I am building for the Open Source Community. The data structure is as under HEADWORD=PARTOFSPEECH=ENGLISH MEANING as shown in the example below अ=m=Prefix signifying negation. अँहँ=ind=Interjection expressing disapprobation. अं=int=An interjection... (2 Replies)
Discussion started by: gimley
2 Replies
Locale::Codes::LangVar(3pm)				 Perl Programmers Reference Guide			       Locale::Codes::LangVar(3pm)

NAME
Locale::Codes::LangVar - standard codes for language variation identification SYNOPSIS
use Locale::Codes::LangVar; $lvar = code2langvar('acm'); # $lvar gets 'Mesopotamian Arabic' $code = langvar2code('Mesopotamian Arabic'); # $code gets 'acm' @codes = all_langvar_codes(); @names = all_langvar_names(); DESCRIPTION
The "Locale::Codes::LangVar" module provides access to standard codes used for identifying language variations, such as those as defined in the IANA language registry. Most of the routines take an optional additional argument which specifies the code set to use. If not specified, the default IANA language registry codes will be used. SUPPORTED CODE SETS
There are several different code sets you can use for identifying language variations. A code set may be specified using either a name, or a constant that is automatically exported by this module. For example, the two are equivalent: $lvar = code2langvar('en','alpha-2'); $lvar = code2langvar('en',LOCALE_CODE_ALPHA_2); The codesets currently supported are: alpha This is the set of alphanumeric codes from the IANA language registry, such as 'arevela' for Eastern Armenian. This code set is identified with the symbol "LOCALE_LANGVAR_ALPHA". This is the default code set. ROUTINES
code2langvar ( CODE [,CODESET] ) langvar2code ( NAME [,CODESET] ) langvar_code2code ( CODE ,CODESET ,CODESET2 ) all_langvar_codes ( [CODESET] ) all_langvar_names ( [CODESET] ) Locale::Codes::LangVar::rename_langvar ( CODE ,NEW_NAME [,CODESET] ) Locale::Codes::LangVar::add_langvar ( CODE ,NAME [,CODESET] ) Locale::Codes::LangVar::delete_langvar ( CODE [,CODESET] ) Locale::Codes::LangVar::add_langvar_alias ( NAME ,NEW_NAME ) Locale::Codes::LangVar::delete_langvar_alias ( NAME ) Locale::Codes::LangVar::rename_langvar_code ( CODE ,NEW_CODE [,CODESET] ) Locale::Codes::LangVar::add_langvar_code_alias ( CODE ,NEW_CODE [,CODESET] ) Locale::Codes::LangVar::delete_langvar_code_alias ( CODE [,CODESET] ) These routines are all documented in the Locale::Codes::API man page. SEE ALSO
Locale::Codes The Locale-Codes distribution. Locale::Codes::API The list of functions supported by this module. http://www.iana.org/assignments/language-subtag-registry The IANA language subtag registry. AUTHOR
See Locale::Codes for full author history. Currently maintained by Sullivan Beck (sbeck@cpan.org). COPYRIGHT
Copyright (c) 2011-2013 Sullivan Beck This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. perl v5.18.2 2014-01-06 Locale::Codes::LangVar(3pm)
All times are GMT -4. The time now is 02:52 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy