CREATING A SYLLABLE CONCORDANCE WITH POSITIONAL VARIANTS
Post 302543658 by DGPickett on Monday 1st of August 2011 03:47:11 PM
Well, regex notation for word boundaries varies from tool to tool; see "Regex Tutorial - \b Word Boundaries".

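I used to write \< and \> for the start and end of a word, but the Perl-style \b (which matches at either end of a word) is now just as common. Neither notation is part of POSIX regex proper; GNU grep and sed accept both, so check which one your tool supports.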
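A quick way to see whether the two notations agree on your system is to run both against the same line. This is only a sketch and assumes GNU grep; the sample text is made up for illustration.

```shell
# Assumes GNU grep; other greps may accept only one notation, or neither.
line='a banana as data'

# Whole-word "a" using the older \< \> anchors.
old_style=$(printf '%s\n' "$line" | grep -o '\<a\>')

# The same whole-word match using Perl-style \b.
new_style=$(printf '%s\n' "$line" | grep -o '\ba\b')

echo "old: $old_style"
echo "new: $new_style"
```

If the two lines differ, or one of them is empty, your grep does not support that notation and you should stick with the one that works.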

So, taking the syllable a as the example, you need to check for
  • standalone \<a\>
  • initial \<a[a-z]
  • final [a-z]a\>
  • medial [a-z]a[a-z]
but since the [a-z] checks are more expensive, you might be able to drop them by testing in this order: if a match is not the standalone \<a\>, then a leading \<a means initial and a trailing a\> means final, and whatever is left over is medial.
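The ordering idea can be sketched as a small shell function that never uses [a-z] at all: count every occurrence, subtract the standalone, initial and final ones, and call the remainder medial. This is a rough sketch, not a finished tool. It assumes GNU grep (for -o and the \< \> extensions) and a syllable with no regex metacharacters, and the names count and positions are invented here for illustration.

```shell
#!/bin/sh
# Sketch only: classify occurrences of a syllable by position in a word.
# Assumes GNU grep (-o and \< \>) and a syllable without regex
# metacharacters. "count" and "positions" are illustrative names.

count() {
    # Number of non-overlapping matches of pattern $1 in text $2.
    printf '%s\n' "$2" | grep -o -- "$1" | wc -l | tr -d ' '
}

positions() {
    syl=$1
    text=$(cat)                                        # corpus from stdin
    total=$(count "$syl" "$text")                      # every occurrence
    alone=$(count "\\<$syl\\>" "$text")                # standalone word
    initial=$(( $(count "\\<$syl" "$text") - alone ))  # word-initial only
    final=$(( $(count "$syl\\>" "$text") - alone ))    # word-final only
    medial=$(( total - alone - initial - final ))      # none of the above
    echo "standalone=$alone initial=$initial final=$final medial=$medial"
}

printf 'a apple banana\n' | positions a
# prints: standalone=1 initial=1 final=1 medial=2
```

Note that GNU grep treats letters, digits and underscore as word characters, and that -o counts non-overlapping matches, so a syllable that can overlap itself would need more care.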
 
