CREATING A SYLLABLE CONCORDANCE WITH POSITIONAL VARIANTS
Hello,
Some time back I posted a request for a syllable concordance: given a syllable in a file, the program would extract a word matching that syllable from a file entitled "Corpus". The program was
The following script was provided, which did the job and for which I am really thankful:
However, I need one more refinement.
I need to modify the program so that it finds the syllable in four different environments: initial, medial, final, and standalone (whole word).
Example (theoretical: I know somebody will say "a" here is not a syllable, but I am working with Indian languages).
Syllable "a"
Initial   Medial   Final   Standalone
ago       bare     gonna   a
It could be that the syllable does not appear in all environments, as in the case of "stri":
Initial   Medial   Final   Standalone
strip     Astrid   NONE    NONE
I have tried to factor in the environmental constraints using regexes, but the results are disastrous.
Please help. I have spent quite a few hours on this and the results get more ludicrous each time.
Many thanks and my gratitude to the generous people on the forum who give their time and energy to helping out tyros like me.
Last edited by radoulov; 07-30-2011 at 04:07 AM..
Reason: Code tags!
I used to write \< and \> for word boundaries, but the Perl-style regexes use \b for both ends, so depending on your tool you may need \b in place of either one!
So, you need to check for
standalone \<a\>
initial \<a[a-z]
final [a-z]a\>
medial [a-z]a[a-z]
but since the [a-z] check is more expensive, you might be able to check in this order instead: if the match is not \<a\>, then \<a is initial and a\> is final, and medial is any occurrence that is none of the above.
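The four checks above can be sketched in Python roughly as follows. This is only an illustration, not the script used in the thread: the function name find_environments is made up, Python's re module has no \< or \> so \b stands in for both, and re.IGNORECASE is added as an assumption so that a word like "Astrid" counts. The [a-z] class only covers ASCII letters, so text in Indian scripts would need a different character class.

```python
import re

def find_environments(syllable, text):
    """Report which of the four environments the syllable occurs in.

    Hypothetical helper; ASCII letters only, case-insensitive by assumption.
    """
    s = re.escape(syllable)  # treat the syllable as literal text
    patterns = {
        "standalone": rf"\b{s}\b",      # whole word
        "initial": rf"\b{s}[a-z]",      # word start, followed by a letter
        "final": rf"[a-z]{s}\b",        # preceded by a letter, word end
        "medial": rf"[a-z]{s}[a-z]",    # letters on both sides
    }
    return {
        name: re.search(pattern, text, re.IGNORECASE) is not None
        for name, pattern in patterns.items()
    }

if __name__ == "__main__":
    print(find_environments("a", "ago bare gonna a"))
    print(find_environments("stri", "strip Astrid"))
```

Each environment is tested independently here, which is simple but does the redundant work that the ordering suggestion above tries to avoid.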
Hello,
With a little help from colleagues, I finally managed to get the concordance going. Here is the code in case someone else would like to use it:
Many thanks for the information re. Regex.
Last edited by Scott; 08-02-2011 at 01:14 AM..
Reason: Code tags, please...
A logic tree and the removal of redundant tests save time. If the syllable has a prefix character, it is medial or final; otherwise it is initial or standalone. For the prefixed case, if it is not medial it is always final, so no further test is needed.
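A minimal sketch of that logic tree in Python, assuming the corpus has already been split into words. The names classify_occurrence and concordance are hypothetical and this is not the code posted earlier in the thread. It uses plain string search instead of regexes, so each occurrence needs at most two comparisons.

```python
def classify_occurrence(word, start, length):
    """Classify one occurrence of a syllable inside a word."""
    has_prefix = start > 0                   # a character precedes the syllable
    has_suffix = start + length < len(word)  # a character follows the syllable
    if has_prefix:
        # prefixed: medial if something follows, otherwise final
        return "medial" if has_suffix else "final"
    # no prefix: initial if something follows, otherwise the whole word
    return "initial" if has_suffix else "standalone"

def concordance(syllable, words):
    """Return the first word found for each environment, or None."""
    table = {"initial": None, "medial": None, "final": None, "standalone": None}
    for word in words:
        start = word.find(syllable)
        while start != -1:
            env = classify_occurrence(word, start, len(syllable))
            if table[env] is None:
                table[env] = word
            start = word.find(syllable, start + 1)  # look for further occurrences
    return table

if __name__ == "__main__":
    print(concordance("a", ["ago", "bare", "gonna", "a"]))
    print(concordance("stri", ["strip", "Astrid"]))
```

Environments with no match stay as None, which corresponds to the NONE entries in the example table from the first post.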