Splitting concatenated words in input file with words from the same file
Dear all,
I am working with names, and I have a large file of names in which some words are written together (up to 4 or 5), while their corresponding single forms are also present in the word list.
An example would make this clear
The program should split the words in the list on the basis of the single forms that are present in it. Thus
In the case of the last since
is missing, the program could suitably tag the missing element and show the word as
The script would prove especially helpful in separating words in languages such as German, which have a large number of compound words.
Could the awk script posted on this site (thanks to yinyuemi), which I am posting below, be modified to work within the same database instead of referring to an external dictionary? It does something similar, but it takes its words from an external dictionary. I have tried to modify it, but it just does not work.
Any help given would be gratefully acknowledged.
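One way the request above could be approached is to hand awk the same file twice: the first pass loads every word into an array, and the second pass tries to break each word into listed pieces. The following is only a minimal sketch under stated assumptions, not the script from the thread: it assumes one word per line, the names split_names, full_split and tag_split are hypothetical, and the <...> marker for a missing element is an invented tag format.

```shell
# split_names FILE
# Splits run-together words in FILE using the single words found in FILE itself.
# The awk program reads FILE twice: pass 1 loads the words, pass 2 splits them.
split_names() {
    awk '
    # Pass 1 (NR == FNR holds only while reading the first file argument).
    NR == FNR { if (NF) dict[$1] = 1; next }

    # Return s as space-separated listed words, or "" when no complete split
    # exists. Longer pieces are tried first. "whole" is the word being split,
    # so that a listed compound does not simply match itself.
    function full_split(s, whole,    n, i, head, rest) {
        n = length(s)
        for (i = n; i >= 1; i--) {
            head = substr(s, 1, i)
            if (head == whole || !(head in dict)) continue
            if (i == n) return head
            rest = full_split(substr(s, i + 1), whole)
            if (rest != "") return head " " rest
        }
        return ""
    }

    # Fallback: scan left to right, take the longest listed piece at each
    # position, and wrap anything not in the list in <...> (assumed tag format).
    # Returns s unchanged when no listed piece is found at all.
    function tag_split(s,    n, pos, len, hit, piece, parts, unk, out, found) {
        n = length(s); pos = 1; out = ""; unk = ""; found = 0
        while (pos <= n) {
            hit = 0
            for (len = n - pos + 1; len >= 1; len--) {
                if (len < n && (substr(s, pos, len) in dict)) { hit = len; break }
            }
            if (hit) {
                if (unk != "") { out = out "<" unk "> "; unk = "" }
                piece = substr(s, pos, hit)
                parts = full_split(piece, piece)   # a piece may itself be a compound
                out = out (parts != "" ? parts : piece) " "
                pos += hit; found = 1
            } else {
                unk = unk substr(s, pos, 1); pos++
            }
        }
        if (unk != "") out = out "<" unk ">"
        sub(/ $/, "", out)
        return found ? out : s
    }

    # Pass 2: print each word split, tagged, or unchanged.
    NF {
        res = full_split($1, $1)
        if (res == "") res = tag_split($1)
        print res
    }
    ' "$1" "$1"
}
```

With a file containing ram, kumar, ramkumar and ramkumarsingh (one per line), split_names prints ram, kumar, "ram kumar" and "ram kumar <singh>". Very short entries in the list can cause spurious splits, so a minimum piece length may be worth adding. On Windows, where command-prompt quoting differs, the awk program could instead be saved to a file and run with awk -f, giving the input file name twice.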
Moderator's Comments:
Please use [code][/code] tags instead of [quote][/quote] tags for code and samples
Last edited by Scrutinizer; 05-03-2012 at 01:55 AM..
Reason: code tags instead of quote tags
Many thanks. I copied the script and ran it on the file which I had proposed as a sample. I got no results.
Have I done something wrong? I am on Windows and maybe this is the cause, but awk/gawk should run in any environment.
It is tantalising to see a solution and not be able to use it.
Many thanks once more for your kind help.
Many thanks. You made my day. The script works. I should have thought of removing the infile infile and giving the file names at the command prompt.
Many thanks once again for all your kind help and your patience.
---------- Post updated at 10:05 PM ---------- Previous update was at 07:42 PM ----------
Sorry to sound ungrateful. The script works, but my file is around 300 thousand words and the script is very slow.
Is there any means of speeding it up, such as an array or some such device? Many thanks for all your help, and sorry to pester you like this.
Hi All,
I need help replacing particular words in a file depending on whether another word is found in that file.
i.e.
my self is peter@king.
i am staying at north sydney.
we all are peter@king.
How can I replace peter with sham if @king is found in any line of that file?
Please help me... (8 Replies)
hi,
I need to replace all words in any quoted position and then change the words inside a file of thousands of rows.
textfile data :
"Ninguno","Confirma","JuicioABC"
"JuicioCOMP","Recurso","JuicioABC"
"JuicioDELL","Nulidad","Nosino"
"Solidade","JuicioEUR","Segundo"
need... (1 Reply)
Hello,
I would like to change a setting in a file to the value that the user inputs.
For example, by default it is
ONBOOT=ON
When the user keys in "YES", it would be
ONBOOT=YES
--------------
This code only adds the entire user input, but doesn't replace the existing setting.
How do I go about... (5 Replies)
Hi
I have strings like these :
Vengeance mitt
Men Vengeance gloves
Women Quatro Windstopper Etip gloves
Quatro Windstopper Etip gloves
Girls Thermobite hooded jacket
Thermobite Triclimate snow jacket
Boys Thermobite Triclimate snow jacket
and I would like to get the lower case words at... (2 Replies)
Hi ,
I need to count the errors associated with two words occurring together in a file: that is, to count the occurrences of the word "error" on lines where the word "index.js" also appears. What should the command look like? Please kindly help. I was trying: grep "error" log.txt | wc -l (1 Reply)
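A hedged sketch of one possible reading of this question, namely counting the lines that contain both strings. The function name count_errors_for is hypothetical, and the -F option is used so that the dot in index.js is matched literally rather than as a regular-expression wildcard.

```shell
# count_errors_for FILE NEEDLE
# Prints the number of lines in FILE that contain both NEEDLE and "error".
# Assumption: "error" counts once per line, not once per occurrence.
count_errors_for() {
    grep -F -- "$2" "$1" | grep -c -F "error"
}
```

For example, count_errors_for log.txt index.js narrows log.txt to the lines mentioning index.js and then counts those that also mention error.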
hello,
I had earlier posted a request for help with a script for splitting concatenated words. The script was supposed to read words from a master file and split concatenated words in the slave/input file.
Thanks to the help I got, the following script which works very well was posted. It detects residues by... (14 Replies)
Hello,
I have a complex problem. I have a file in which words have been joined together:
Theboy ranslowly
I want to be able to correctly split the words using a lookup file in which all the words occur:
the
boy
ran
slowly
slow
put
child
ly
The lookup file which is meant for look up... (21 Replies)
Hi,
I am trying to split words that have a semicolon ';' as the delimiter into separate files using awk.
Here's my code.
echo "f1;f2;f3" | awk '/;/{c=sprintf("%02d",++i); close("out" c)} {print > "out" c}'
echo "f1;f2;f3" | awk -v i=0 '/;/{close("out"i); i++; next} {print > "out"i}'
But... (4 Replies)
hello,
I'd like your help with a bash script which:
1. finds inside the html file (it is attached with my post) the code number of the Latest Stable Kernel,
2. finds the link which leads to the download location of the Latest Stable Kernel version,
(the right link should lead to the file... (3 Replies)
Hi,
I have a string like this in a file,
I want to retrieve the words separated by commas into 3 variables, like
How do I get that? Please advise. (2 Replies)