02-23-2011
Splitting Concatenated Words in Input File with Words from a Master File
Hello,
I have a complex problem. I have a file in which words have been joined together:
Theboy ranslowly
I want to be able to correctly split the words using a lookup file in which all the words occur:
the
boy
ran
slowly
slow
put
child
ly
The lookup file is huge; it serves as the reference for correctly segmenting the input file, which contains "run-on" words. The input file could also be very large.
Each run-on string may contain up to three or four words concatenated together.
I have 2 requirements:
1. Only the longest matching string should be used for splitting. Thus, given that both slow and ly occur in the lookup file, I do not want the split to be:
the boy ran slow ly
But rather
the boy ran slowly.
2. If a word is not found in the master list, all the other longest matches should still be output.
E.g. assuming that boy is not in the lookup file, I would still want the split to be:
The boy ran slowly
i.e. "boy" is flagged as residue and tagged as such, if possible.
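To make the two requirements concrete, here is a minimal sketch of a greedy longest-match split. It is written in Python rather than Perl or AWK, purely as an illustration of the logic; the function names (`load_lexicon`, `split_token`, `split_line`) and the `*` residue marker are assumptions made for this example, not part of any existing script. It also assumes plain ASCII words, one per line in the lookup file, and case-insensitive matching.

```python
def load_lexicon(lines):
    """Build a lowercase set of words from an iterable of lines (one word per line)."""
    return {line.strip().lower() for line in lines if line.strip()}


def split_token(token, lexicon, max_len):
    """Greedy longest-match split of one run-on token.

    Returns a list of (piece, is_known) pairs. Characters that match no
    lexicon entry are collected into a residue piece with is_known=False.
    """
    pieces = []
    residue = ""
    lowered = token.lower()
    i = 0
    while i < len(token):
        match_len = 0
        # Try the longest candidate first, so "slowly" wins over "slow" + "ly".
        for length in range(min(max_len, len(token) - i), 0, -1):
            if lowered[i:i + length] in lexicon:
                match_len = length
                break
        if match_len:
            if residue:
                # Flush any unmatched characters seen before this match.
                pieces.append((residue, False))
                residue = ""
            pieces.append((token[i:i + match_len], True))
            i += match_len
        else:
            # No lexicon word starts here: keep the character as residue.
            residue += token[i]
            i += 1
    if residue:
        pieces.append((residue, False))
    return pieces


def split_line(line, lexicon, residue_mark="*"):
    """Split every whitespace-separated token; prefix residue pieces with residue_mark."""
    max_len = max((len(word) for word in lexicon), default=0)
    out = []
    for token in line.split():
        for piece, known in split_token(token, lexicon, max_len):
            out.append(piece if known else residue_mark + piece)
    return " ".join(out)


if __name__ == "__main__":
    words = ["the", "boy", "ran", "slowly", "slow", "put", "child", "ly"]
    lexicon = load_lexicon(words)
    print(split_line("Theboy ranslowly", lexicon))            # The boy ran slowly
    print(split_line("Theboy ranslowly", lexicon - {"boy"}))  # The *boy ran slowly
```

Keeping the lookup words in a set makes each membership check cheap, and capping the candidate length at the longest lookup word keeps the inner loop short even for a huge master file. Note that a purely greedy scan can pick a long first word that leaves an unsplittable remainder, and residue letters that happen to match a very short lookup word will be reported as known; a real solution may need backtracking for such cases.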
I have tried to write a program that does this (both in Perl and in AWK), but it fails and produces incorrect splits, especially when I try to meet requirement 1.
I am still a tyro at Perl and AWK, since all my experience over the past 20 years has been in C, and I am fascinated by both languages because of their speed and elegance.
Help would be most appreciated and gratefully acknowledged, as it would help me learn a new skill. Commented code would be a great learning experience, if someone has the patience to write it, both for me and for other learners like me.
Manythanks, (Many thanks)
GIMLEY