Splitting Concatenated Words With Largest Strings First Post: 302511095

Shell Programming and Scripting: posted by gimley on Tuesday 5th of April 2011 09:09:41 PM

Hello,
I posted earlier asking for help with a script to split concatenated words. The script was supposed to read words from a master file and use them to split concatenated words in the slave/input file.
Thanks to the help I received, the following script was posted, and it works very well. It flags residues by placing a ! before the residual element.
However, the script does not pick the largest string when splitting, which leads to problems.
An example will help.
Given that the master file has:

Code:

narayan 
narayana 
prakash
aprak
ash

In the case of narayanaprakash, I get:

Code:

narayan, aprak and ash

instead of

Code:

narayana prakash.

How do I get the script to produce the second instead of the first?

Many thanks for all the earlier help. I hope this largest-string-first problem can be resolved. Here is the script:

Code:

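# Utility to split names that have been joined together.
# First file (master): store each word, lower-cased to match the lookup in lsr().
NR==FNR { a[tolower($1)]; next }

# Return the longest prefix of c found in the master list, or "" if none.
# p is a local variable.
function lsr(c,    p) {
    for (p = length(c); p; p--)
        if (tolower(substr(c, 1, p)) in a) break
    if (p) return substr(c, 1, p)
    return ""
}

# Second file (input): peel master words off the left of each line.
{
    while (length) {
        s = lsr($0)
        if (s == "") printf "!"            # start of a residue
        while (s == "" && length) {
            # Copy the residue one character at a time until a word matches.
            printf "%s", substr($0, 1, 1)
            $0 = substr($0, 2)
            s = lsr($0)
            if (s != "") printf "! "       # end of the residue
        }
        printf "%s ", s
        $0 = substr($0, length(s) + 1)
    }
    printf "\n"
}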
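
For what it's worth, lsr() already scans prefixes from the longest down, but it commits to the first match and never reconsiders it if the rest of the word then fails to split. Below is a minimal sketch of a backtracking variant, written as a shell function wrapping awk. The names split_longest, can_split, dict, fail and parts are made up for this illustration and do not come from the earlier thread. As a simplification, it flags a whole unsplittable line with a leading ! instead of marking only the residual element.

```shell
#!/bin/sh
# split_longest MASTER INPUT
# Hypothetical helper, not from the original thread.
split_longest() {
    awk '
    # First file (master): remember every word, lower-cased.
    NR == FNR { if ($1 != "") dict[tolower($1)] = 1; next }

    # Return 1 if c can be covered entirely by master words, leaving the
    # split in the global string "parts"; return 0 otherwise.
    # Longer prefixes are tried first; a shorter one is used only when the
    # remainder after the longer one cannot be split.
    function can_split(c,    p, head) {
        if (c == "") { parts = ""; return 1 }
        if (c in fail) return 0              # known dead end, skip the work
        for (p = length(c); p > 0; p--) {
            head = substr(c, 1, p)
            if ((tolower(head) in dict) && can_split(substr(c, p + 1))) {
                parts = (parts == "" ? head : head " " parts)
                return 1
            }
        }
        fail[c] = 1
        return 0
    }

    # Second file (input): print the split, or flag the line with "!".
    { if (can_split($0)) print parts; else print "!" $0 }
    ' "$1" "$2"
}
```

Called as split_longest master.txt input.txt (placeholder file names), with the master list from this post and narayanaprakash as input, this should print narayana prakash. The backtracking only matters when the longest prefix leaves a remainder that cannot be split: with a master list of ab, abc and cd, the input abcd comes out as ab cd instead of abc plus a residue.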

Last edited by fpmurphy; 04-05-2011 at 10:56 PM.. Reason: Code tags please!
