Splitting Concatenated Words With Largest Strings First
Post 302511480 by Chubler_XL on Thursday 7th of April 2011 01:12:54 AM
lol, I remember mentioning this exact problem in this post nearly 2 months ago.

The best way to tackle this is to try all possible substrings and select the solution with the smallest number of residual characters, i.e. characters left in pieces that are not in the lookup file. Let me try and throw something together.
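
To make "residual characters" concrete, here is a small sketch that scores a candidate split by hand. The dictionary words and the `residual` helper are made up for illustration; they are not from the thread.

```shell
# Hypothetical sample dictionary; the real lookup file is not shown in the thread.
lookup=$(mktemp)
printf '%s\n' the boy ran slow slowly ly > "$lookup"

# residual LOOKUP "candidate split"
# Prints how many characters sit in pieces that are missing from LOOKUP.
residual() {
  printf '%s\n' "$2" | awk '
    NR == FNR { a[$1]; next }            # first input: load the dictionary
    {
      r = 0
      for (i = 1; i <= NF; i++)          # each space-separated piece
        if (!(tolower($i) in a)) r += length($i)
      print r
    }' "$1" -
}

residual "$lookup" "The boy x"   # only "x" is unknown
residual "$lookup" "Theb oyx"    # neither piece is known
```

The split with the lowest score wins, so "The boy x" beats "Theb oyx" here.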

---------- Post updated at 02:36 PM ---------- Previous update was at 01:52 PM ----------

OK, I have a solution, but it's much slower as it has to try all possible combinations:

Code:
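awk 'NR==FNR{a[$1]; next}          # first file (lookup): load the dictionary
# clean(s): return the best split of s, without its score
function clean(s) {
   split(cs(s),S,SUBSEP);
   return S[1]
}
# cs(s): try every prefix length, longest first, and recurse on the rest.
# Returns "split text" SUBSEP residual, where residual counts the
# characters that fall in pieces not found in the dictionary.
function cs(s,i,p,r,b,bs,t) {
 b=9999;                           # best (lowest) residual found so far
 for(i=length(s);b && i;i--) {     # stop early once a perfect split is found
   r=0;
   p=tolower(substr(s,1,i));
   if(!(p in a)) r=i;              # unknown prefix: all i characters are residual
   t=cs(substr(s,i+1));
   split(t,V,SUBSEP);
   if(r+V[2]<b) { b=r+V[2]; bs=substr(s,1,i)" "V[1] SUBSEP b }
  }
  return bs;
}
{ print clean($0) }' lookup raw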

---------- Post updated at 03:12 PM ---------- Previous update was at 02:36 PM ----------

A couple of performance improvements:
- No need to check substrings longer than the longest word in the lookup file
- Skip the recursion if the current mismatch is already no better than the best found so far

Code:
awk 'NR==FNR{a[$1]; if(length($1)>m) m=length($1); next}   # m = longest dictionary word
function clean(s) {
   split(cs(s),S,SUBSEP);
   return S[1]
}
function cs(s,i,p,r,b,bs,t) {
 b=9999;
 for(i=length(s)>m?m:length(s);b && i;i--) {   # no prefix longer than m can match
   r=0;
   p=tolower(substr(s,1,i));
   if(!(p in a)) r=i;
   if(r<b) {                                   # skip if already no better than best
     t=cs(substr(s,i+1));
     split(t,V,SUBSEP);
     if(r+V[2]<b) { b=r+V[2]; bs=substr(s,1,i)" "V[1] SUBSEP b }
   }
  }
  return bs;
}
{ print clean($0) }' lookup raw
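
For anyone who wants to try it, here is a self-contained sample run of the faster version. The contents of `lookup` and `raw` are made-up sample data, and the trailing `sed` is only there to strip the space the script leaves at the end of each line.

```shell
dir=$(mktemp -d)
printf '%s\n' the boy ran slow slowly ly > "$dir/lookup"   # hypothetical dictionary
printf '%s\n' Theboy ranslowly > "$dir/raw"                # hypothetical joined words

out=$(awk 'NR==FNR{a[$1]; if(length($1)>m) m=length($1); next}
function clean(s) {
   split(cs(s),S,SUBSEP);
   return S[1]
}
function cs(s,i,p,r,b,bs,t) {
 b=9999;
 for(i=length(s)>m?m:length(s);b && i;i--) {
   r=0;
   p=tolower(substr(s,1,i));
   if(!(p in a)) r=i;
   if(r<b) {
     t=cs(substr(s,i+1));
     split(t,V,SUBSEP);
     if(r+V[2]<b) { b=r+V[2]; bs=substr(s,1,i)" "V[1] SUBSEP b }
   }
  }
  return bs;
}
{ print clean($0) }' "$dir/lookup" "$dir/raw" | sed 's/ *$//')

printf '%s\n' "$out"
```

With this sample data "ranslowly" comes out as "ran slowly" rather than "ran slow ly". Both splits have zero residual, but prefixes are tried longest first and a later candidate only replaces the best one when it is strictly better, so the largest string wins the tie.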


Last edited by Chubler_XL; 04-07-2011 at 01:57 AM.