Shell Programming and Scripting: Merging words splitted into characters with awk
Post 302502164 by dokamo on Monday 7th of March 2011 05:10:59 AM
OK, this script works for my case:

Code:
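awk '
BEGIN {
	q = "\047"	# single quote character, written as an octal escape
}
{
	o = ""
	for (i = 1; i <= NF; i++) {
		f = $i;      fl = length(f)	# current field and its length
		nf = $(i+1); nfl = length(nf)	# next field (empty after the last one)
		# last field on the line: nothing follows, so no space
		if (nfl == 0) o = o f
		# a multi-character word on either side is a real word boundary
		else if (fl > 1 || nfl > 1) o = o f " "
		# lowercase letter followed by uppercase, digit or "(": new word starts
		else if (f ~ /[[:lower:]]/ && nf ~ /[[:upper:][:digit:](]/) o = o f " "
		# digit followed by a letter
		else if (f ~ /[[:digit:]]/ && nf ~ /[[:alpha:]]/) o = o f " "
		# punctuation (except "(", "-" and the single quote) before non-punctuation
		else if (f ~ /[[:punct:]]/ && f !~ /[(-]/ && f != q && nf !~ /[[:punct:]]/) o = o f " "
		# otherwise the two characters belong to the same word
		else o = o f
	}
	print o > "output.txt"	# the file name must be a quoted string
}' input.txt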

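Of course, this can't separate two adjacent words that are both all lowercase (or both all uppercase), because nothing in the single characters marks where one word ends and the next begins.
Fortunately, my OCR text is full of punctuation and capitalization, so the result was good enough for my purpose.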
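
To make that limitation concrete, here is a minimal sketch that applies the same rules to a few sample lines. The wrapper name merge_chars and the sample inputs are made up for illustration; it reads stdin and writes stdout instead of using input.txt and output.txt, and it assumes an awk that supports POSIX character classes such as [[:lower:]].

```shell
#!/bin/sh
# merge_chars is a hypothetical helper name, not part of the original script.
# It applies the same rules, but as a filter: stdin in, stdout out.
merge_chars() {
	awk '
	BEGIN { q = "\047" }	# single quote, written as an octal escape
	{
		o = ""
		for (i = 1; i <= NF; i++) {
			f = $i; fl = length(f)
			nf = $(i+1); nfl = length(nf)
			if (nfl == 0) o = o f
			else if (fl > 1 || nfl > 1) o = o f " "
			else if (f ~ /[[:lower:]]/ && nf ~ /[[:upper:][:digit:](]/) o = o f " "
			else if (f ~ /[[:digit:]]/ && nf ~ /[[:alpha:]]/) o = o f " "
			else if (f ~ /[[:punct:]]/ && f !~ /[(-]/ && f != q && nf !~ /[[:punct:]]/) o = o f " "
			else o = o f
		}
		print o
	}'
}

# Made-up sample lines, one per case:
printf '%s\n' 'H e l l o W o r l d' | merge_chars   # Hello World
printf '%s\n' 'H i , B o b' | merge_chars            # Hi, Bob
printf '%s\n' 'h e l l o w o r l d' | merge_chars    # helloworld (word boundary lost)
```

The first two lines are rebuilt correctly because a capital letter or a comma gives the script a cue; the third has no such cue, so both words are glued together.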
 
