Sponsored Content
Top Forums Shell Programming and Scripting Merging words splitted into characters with awk Post 302496151 by dokamo on Saturday 12th of February 2011 12:53:32 PM
Old 02-12-2011
Merging words splitted into characters with awk

I have an OCR output with some words splitted into single characters separated by blank spaces,
and I want the same text with these words written correctly.

Example:
This is a text w i t h some s p l i t e d W o r d s .

The regular expression for matching splitted words could be something like this (I'm not so much worried about that):
Code:
grep -E "([A-Z])?( [a-z]){2,100} [.,]?"

My question is:
Once I've matched the string, how can I delete this annoing blank spaces?
I tried with the awk gsub and gensub functions but I'm not so hard with this.

I just reach to do this:
Code:
awk '{ S=gensub(" ([a-z]) ([a-z]) ([a-z]) ", " \\1\\2\\3 ", "g", $0); print S}'

That is not sufficient at all: the number of splitted characters is always different (not 3!)

Which is the right way to do this? any help? Thanks
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

merging 2 lines with awk and stripping first two words

Hey all i am pretty new to awk... here my problem. My input is something like this: type: NSR client; name: pegasus; save set: /, /var, /part, /part/part2, /testpartition, /foo/bar,... (9 Replies)
Discussion started by: bazzed
9 Replies

2. Shell Programming and Scripting

Display text between two words/characters

Using sed or awk, I need to display text between two words/characters. Below are two example inputs and the desired output. In a nutshell, I need the date-range value between the quotes (but only the first occurance of date-range as there can be more than one). Example One Input: xml-report... (1 Reply)
Discussion started by: cmichaelson
1 Replies

3. Shell Programming and Scripting

Script for pulling words of 4 to 7 characters from a file

Even just advice on where to start would be helpful. Thank You (2 Replies)
Discussion started by: Azeus
2 Replies

4. Shell Programming and Scripting

deleting symbols and characters between two words

Hi Please tell me how could i delete symbols, whitespaces, characters, words everything between two words in a line. Let my file is aaa BB ccc ddd eee FF kkk xxx 123456 BB 44^& iop FF 999 xxx uuu rrr BB hhh nnn FF 000 I want to delete everything comes in between BB and FF( deletion... (3 Replies)
Discussion started by: rish_max
3 Replies

5. Shell Programming and Scripting

Urgent help needed on merging lines with similar words

Hi everyone, I need help with a merging problem. Basically, I have a file with several lines (in this example 9 lines) such as: Amie, Jay, Sasha, Rob, Kay Mia, Frank Jay, Nancy, Cecil Paul, Ked, Nancy, 17, Fred 14, 16, 18, 20 9, 11 12, Frank 18, Peter, 62 Nancy, 27 A delimiter is... (3 Replies)
Discussion started by: awb221
3 Replies

6. Shell Programming and Scripting

awk help needed in trying to count lines,words and characters

Hello, i am trying to write a script file in awk which yields me the number of lines,characters and words, i checked it many many times but i am not able to find any mistake in it. Please tell me where i went wrong. BEGIN{ print "Filename Lines Words Chars\n" } { filename=filename + 1... (2 Replies)
Discussion started by: salman4u
2 Replies

7. Shell Programming and Scripting

Need Header for all splitted files - awk

Input file: i have a file and need to split into multiple files based on first column. i need the header for all the splitted files. I'm unable to get the header. $ cat log.txt id,mailtype,value 1252468812,yahoo,3.5 1252468812,hotmail,2.4 1252468819,yahoo,1.2 1252468812,msn,8.9... (6 Replies)
Discussion started by: mannefromdetroi
6 Replies

8. Shell Programming and Scripting

Replace words with the first characters

Hello folks, I have a simple request but I can't find a simple solution. Hare is my problem. I have some dates, I need to replace months with only the first 3 characters (jan for january, feb for february, ... all in lower case) ~$ echo '3 october 2010' | sed 3 oct 2010I thought of something... (8 Replies)
Discussion started by: tukuyomi
8 Replies

9. Shell Programming and Scripting

Get characters between two words

Guys, Here is the txt file... SLIC N0SLU704034789 rŒ° EJ00 ó<NL DMRG>11 100 4B 2 SLIC N0SLU704034789 rŒ° TJ10 <4000><NL> 2 SLIC N0SLU704034789 ... (2 Replies)
Discussion started by: gowrishankar05
2 Replies

10. Shell Programming and Scripting

Need to extract characters between two search words in a script!!

Hi, I have a log file which is the output from a xml script : <?xml version="1.0" ?> <!DOCTYPE svc_result SYSTEM "MLP_SVC_RESULT_320.DTD"> <svc_result ver="3.2.0"> <slia ver="3.0.0"> <pos> <msid type="MSISDN" enc="ASC">8093078040</msid> <poserr> ... (4 Replies)
Discussion started by: arjunstarz
4 Replies
STRSPLIT(3pub)						       C Programmer's Manual						    STRSPLIT(3pub)

NAME
strsplit - split string into words SYNOPSIS
#include <publib.h> int strsplit(char *src, char **words, int maxw, const char *sep); DESCRIPTION
strsplit splits the src string into words separated by one or more of the characters in sep (or by whitespace characters, as specified by isspace(3), if sep is the empty string). Pointers to the words are stored in successive elements in the array pointed to by words. No more than maxw pointers are stored. The input string is modifed by replacing the separator character following a word with ''. However, if there are more than maxw words, only maxw-1 words will be returned, and the maxwth pointer in the array will point to the rest of the string. If maxw is 0, no modification is done. This can be used for counting how many words there are, e.g., so that space for the word pointer table can be allocated dynamically. strsplit splits the src string into words separated by one or more of the characters in sep (or by whitespace characters, as defined by isspace(3), if sep is the empty string). The src string is modified by replacing the separator character after each word with ''. A pointer to each word is stored into successive elements of the array words. If there are more than maxw words, a '' is stored after the first maxw-1 words only, and the words[maxw-1] will contain a pointer to the rest of the string after the word in words[maxw-2]. RETURN VALUE
strsplit returns the total number of words in the input string. EXAMPLE
Assuming that words are separated by white space, to count the number of words on a line, one might say the following. n = strsplit(line, NULL, 0, ""); To print out the fields of a colon-separated list (such as PATH, or a line from /etc/passwd or /etc/group), one might do the following. char *fields[15]; int i, n; n = strsplit(list, fields, 15, ":"); if (n > 15) n = 15; for (i = 0; i < n; ++i) printf("field %d: %s ", i, fields[i]); In real life, one would of course prefer to not restrict the number of fields, so one might either allocated the pointer table dynamically (first counting the number of words using something like the first example), or realize that since it is the original string that is being modified, one can do the following: char *fields[15]; int i, n; do { n = strsplit(list, fields, 15, ":"); if (n > 15) n = 15; for (i = 0; i < n; ++i) printf("field %d: %s ", i, fields[i]); list = field[n-1] + strlen(field[n-1]); } while (n == 15); SEE ALSO
publib(3), strtok(3) AUTHOR
The idea for this function came from C-News source code by Henry Spencer and Geoff Collyer. Their function is very similar, but this implementation is by Lars Wirzenius (lars.wirzenius@helsinki.fi) Publib C Programmer's Manual STRSPLIT(3pub)
All times are GMT -4. The time now is 04:03 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy