Merging words splitted into characters with awk
Post 302496246 by cgkmal on Sunday 13th of February 2011 03:25:31 AM
dokamo,

Working from your input example, the best solution I have so far is split into four sed parts to make it easier to follow. You can run the "echo" followed by one sed command at a time to see what each one does.

The problem is that when a split word is followed by another split word, the two words appear joined in the output.

If that is close to what you want, you only need to combine the four sed parts into a single sed command.

Code:
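# Step 1: mark the space after two consecutive lowercase letters (end of an intact word) with "|".
# Step 2: mark the space between a single letter and the start of an intact word with "|".
# Step 3: delete every remaining space, which joins the split characters.
# Step 4: turn the "|" markers back into spaces.
echo " This is a text w i t h some s p l i t e d W o r d s ." |
sed 's/\([a-z][a-z]?*\)\( \)/\1|/g' |
sed 's/\([a-z]\)\( \)\([a-z][a-z]\)/\1|\3/g' |
sed 's/ //g' |
sed 's/|/ /g'

Output:
This is a text with some splitedWords.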
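
For reference, here is a minimal sketch of the four parts joined into one sed call, using one -e per expression so they run in the same order on each line. The expressions are copied unchanged from the pipeline. The function name join_split_words is only an illustrative name I made up, not something from your script.

```shell
# join_split_words is a hypothetical helper name used only for this sketch.
# It applies the same four substitutions, in order, inside a single sed process.
join_split_words() {
    sed -e 's/\([a-z][a-z]?*\)\( \)/\1|/g' \
        -e 's/\([a-z]\)\( \)\([a-z][a-z]\)/\1|\3/g' \
        -e 's/ //g' \
        -e 's/|/ /g'
}

echo " This is a text w i t h some s p l i t e d W o r d s ." | join_split_words
# prints: This is a text with some splitedWords.
```

Two things to keep in mind: any "|" already present in the input will also be turned into a space by the last expression, and two split words in a row still come out joined (for example "t w o w o r d s" becomes "twowords").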

Hope it helps.

Regards