SED (or other) upper to lowercase, with first letter of first word in each sentence uppercase


 
# 1  
Old 06-07-2012

The title pretty much defines the problem. I have text files that are all in caps. I would like to convert them to lowercase, but have the first letter of the first word in each sentence in uppercase.

I already have SED on the server for fixing / tweaking text files, but I'm open to other solutions.
# 2  
Old 06-07-2012
Please post what Operating System and version you are running and most importantly in this case what Shell you use.

Please define a "sentence" in terms that a computer program would understand.

Please post sample before and after data, complete with an explanation of the process.

Last edited by methyl; 06-07-2012 at 05:37 PM..
# 3  
Old 06-07-2012
I'm using CentOS 6.2 on a web server.
Shell: /bin/bash
SED (or an alternative) would be run on a schedule using cron.

I understand "sentence" isn't a programming term - I simply mean the normal grammar usage (a number of words that ends with a period). Here is an example from a text file:

Quote:
AN UPPER RIDGE EXTENDS OVER THE W CARIBBEAN WITH THE AXIS FROM OVER CENTRAL AMERICA THROUGH THE GULF OF HONDURAS ACROSS W CUBA INTO THE W ATLC. A SECOND UPPER RIDGE EXTENDS FROM THE W TROPICAL ATLC OVER THE LESSER ANTILLES INDUCING AN INVERTED UPPER TROUGH OVER THE CARIBBEAN EXTENDING FROM THE MONA PASSAGE TO OVER COLOMBIA.
The desired output would look like this:

Quote:
An upper ridge extends over the w caribbean with the axis from over central america through the gulf of honduras across w cuba into the w atlc. A second upper ridge extends from the w tropical atlc over the lesser antilles inducing an inverted upper trough over the caribbean extending from the mona passage to over colombia.

Last edited by dockline; 06-07-2012 at 06:19 PM..
# 4  
Old 06-07-2012
IMHO this process is not within the capabilities of Shell programming.
The sample end-product is far from perfect because nouns and certain abbreviations are not capitalised. The spelling mistakes in the input data (e.g. "ATLC" - presumably "Atlantic") do not help.
Perhaps re-type the data rather than try to fix it with an impossible program?
# 5  
Old 06-07-2012
This data is updated several times a day, every day, so it must be automated. I can program SED to replace often used words and phrases with the right capitalization. Words like ATLC and Atlantic and Gulf, for example, are taken care of by this.
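A minimal sketch of that kind of fix-up pass, assuming GNU sed (for the `\b` word-boundary extension) and input that has already been lowercased. The function name `fix_names` and the three word mappings are illustrative only, not the actual list in use:

```shell
# Hypothetical helper: restore capitalisation of known words after lowercasing.
# Assumes GNU sed; \b (word boundary) is a GNU extension.
fix_names() {
  sed -e 's/\batlc\b/Atlantic/g' \
      -e 's/\bgulf\b/Gulf/g' \
      -e 's/\bcaribbean\b/Caribbean/g'
}

# Example: reads stdin, writes the corrected text to stdout.
printf 'across w cuba into the w atlc.\n' | fix_names
```

Each extra word or phrase is one more `-e` expression, so the list can grow without touching the rest of the pipeline.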

The problem is the first character of the first word in a sentence. Could a script change the first character after a single period to uppercase? The single period is an important point - often "..." is used with descriptions in this text to bridge copy.
# 6  
Old 06-07-2012
Code:
sed 's/\./.\n/g' | tr '[:upper:]' '[:lower:]' | sed 's/^ *//' | perl -pe '$_ = ucfirst' | tr '\n' ' '

Code:
tr '[:upper:]' '[:lower:]' | sed -e 's/\./.\n/g' | sed -e 's/^ *//' -e 's/^[a-z]/\u&/' | tr '\n' ' '

AN UPPER RIDGE EXTENDS OVER THE W CARIBBEAN WITH THE AXIS FROM OVER CENTRAL AMERICA THROUGH THE GULF OF HONDURAS ACROSS W CUBA INTO THE W ATLC. A SECOND UPPER RIDGE EXTENDS FROM THE W TROPICAL ATLC OVER THE LESSER ANTILLES INDUCING AN INVERTED UPPER TROUGH OVER THE CARIBBEAN EXTENDING FROM THE MONA PASSAGE TO OVER COLOMBIA.
An upper ridge extends over the w caribbean with the axis from over central america through the gulf of honduras across w cuba into the w atlc. A second upper ridge extends from the w tropical atlc over the lesser antilles inducing an inverted upper trough over the caribbean extending from the mona passage to over colombia.

Code:
tr '[:upper:]' '[:lower:]' | awk 'BEGIN {RS="."; ORS=". "} NF { $1 = toupper(substr($1,1,1)) substr($1,2); print }'

AN UPPER RIDGE EXTENDS OVER THE W CARIBBEAN WITH THE AXIS FROM OVER CENTRAL AMERICA THROUGH THE GULF OF HONDURAS ACROSS W CUBA INTO THE W ATLC. A SECOND UPPER RIDGE EXTENDS FROM THE W TROPICAL ATLC OVER THE LESSER ANTILLES INDUCING AN INVERTED UPPER TROUGH OVER THE CARIBBEAN EXTENDING FROM THE MONA PASSAGE TO OVER COLOMBIA.

An upper ridge extends over the w caribbean with the axis from over central america through the gulf of honduras across w cuba into the w atlc. A second upper ridge extends from the w tropical atlc over the lesser antilles inducing an inverted upper trough over the caribbean extending from the mona passage to over colombia.

Last edited by new_item; 06-07-2012 at 07:55 PM..