Regex to identify a full-stop as a sentence delimiter


 
# 8  
Old 07-29-2012
Sorry, my net was down and I could not acknowledge your answer.
Many thanks for the script. The only hassle is that it needs to be a plain regex, since I need to process the data dynamically, on the fly, and not offline using sed.
Any suggestions?
I did tweak your regex to suit my needs but drew a blank.
# 9  
Old 07-30-2012
Hi,

A regex will match something, but then what?
I don't understand what you mean by a regex that processes data dynamically, on the fly.

Can you give me an example, please?

Even if it is "dynamic, on the fly", the goal is still to replace the right full stop with a full stop followed by a newline.

I don't see how you can do that with a regex alone. Are you using Perl?

In Perl:

Code:
perl -pe 's/\. ([A-Z])/.\n$1/g'

Code:
$ perl -pe 's/\. ([A-Z])/.\n$1/g' input-file
The temperature was 32.8 degrees Celsius.
His B.Sc. degree was deemed insufficient.
He owed the bank USD 4000.50 which he had not paid back.
On 27.07.2004 a major earthquake occurred.
It was 17.05 by the clock.

# 10  
Old 07-30-2012
Hi,
Many thanks for the regex. I will try it out and get back to you. By "on the fly", I meant that the regex is inserted into a Java string, which in turn queries a website and returns full sentences for searching and indexing.
This is why a Perl script would not help, since it would mean calling the script externally. I will try to see whether the script can be called from Java, but the open source software we are using demands a regex, hence the request.
Many thanks
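
For the Java case described above, the Perl substitution can probably be recast as a pure delimiter pattern by using lookarounds. This is only a minimal sketch under stated assumptions: the class name SentenceSplitter and the method split are made up for illustration, and it assumes the host software accepts standard java.util.regex syntax, which the thread does not confirm. The pattern matches the single space that follows a full stop and precedes a capital letter, so neither the full stop nor the capital is consumed.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any software named in the thread.
class SentenceSplitter {

    // Same rule as the Perl one-liner s/\. ([A-Z])/.\n$1/g, written as a
    // delimiter: a space preceded by "." and followed by an ASCII capital.
    static final Pattern SENTENCE_BREAK = Pattern.compile("(?<=\\.) (?=[A-Z])");

    static String[] split(String text) {
        return SENTENCE_BREAK.split(text);
    }

    public static void main(String[] args) {
        String text = "The temperature was 32.8 degrees Celsius. "
                + "His B.Sc. degree was deemed insufficient. "
                + "He owed the bank USD 4000.50 which he had not paid back. "
                + "On 27.07.2004 a major earthquake occurred. "
                + "It was 17.05 by the clock.";
        // Print one sentence per line.
        for (String sentence : split(text)) {
            System.out.println(sentence);
        }
    }
}
```

Like the Perl version, this leaves decimals, dates and "B.Sc. degree" alone, but it would still break after an abbreviation that is followed by a capitalised word (for example "Mr. Smith"), and it only recognises the ASCII capitals A to Z.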