07-28-2012
Regex to identify a full-stop as a sentence delimiter
Hello,
Splitting text into sentences at the full stop, question mark or exclamation mark is a common device. The question mark and exclamation mark do not pose much of a problem, but the full stop raises certain issues as a sentence delimiter because of its varied uses:
Quote:
The temperature was 32.8 degrees Celsius. (Temperature)
His B.Sc. degree was deemed insufficient. (Abbreviation)
He owed the bank USD 4000.50 which he had not paid back. (Currency)
On 27.07.2004 a major earthquake occurred. (Date)
It was 17.05 by the clock. (Time)
just to name a few.
Standard parsers such as the Stanford parser do not handle this correctly and treat the full stop as a delimiter wherever it occurs.
A Perl script would do the job, but since I am working on dynamic data where on-the-fly detection is needed, I am looking for a regex that correctly ignores the above cases and identifies only valid sentence-ending full stops.
One possibility is a proximity heuristic, i.e. ignore a full stop if only a couple of words separate it from the next full stop, but this does not work in all cases.
Does anyone know of a solution to this thorny issue? Many thanks in advance for your help.
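To make the requirement concrete, here is a minimal sketch of one possible approach, written in Python rather than Perl. It is not a complete solution. It assumes that a real sentence boundary is a full stop, question mark or exclamation mark followed by whitespace and then an ASCII uppercase letter, a digit or an opening quote or bracket, and it uses a small, hypothetical abbreviation list (`ABBREVIATIONS`) to veto candidates such as "Dr. Smith". The names `BOUNDARY` and `split_sentences` are made up for illustration.

```python
import re

# Hypothetical, deliberately tiny abbreviation list; a real one would be far longer.
ABBREVIATIONS = {"b.sc.", "m.sc.", "ph.d.", "dr.", "mr.", "mrs.", "e.g.", "i.e.", "etc."}

# Candidate boundary: one or more of . ? ! followed by whitespace and then an
# ASCII uppercase letter, a digit, or an opening quote/bracket (an assumption).
# A full stop directly followed by a digit or letter (32.8, 4000.50,
# 27.07.2004, 17.05, B.Sc) has no whitespace after it and never matches.
BOUNDARY = re.compile(r'[.?!]+(?=\s+[A-Z0-9"\'(])')


def split_sentences(text):
    """Split text into sentences at the candidate boundaries defined above."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        end = match.end()
        # Look at the last whitespace-delimited token before the boundary.
        tokens = text[start:end].split()
        last_token = tokens[-1].lower() if tokens else ""
        if last_token in ABBREVIATIONS:
            continue  # treat as an abbreviation, not a sentence end
        sentences.append(text[start:end].strip())
        start = end
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)  # whatever remains after the last boundary
    return sentences


if __name__ == "__main__":
    sample = (
        "The temperature was 32.8 degrees Celsius. "
        "His B.Sc. degree was deemed insufficient. "
        "He owed the bank USD 4000.50 which he had not paid back. "
        "On 27.07.2004 a major earthquake occurred. "
        "It was 17.05 by the clock."
    )
    for sentence in split_sentences(sample):
        print(sentence)
```

Run on the five example sentences above joined into one paragraph, this keeps each sentence intact. It still fails where an unlisted abbreviation is followed by a capitalised word, and the `[A-Z]` test would need a Unicode-aware replacement for non-Latin scripts.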