Post 302678421 by gimley on Friday 27th of July 2012 11:27:30 PM
Regex to identify a full-stop as a sentence delimiter

Hello,
Splitting text into sentences at a full-stop, question mark or exclamation mark is a common device. The question mark and exclamation mark do not pose much of a problem, but the full-stop as a sentence delimiter raises certain issues because of its varied use:
Quote:
The temperature was 32.8 degrees Celsius. (Temperature)
His B.Sc. degree was deemed insufficient. (Acronym)
He owed the bank USD 4000.50 which he had not paid back. (Currency)
On 27.07.2004 a major earthquake occurred. (Date)
It was 17.05 by the clock. (Time)
just to name a few.

Standard parsers such as the Stanford parser do not handle this correctly and treat the full-stop as a delimiter wherever it occurs.
A Perl script would do the job, but since I am working on dynamic data where on-the-fly detection is needed, I am looking for a regex which can correctly ignore the above cases and identify only valid sentence-ending full-stops.
Using proximity, i.e. ignoring a full-stop if only a couple of words separate it from the next full-stop, is a possibility, but it does not work in all cases.
Does anyone know of a solution to this thorny issue? Many thanks in advance for your help.
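
As a possible starting point, here is a rough sketch in Python rather than Perl. It treats a full-stop as a delimiter only when it is followed by whitespace and then an uppercase letter, which already skips decimals, currency amounts, dates and times because those have no whitespace after the dot. A small abbreviation list catches cases such as "Dr. Smith". The names split_sentences, BOUNDARY and ABBREVIATIONS are made up for this example, the abbreviation list is deliberately tiny, and the uppercase test assumes ASCII text, so this will certainly miss cases on real data.

```python
import re

# Candidate boundary: '.', '!' or '?' followed by whitespace and then an
# ASCII uppercase letter, an opening double quote or an opening bracket.
BOUNDARY = re.compile(r'(?<=[.!?])\s+(?=[A-Z"(])')

# Hypothetical, deliberately small abbreviation list; extend it for real data.
ABBREVIATIONS = {"Dr.", "Mr.", "Mrs.", "Prof.", "B.Sc.", "M.Sc.", "Ph.D."}


def split_sentences(text):
    """Split text into sentences, skipping full-stops that are not delimiters."""
    text = text.strip()
    if not text:
        return []
    sentences = []
    buffer = ""
    for chunk in BOUNDARY.split(text):
        # Re-join with a single space (original whitespace is not preserved).
        buffer = f"{buffer} {chunk}" if buffer else chunk
        last_token = buffer.rsplit(None, 1)[-1]
        if last_token in ABBREVIATIONS:
            # The full-stop belonged to an abbreviation; keep accumulating.
            continue
        sentences.append(buffer)
        buffer = ""
    if buffer:
        sentences.append(buffer)
    return sentences


if __name__ == "__main__":
    sample = (
        "The temperature was 32.8 degrees Celsius. "
        "His B.Sc. degree was deemed insufficient. "
        "He owed the bank USD 4000.50 which he had not paid back. "
        "On 27.07.2004 a major earthquake occurred. "
        "It was 17.05 by the clock."
    )
    for sentence in split_sentences(sample):
        print(sentence)
```

Known weak spots of this approach: a sentence that genuinely ends in a listed abbreviation gets merged with the next one, and a sentence that starts with a lowercase letter or a digit is not detected at all.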
 

nmea(n)                    NMEA protocol implementation                    nmea(n)

NAME
       nmea - Process NMEA data

SYNOPSIS
       package require Tcl 8.2
       package require nmea ?0.1.1?

       ::nmea::open_port port ?speed?
       ::nmea::open_file file rate
       ::nmea::input sentence
       ::nmea::configure_port settings
       ::nmea::close_port
       ::nmea::close_file
       ::nmea::do_line
       ::nmea::log file
       ::nmea::checksum data
       ::nmea::write sentence data

DESCRIPTION
       This package provides a standard interface for writing software which
       receives NMEA standard input data. It allows for reading data from COM
       ports, files, or programmatic input. It also supports the checksumming
       and logging of incoming data. After parsing, input is dispatched to
       user defined handler commands for processing. To define a handler,
       create a proc with the NMEA sentence name in the ::nmea namespace. For
       example, to process GPS fix data use "proc ::nmea::GPGSA". The proc
       must take one argument, which is a list of the data values.

COMMANDS
       ::nmea::open_port port ?speed?
              Open the specified COM port and read NMEA sentences when
              available. Port speed is set to 4800bps by default or to speed.

       ::nmea::open_file file rate
              Open file file and read NMEA sentences, one per line, at the
              rate specified by rate in milliseconds. The file format may omit
              the leading $ and/or the checksum. If rate is <= 0 then lines
              will only be processed when a call to do_line is made. The rate
              may be adjusted by setting ::nmea::nmea(rate).

       ::nmea::input sentence
              Processes and dispatches the supplied sentence. If sentence
              contains no commas it is treated as a Tcl list, otherwise it
              must be standard comma delimited NMEA data, with an optional
              checksum and leading $.

       ::nmea::configure_port settings
              Changes the current port settings. settings has the same format
              as fconfigure -mode.

       ::nmea::close_port
              Close the open port.

       ::nmea::close_file
              Close the open file.

       ::nmea::do_line
              If there is a currently open file, this command will read and
              process a single line from it. Returns the number of lines read.

       ::nmea::log file
              Starts or stops file logging. If a file name is specified then
              all NMEA output will be logged to the file in append mode. If
              file is an empty string then any logging will be stopped.

       ::nmea::checksum data
              Returns the checksum of the supplied data.

       ::nmea::write sentence data
              If there is a currently open port, this command will write the
              specified sentence and data in proper NMEA checksummed format.

VARIABLES
       ::nmea::checksum
              A boolean value which determines whether incoming sentences are
              validated or not.

       ::nmea::rate
              When reading from a file this sets the rate that lines are
              processed in milliseconds.

BUGS, IDEAS, FEEDBACK
       This document, and the package it describes, will undoubtedly contain
       bugs and other problems. Please report such in the category nmea of
       the Tcllib SF Trackers [http://sourceforge.net/tracker/?group_id=12883].
       Please also report any ideas for enhancements you may have for either
       package and/or documentation.

KEYWORDS
       gps, nmea

COPYRIGHT
       Copyright (c) 2006-2007, Aaron Faupell <afaupell@users.sourceforge.net>

nmea                                  0.1                                  nmea(n)
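
The page does not spell out how the checksum used by ::nmea::checksum and ::nmea::write is computed. In NMEA 0183 it is the XOR of every character between the leading $ and the *, written as two hexadecimal digits. The following is a minimal Python sketch of that rule, not the Tcl package's own code; the function names nmea_checksum, frame_sentence and is_valid are hypothetical and chosen only for this illustration.

```python
def nmea_checksum(body: str) -> str:
    """XOR all characters of the sentence body; return two uppercase hex digits.

    `body` is the text between the leading '$' and the '*', both excluded.
    """
    value = 0
    for byte in body.encode("ascii"):
        value ^= byte
    return f"{value:02X}"


def frame_sentence(body: str) -> str:
    """Wrap a sentence body as '$<body>*<checksum>'."""
    return f"${body}*{nmea_checksum(body)}"


def is_valid(sentence: str) -> bool:
    """Check the checksum of a sentence; the leading '$' is optional.

    A sentence without a '*<checksum>' suffix is reported as invalid here,
    which is a choice of this sketch rather than something the page specifies.
    """
    sentence = sentence.strip()
    if sentence.startswith("$"):
        sentence = sentence[1:]
    body, separator, given = sentence.partition("*")
    if not separator:
        return False
    return nmea_checksum(body) == given.strip().upper()


if __name__ == "__main__":
    framed = frame_sentence("GPGLL,4916.45,N,12311.12,W,225444,A")
    print(framed, is_valid(framed))
```

Because the check is a plain XOR, it catches single-character corruption but not, for example, two characters swapping places.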