Please post the relevant text on this site, not on external sites. The external site will eventually expire the post, and future searchers will then have no clue as to what this thread is really about.
In fact, the link went to la-la land (404) just now. Nobody can effectively help you at this point.
Thank you.
I'm going to start a new thread with sample text and a simpler request. Thank you for the guidance.
A perl approach
Hi.
This is a Perl approach to this problem. One of the modules on CPAN is Sentence. I won't post the Perl code, p1 (fewer than 40 lines), unless necessary. Here is a sample run on a small data file:
producing:
For the 60957 lines in the posted link, it found 31017 sentences in 260 seconds, so it's not the fastest code, but it seems to get the job done.
Obviously this is of little value if the OP wants awk, although the regular expression could probably be reused, along with the Perl module's algorithm of marking possible sentence boundaries and then checking for exceptions such as a list of known abbreviations.
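For anyone who wants to see the general idea without the module, here is a minimal sketch in Python (not the Perl module's actual code, and not the p1 script). It marks candidate boundaries with a regular expression and then rejects those that follow a known abbreviation. The regular expression, the tiny `ABBREVIATIONS` set, and the function name `split_sentences` are all my own assumptions for illustration; a real abbreviation list would be much longer.

```python
import re

# Hypothetical, deliberately tiny abbreviation list (assumption, not from the module).
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "prof", "fig", "etc", "vs", "e.g", "i.e"}

# Candidate boundary: '.', '!' or '?' followed by whitespace and then
# an uppercase letter, a digit, or an opening quote.
CANDIDATE = re.compile(r"([.!?])\s+(?=[A-Z0-9\"'])")


def split_sentences(text):
    """Split text into sentences, skipping boundaries after known abbreviations."""
    sentences = []
    start = 0
    for match in CANDIDATE.finditer(text):
        end = match.end(1)  # index just past the punctuation mark
        # Look at the last word before the punctuation mark.
        words = text[start:end - 1].split()
        last_word = words[-1].lower().lstrip("(\"'") if words else ""
        if match.group(1) == "." and last_word in ABBREVIATIONS:
            continue  # exception: the period belongs to an abbreviation
        sentences.append(text[start:end].strip())
        start = match.end()  # resume after the whitespace
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    sample = "Dr. Smith met Prof. Jones. They talked. Was it late? Yes."
    for sentence in split_sentences(sample):
        print(sentence)
```

The same two-step structure (mark candidates, then filter exceptions) should carry over to awk, since it only needs a regular expression match and a lookup in an array of abbreviations.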
Hello all
I am writing a Makefile, but I can't get the value of $var to use it in conditional statements:
#!/bin/sh
GO=$(shell) go
GOPATH=$(GO) env GOPATH
make:
@$(GOPATH)
@if ; then mkdir -p "$(GOPATH)/bin" ; fi
When I type "make", @$GOPATH returns /home/icvallejo/go... (5 Replies)
Hi,
I need an awk script to modify the following file. It has two tab-separated columns.
Hi PP
my VBD
name DT
is NN
. SENT
Her PP
name VBD
is DT
the NN
same WRT
. SENT
<s>
Hi PP - (6 Replies)
Hi,
I have a few sentences here.
$a1="Division of Hematology-Oncology, and Stem cell transplantation, Schneider Childrens Hospital, Albert Einstein College of Medicine, New Hyde Park, New York. ";
$a2="Department of Cell Biology and Anatomy, College of Medicine, National Cheng Kung... (3 Replies)
Hi,
I have a file and that file contains the following sentences.
Here we show that a virus-encoded transcription factor, viral mRNA, cellular RNA-binding protein heterodimer G3BP/Caprin-1 (p137), translation initiation factors eIF4E and eIF4G, and ribosomal proteins are concentrated in the... (4 Replies)
Hi,
I have sentences like this:
$sent=
Protein modeling studies reveal that the RG-rich region is part of a three to four strand antiparallel beta-sheet, which in other RNA binding protein functions as a platform for nucleic acid interactions.
Heterogeneous nuclear ribonucleoparticle... (19 Replies)
Hi,
I have to identify sentences from this text.
If I split these statements this way:
@sentence= split(/\.\W*/,$text);
I also get the following fragments in the output, along with the proper sentences.
Biol Reprod.
2002 Mar;66(3):785-95.
Egydio de Carvalho C, Tanaka H,... (2 Replies)
Hi guys, I have this problem: I need to find the sentences that contain a date and extract them. The input looks something like this:
eg:
$DATA42.GANTRY2.GA161147 DISKFILE 2007-10-16 11:56:45 SUPER.OPR \NETS.$Y4CB.#IN
... (4 Replies)