How to identify sentences from a text?


 
# 1  
Old 07-17-2008

Hi,

I have to identify the sentences in the text below.

If I split the text this way:

Code:
@sentence = split(/\.\W*/, $text);

I also get the following fragments in the output, along with the proper sentences:

Biol Reprod.

2002 Mar;66(3):785-95.

Egydio de Carvalho C, Tanaka H, Iguchi N, Ventela S, Nojima H, Nishimune Y.

Department of Science for Laboratory Animal Experimentation, Research Institute for Microbial Diseases, Osaka University, Suita City, Osaka 565-0871, Japan.

Research Support, Non-U.S.

I should get only the proper sentences.

How can I identify proper sentences in Perl?

I don't want to use any modules. Can this be done without them?

Here is the text:

1: Biol Reprod. 2002 Mar;66(3):785-95.

Molecular cloning and characterization of a complementary DNA encoding sperm tail
protein SHIPPO 1.

Egydio de Carvalho C, Tanaka H, Iguchi N, Ventela S, Nojima H, Nishimune Y.(Author's names)

Department of Science for Laboratory Animal Experimentation, Research Institute for Microbial Diseases, Osaka University, Suita City, Osaka 565-0871, Japan.

Formation of the tail in developing sperm is a complex process involving the organization of the axoneme, transport of periaxonemal proteins from the
cytoplasm to the tail, and assembly of the outer dense fibers and fibrous sheath.Although detailed morphological descriptions of these events are available, the molecular mechanisms remain to be fully elucidated. We have isolated a new gene, named shippo 1, from a haploid germ cell-specific cDNA library of mouse testis,and also its human orthologue (h-shippo 1). The isolated cDNA is 1.2 kilobases long, carrying a 762-base pair open reading frame that encodes SHIPPO 1, a sperm protein predicted to consist of 254 amino acids. The amino acid sequence includes 6 Pro-Gly-Pro repeats, which are also present in the human orthologue protein (hSHIPPO 1) as well as in 2 other newly reported proteins of Drosophila melanogaster. Transcription of shippo 1 is exclusively observed in haploid germ cells. Antibody raised against SHIPPO 1 identified a testis-specific M(r) 32 x 10(-3) band in Western blot analysis. The protein was further localized in the flagella of the elongated spermatids and along the entire length of the tail in mature sperm. SHIPPO 1 in sperm is resistant to treatment with nonionic detergents and coextracted with the cytoskeletal core proteins of the mouse sperm tail.

Publication Types:
Research Support

ID:1187

Please tell me how to identify the sentences.

With regards,
Vanitha
# 2  
Old 07-17-2008
If you have to do a lot of these, you are in trouble IMO.

Distinguishing sentences from scientific citations requires some sort of AI. You would have to identify a block of text ending in a period that has a subject and a predicate. Either that, or create some sort of monstrous filter that traps every single journal and author name.
It would be easier to simply edit the file by hand.
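
If hand editing is not practical, a rough filter can still get part of the way without any modules. The following is only a sketch under stated assumptions, not a real sentence detector: the helper names looks_like_sentence and extract_sentences are made up for this example, and the thresholds (at least 5 words, at least half of them starting with a lowercase letter) are guesses. The idea is that author lists, addresses and journal references consist mostly of capitalized tokens, while running prose does not. The split breaks only after a period preceded by a lowercase letter, a digit or ")", so "U.S." and author initials are not treated as sentence ends.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: crude test for "looks like running prose".
# The thresholds below are guesses, not tuned values.
sub looks_like_sentence {
    my ($s) = @_;
    return 0 unless $s =~ /^[A-Z]/;     # must start with a capital letter
    return 0 unless $s =~ /[.!?]$/;     # must end with terminal punctuation
    my @words = split ' ', $s;
    my $total = @words;
    return 0 if $total < 5;             # too short to be running prose
    my $lower = grep { /^[a-z]/ } @words;
    # Citations, author lists and addresses are mostly capitalized tokens.
    return ($lower / $total >= 0.5) ? 1 : 0;
}

# Hypothetical helper: split text into candidates and keep the prose-like ones.
sub extract_sentences {
    my ($text) = @_;
    my @found;
    # Blank lines separate blocks, so a sentence never spans two blocks.
    for my $block (split /\n\s*\n/, $text) {
        $block =~ s/\s+/ /g;            # rejoin wrapped lines
        $block =~ s/^ | $//g;           # trim leading/trailing space
        # Break after . ! ? only if the character before it is a lowercase
        # letter, digit or ")" and a capital letter follows. The \s* also
        # catches a missing space, as in "sheath.Although".
        my @parts = split /(?<=[a-z0-9)][.!?])\s*(?=[A-Z])/, $block;
        push @found, grep { looks_like_sentence($_) } @parts;
    }
    return @found;
}

my $text = <<'END';
1: Biol Reprod. 2002 Mar;66(3):785-95.

Egydio de Carvalho C, Tanaka H, Iguchi N, Ventela S, Nojima H, Nishimune Y.

Transcription of shippo 1 is exclusively observed in haploid germ cells. The
protein was further localized in the flagella of the elongated spermatids.
END

print "$_\n" for extract_sentences($text);
```

On this sample, only the two abstract sentences are printed. The filter has no notion of subject and predicate, so a verbless title written mostly in lowercase words (such as the article title in the posted text) still passes, and a real sentence full of capitalized names could be dropped.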
# 3  
Old 07-18-2008
Quote:
Originally Posted by jim mcnamara
If you have to do a lot of these, you are in trouble IMO.

Distinguishing sentences from scientific citations requires some sort of AI. You would have to identify a block of text ending in a period that has a subject and a predicate. Either that, or create some sort of monstrous filter that traps every single journal and author name.
It would be easier to simply edit the file by hand.
Hi,

Thanks for the reply.
So there is no other way?