I have a file that contains the following sentences.
Code:
Here we show that a virus-encoded transcription factor, viral mRNA, cellular RNA-binding protein heterodimer G3BP/Caprin-1 (p137), translation initiation factors eIF4E and eIF4G, and ribosomal proteins are concentrated in the same subdomains of cytoplasmic DNA factories
The single-stranded DNA- and RNA-binding protein, Puralpha, has been implicated in many biological processes, including control of transcription of multiple genes, initiation of DNA replication, and RNA transport and translation
RNA-binding proteins are involved in processes such as protection of RNAs from RNase degradation, prevention of ribosome binding to mRNA, control of formation of secondary structures of the mRNA that permit or prevent translation initiation, and termination/antitermination of transcription in response to external signals
d)The La autoantigen is an RNA-binding protein that is involved in initiation and termination of RNA polymerase III transcription
Here is the code to match the sentence.
Code:
# Suppose this is my input string
$str="Here we show that a virus-encoded transcription factor, viral mRNA, cellular RNA-binding protein heterodimer G3BP/Caprin-1 (p137), translation initiation factors eIF4E and eIF4G, and ribosomal proteins are concentrated in the same subdomains of cytoplasmic DNA factories";
open(FH, "<", "sample.txt") or die "Cannot open sample.txt: $!";
while(<FH>)
{
if($_=~/$str/)
{
print "matched\n";
}
else
{
print "not match\n";
}
}
close FH;
If the string matches, it should print "matched", but even though the sentence is in the file, it still does not match.
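One likely cause, assuming the sentence in sample.txt is otherwise identical to $str: the string is interpolated into the pattern as a regular expression, so "(p137)" is read as a capture group that matches "p137" without the parentheses. The literal parentheses in the file then never match. Here is a minimal sketch of two ways around this: quote the string with \Q...\E, or use index for a plain substring test. The helper names line_matches_quoted and line_matches_index are made up for illustration, and the sample line is a shortened stand-in for a line of the file.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of the original post.

# Treat $str as literal text: \Q...\E escapes regex metacharacters
# such as ( and ) before the pattern is compiled.
sub line_matches_quoted {
    my ($line, $str) = @_;
    return $line =~ /\Q$str\E/ ? 1 : 0;
}

# Plain substring search, no regex involved.
sub line_matches_index {
    my ($line, $str) = @_;
    return index($line, $str) >= 0 ? 1 : 0;
}

# Shortened stand-ins for the search string and one line of sample.txt.
my $str  = "cellular RNA-binding protein heterodimer G3BP/Caprin-1 (p137), translation";
my $line = "viral mRNA, $str initiation factors eIF4E and eIF4G\n";

print "unquoted: ", ($line =~ /$str/ ? "matched" : "not match"), "\n";
print "quoted:   ", (line_matches_quoted($line, $str) ? "matched" : "not match"), "\n";
print "index:    ", (line_matches_index($line, $str)  ? "matched" : "not match"), "\n";
```

With this sample the unquoted pattern reports "not match", while the quoted pattern and index both report "matched". If the sentence is wrapped across several lines in the file, a line-by-line loop would still miss it, so that is worth checking as well.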
Hi,
I have to identify sentences from this text.
If I split the text this way:
@sentence= split(/\.\W*/,$text);
I also get the following fragments in the output, along with the proper sentences.
Biol Reprod.
2002 Mar;66(3):785-95.
Egydio de Carvalho C, Tanaka H,... (2 Replies)
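To illustrate the split quoted above, here is a small sketch with made-up sample text built from the fragments shown. Splitting on every period turns each part of the citation into its own "sentence". The second split is only a heuristic assumed here, not something from the thread: it breaks only where a period is followed by whitespace and an uppercase letter.

```perl
use strict;
use warnings;

# Hypothetical sample text, assembled from the fragments quoted above.
my $text = "Biol Reprod. 2002 Mar;66(3):785-95. This is a proper sentence.";

# Original approach: every period (plus any following non-word
# characters) is treated as a sentence boundary.
my @sentence = split(/\.\W*/, $text);
print "[$_]\n" for @sentence;

# Assumed heuristic: split only on whitespace that follows a period
# and precedes an uppercase letter, so the citation stays in one piece.
my @stricter = split(/(?<=\.)\s+(?=[A-Z])/, $text);
print "{$_}\n" for @stricter;
```

The first split returns three pieces for this sample, the second returns two. The heuristic will still break on abbreviations followed by a capitalized word, so it is a starting point rather than a complete sentence splitter.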
Hi,
I have sentences like this:
$sent=
Protein modeling studies reveal that the RG-rich region is part of a three to four strand antiparallel beta-sheet, which in other RNA binding protein functions as a platform for nucleic acid interactions.
Heterogeneous nuclear ribonucleoparticle... (19 Replies)
I have two books, one in Swedish, the other in English.
Two text files. I want to combine them into one, with each sentence having its translation next to it.
------------
Text file one.
Example sentence in English. Example 2 sentence 2 in English 2.
--------------
Text file two.
... (2 Replies)
Hi,
I have a few sentences here.
$a1="Division of Hematology-Oncology, and Stem cell transplantation, Schneider Childrens Hospital, Albert Einstein College of Medicine, New Hyde Park, New York. ";
$a2="Department of Cell Biology and Anatomy, College of Medicine, National Cheng Kung... (3 Replies)
Hi,
I want to compile a program in C. It has multiple calls to Teradata. I have no idea how to compile it on AIX.
The compiler that I should use is cc.
I tried
cc -G -KPIC tdsfbd0358.c this generates a tdsfbd0358.i and after that I have no idea what I have to do. Link it? How?... (3 Replies)
I am trying to print out sentences that match a regular expression in awk (I'm open to using other tools, too).
I got the regular expression I want to use, "(\+ \{4\})" from user ripat in a grep forum. Unfortunately with grep I couldn't print only the sentence.
While searching for awk... (8 Replies)
Hello all
I am writing a Makefile, but I can't get the value of $var to use it in conditional statements:
#!/bin/sh
GO=$(shell) go
GOPATH=$(GO) env GOPATH
make:
@$(GOPATH)
@if ; then mkdir -p "$(GOPATH)/bin" ; fi
When I type "make", @$GOPATH returns /home/icvallejo/go... (5 Replies)
Discussion started by: icvallejo
5 Replies
voikkogc
VOIKKOGC(1) General Commands Manual VOIKKOGC(1)
NAME
voikkogc - test program for Voikko grammar checker
SYNOPSIS
voikkogc [options]
DESCRIPTION
voikkogc is a test program for grammar checking functionality in libvoikko, a library of Finnish language tools. It reads sentences or
paragraphs from stdin (one per line) and prints the results to stdout. The results are structures containing information about grammar errors
found in the input paragraph.
OPTIONS
--tokenize
Instead of looking for grammar errors, split input into tokens. The tokens are prefixed by type: "W" is a word, "P" is punctuation,
"S" is whitespace, "U" is unknown and "E" is a prefix for error messages.
--split-sentences
Instead of looking for grammar errors, split input into sentences. The sentences are prefixed by type: "B" means that the end of the
sentence is probably correct, "P" means that the end of the sentence is possibly correct (but this and the next identified
sentence should probably be joined) and "E" means that the sentence ends at the end of input.
-n Prefix all grammar checker messages with line number of input data.
accept_titles=n
accept_unfinished_paragraphs=n
accept_bulleted_lists=n
Set the value of the specified boolean option.
explanation_language=langcode
Print human readable error explanation in the specified language.
BUGS
Human readable error explanations are printed in UTF-8 regardless of current locale settings.
SEE ALSO
voikkospell for common options of different Voikko test tools.
AUTHOR
voikkogc and this manual page were written by Harri Pitkanen (hatapitk@iki.fi).
2010-05-06 VOIKKOGC(1)