How to get exact match sentences?


 
# 1  
Old 08-16-2008

Hi,

I have sentences like this:

Code:
$sent=

Protein modeling studies reveal that the RG-rich region is part of a three to four strand antiparallel beta-sheet, which in other RNA binding protein functions as a platform for nucleic acid interactions.

Heterogeneous nuclear ribonucleoparticle (hnRNP) proteins form a family of RNA binding proteins (RBPs) that coat nascent pre-mRNAs.

Finally, we have found that Pumilio2, a member of the PUF family of RNA-binding proteins, is highly concentrated at the vertebrate neuromuscular junction.

PUF proteins comprise a highly conserved family of sequence-specific RNA-binding protein that regulate target mRNAs.

The user enters a query term such as "RNA binding protein", which I read like this:

Code:
$word=param('query');
print "\n$word\n";

What I want is for the search to also pick up the sentences that contain "RNA-binding protein"!

How do I write a regular expression so that $word picks up the sentences containing either "RNA-binding protein" or "RNA binding protein"?

With regards
Archana
# 2  
Old 08-16-2008
The regular expression you want could look like this:
Code:
/RNA(?:-| )binding protein/

I don't know exactly which language you want this in, but you could also do something like:
Code:
/RNA[-\s]binding protein/


Last edited by redoubtable; 08-16-2008 at 06:55 AM.. Reason: adding second regex
# 3  
Old 08-16-2008
In the advanced course we will bump into sentences like "RNA-bound protein". In the Nobel Laureate course we will handle text in German and Chinese as well.

Seriously, you could try to generalize your search patterns somewhat (specify all possible verb tenses, etc.), but the general problem of language parsing has not been solved completely yet.

Search engines map down each word token to a normalized form so you can find "found" in Google when searching for "find". In some contexts, this is a misfeature -- when you know exactly what you want, you don't want the "sugary" matches at all.

In the meantime, maybe it'd be enough to replace all spaces with dots in your regular expressions ...
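A minimal sketch of that last suggestion, assuming Perl; the helper name `dotted_pattern` is hypothetical and not from the thread. Each run of spaces in the query becomes `.`, which matches any single character, so both the spaced and the hyphenated forms are found. Every word is passed through `quotemeta` so that user input is not interpreted as regex syntax.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a query into a regex in which the whitespace
# between words is replaced by '.', i.e. any single character.
sub dotted_pattern {
    my ($query) = @_;
    # split ' ' splits on runs of whitespace; quotemeta escapes any regex
    # metacharacters inside each word.
    my $pattern = join '.', map { quotemeta($_) } split ' ', $query;
    return qr/$pattern/;
}

my $re = dotted_pattern('RNA binding protein');
for my $text ('RNA binding protein', 'RNA-binding protein', 'RNA-bound protein') {
    printf "%-22s %s\n", $text, ($text =~ $re ? 'match' : 'no match');
}
```

The first two strings match and the third does not. The dot is looser than `[-\s]`: it would also accept something like "RNAxbinding protein", which is the kind of "sugary" match mentioned above.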
# 4  
Old 08-18-2008
Hi,

Thanks for the reply!!
I understand this expression, but I don't know how to apply it using $word.

With regards
Vanitha
# 5  
Old 08-18-2008
Assuming it is Perl we are talking about here:

Code:
if ($word =~ /RNA[-\s]binding protein/) {
   print "we have a match: $word";
}
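Note that this tests the query string itself. To select sentences instead, the same pattern can be applied to each sentence in turn. A sketch, assuming the text has already been split into one sentence per array element; the `@sentences` array is illustrative, with two sentences copied from the first post and one made-up non-matching sentence.

```perl
use strict;
use warnings;

# Illustrative data: two sentences from the first post plus one made-up
# sentence that should not match.
my @sentences = (
    'Heterogeneous nuclear ribonucleoparticle (hnRNP) proteins form a family of RNA binding proteins (RBPs) that coat nascent pre-mRNAs.',
    'PUF proteins comprise a highly conserved family of sequence-specific RNA-binding protein that regulate target mRNAs.',
    'This sentence does not mention the query term.',
);

# Keep only the sentences in which the pattern matches somewhere.
my @hits = grep { /RNA[-\s]binding protein/ } @sentences;

print "$_\n" for @hits;
```

Here `grep` returns the two sentences that contain either spelling and drops the third.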

# 6  
Old 08-18-2008
Hi,

That works for this one term, but the user can enter any term, and I don't see how to write an expression using $word that retrieves the matching sentences.

Another example: "Transcription-factor" and "Transcription factor". Many terms vary like this!

How can I match such terms in a generalized way?

With regards
Vanitha
# 7  
Old 08-18-2008
To retrieve the sentence that matches a certain pattern, you can do this:
Code:
if ($word =~ /(.*?RNA[-\s]binding protein.*?)$/) { 
      print "$1\n"; 
}

If you have multiple patterns, you can either put them all in a list and check them one by one, or create an expression that allows a space or '-' between words (but that could be faulty, and you would lose track of things).
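The second option, building one expression from whatever the user typed, might be sketched as follows. This assumes Perl; `query_to_regex` is a hypothetical name, the sample sentences are illustrative, and treating hyphens and whitespace as interchangeable separators is an assumption that may over-match for some terms.

```perl
use strict;
use warnings;

# Hypothetical helper: build a regex from a user query so that the words may
# be separated by hyphens or whitespace in the text, whichever the user typed.
# Assumes the query contains at least one word.
sub query_to_regex {
    my ($query) = @_;
    # Split the query on hyphens and whitespace, dropping empty pieces.
    my @words = grep { length } split /[-\s]+/, $query;
    # Escape each word, then allow '-' or whitespace between the words.
    my $pattern = join '[-\s]+', map { quotemeta($_) } @words;
    return qr/$pattern/i;    # /i ignores case (an assumption)
}

# Illustrative sentences, not the full text from the first post.
my @sentences = (
    'Pumilio2 is a member of the PUF family of RNA-binding proteins.',
    'A transcription factor binds DNA.',
);

for my $query ('RNA binding protein', 'Transcription-factor') {
    my $re = query_to_regex($query);
    print "$query:\n";
    print "  $_\n" for grep { $_ =~ $re } @sentences;
}
```

With `$word = param('query')` from the first post, the call would be `query_to_regex($word)`. As warned above, a generated pattern is harder to reason about than an explicit list, so it is worth testing against terms where a hyphen changes the meaning.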