Using awk to find sentences.


 
# 1  
Old 11-19-2012

I am trying to print out the sentences that match a regular expression in awk (I’m open to using other tools, too).

I got the regular expression I want to use,
Code:
"([^)(]\+ [0-9]\{4\})"

from user ripat in a grep forum. Unfortunately with grep I couldn't print only the sentence.

While searching for awk solutions as suggested I came across this: filtering - How to print out a specific field in AWK? - Stack Overflow
which states that
Code:
awk '/word1/'

"will print out the whole sentence." I also looked elsewhere, including the man page, for a way to retrieve sentences directly, with no luck.

I tried
Code:
$ awk '/([^)(]\+ [0-9]\{4\})/' BioPsych10.txt

and
Code:
$ awk '/"([^)(]\+ [0-9]\{4\})"/' BioPsych10.txt

but they returned nothing.

The text I am searching through is at http://dl.dropbox.com/u/4235339/BioPsych10.txt

Thank you,
DanBroz

Last edited by danbroz; 11-19-2012 at 09:18 PM.. Reason: updated http://dl.dropbox.com/u/4235339/BioPsych10.txt
# 2  
Old 11-19-2012
I think you are missing the printing part in awk.

Why don't you try it like this?
Code:
 awk '/pattern/ {print $0}' file
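
For reference, awk prints the whole matching record when a pattern has no action, so the explicit print should not change the result. A minimal sketch, using a made-up two-line sample rather than the thread's file:

```shell
# Hypothetical sample text; not taken from BioPsych10.txt.
sample='alpha word1 beta
gamma delta'

# Pattern with no action: awk applies the default action, which is print $0.
implicit=$(printf '%s\n' "$sample" | awk '/word1/')

# Same pattern with the action spelled out.
explicit=$(printf '%s\n' "$sample" | awk '/word1/ {print $0}')

printf 'implicit: %s\nexplicit: %s\n' "$implicit" "$explicit"
```

If both forms print nothing on the real file, the regular expression itself is the more likely culprit.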

# 3  
Old 11-19-2012
I cannot download the file.
Do you want to find an exact match of the string "([^)(]\+ [0-9]\{4\})", or is this a search pattern?

Example string:
Code:
sfgs grg wrrefg wreg wre "([^)(]\+ [0-9]\{4\})"rwe trt wre 
wretrtwretwret  ret rt wretwret wret 
wt wret wretrt wret wret  rt wretw rett254t 5 tt

and you want line 1 to be the hit?
# 4  
Old 11-19-2012
It is a search pattern.
Try downloading the file here:
http://dl.dropbox.com/u/4235339/BioPsych10.txt
# 5  
Old 11-19-2012
The regex "([^)(]\+ [0-9]\{4\})" is not an extended regex but a GNU basic regex.
The equivalent POSIX extended regex is:
Code:
grep -Eo '\([^)]+ [0-9]{4}\)' infile

If your grep supports the -o option, it will print only the matching text, one occurrence per line. Without the -o option (just grep -E), it will print every whole line on which the pattern is found.
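
A small sketch of the difference, assuming a grep that supports -o (GNU and BSD grep do; it is not required by POSIX). The sample text and citation are invented for illustration:

```shell
# Hypothetical two-line sample; the thread's real input file is not reproduced here.
sample='Memory is reconstructive (Loftus 1979) and fallible.
This line has no citation.'

# Without -o: the whole matching line is printed.
whole=$(printf '%s\n' "$sample" | grep -E '\([^)]+ [0-9]{4}\)')
printf 'grep -E : %s\n' "$whole"

# With -o: only the text that matched the pattern is printed.
only=$(printf '%s\n' "$sample" | grep -Eo '\([^)]+ [0-9]{4}\)')
printf 'grep -Eo: %s\n' "$only"
```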

Likewise awk:
Code:
awk '/\([^)]+ [0-9]{4}\)/' infile

This prints every whole line on which the pattern is found, not just the matching text.
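
If only the matching text is wanted from awk, one possible approach is the standard match() function with RSTART and RLENGTH. This is a sketch on an invented sample line, and the four digits are spelled out as [0-9][0-9][0-9][0-9] so that it does not depend on {4} being supported:

```shell
# Hypothetical sample line with two citations.
sample='Recall drops (Smith 1999) and recovers (Jones 2004).'

matches=$(printf '%s\n' "$sample" | awk '{
  line = $0
  # Repeatedly find the next match, print it, then continue after it.
  while (match(line, /\([^)]+ [0-9][0-9][0-9][0-9]\)/)) {
    print substr(line, RSTART, RLENGTH)
    line = substr(line, RSTART + RLENGTH)
  }
}')
printf '%s\n' "$matches"
```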

Try something like this for multi-line results:
Code:
awk 'NR>1 && $1~/^[^)]+ [0-9]{4}$/{print RS $1 FS}' RS=\( FS=\) infile

GNU awk prior to 4.0 does not support interval expressions such as {4} by default; with those versions, use gawk --posix.
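
To illustrate how the RS/FS approach behaves, here is a sketch on an invented sample in which a citation is broken across two lines. Setting RS to "(" makes each record start just after an opening parenthesis, and setting FS to ")" makes $1 the text up to the closing one. The digits are written as four explicit [0-9] classes to sidestep the {4} issue:

```shell
# Hypothetical sample; the citation spans a line break.
sample='Sleep helps consolidation (Walker and
Stickgold 2005) but not always (see text).'

found=$(printf '%s\n' "$sample" |
  awk 'NR>1 && $1 ~ /^[^)]+ [0-9][0-9][0-9][0-9]$/ { print RS $1 FS }' RS='(' FS=')')
printf '%s\n' "$found"
```

The output keeps the embedded line break, and "(see text)" is skipped because it does not end in a space followed by four digits. Note that this prints the parenthesised citation, not the sentence around it.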

Last edited by Scrutinizer; 11-20-2012 at 01:47 AM.. Reason: Mention gawk version as suggested by Alister..
# 6  
Old 11-19-2012
Quote:
Try something like this for multi-line results:
Code:
awk 'NR>1 && $1~/^[^)]+ [0-9]{4}$/{print RS $1 FS}' RS=\( FS=\) infile

GNU awk does not support repetition ({4}) by default. Use (g)awk --posix
This does not seem to return the complete sentence.

It would work if I first removed the line breaks from the file and then added a newline
after every “).” The very talented user ripat showed me the first part:
Code:
 tr -d '\n' < file |

TL;DR: I need code to add a newline after every “).”
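
One possible way to do this, offered as a sketch rather than a tested answer for the real file: join the lines, break after every ")." with awk's gsub(), then keep only the chunks that contain a citation. Two assumptions differ from the command above: the sample text is invented, and line breaks are turned into spaces with tr '\n' ' ' rather than deleted, so that words on adjacent lines are not glued together.

```shell
# Hypothetical sample; citations and sentences span line breaks.
sample='Stress impairs recall (Smith
1999). Sleep restores it (Jones 2004). Nothing
cited here.'

sentences=$(printf '%s\n' "$sample" |
  tr '\n' ' ' |                                      # join everything into one line
  awk '{ gsub(/\)\. */, ").\n"); print }' |          # start a new line after every ")."
  awk '/\([^)]+ [0-9][0-9][0-9][0-9]\)/')            # keep lines that contain a citation
printf '%s\n' "$sentences"
```

Because the text is split only after ").", a sentence without a citation stays attached to the next cited sentence that follows it.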

Last edited by danbroz; 11-19-2012 at 06:16 PM.. Reason: messed up [\code]
# 7  
Old 11-19-2012
Please post relevant text on this site, not on external sites. The external site will age out the text post, and then some future searcher will not have a clue as to what this thread is really about.

In fact it went to lala land (404) just now..... Nobody can effectively help you now.

Thank you.

Last edited by jim mcnamara; 11-19-2012 at 07:32 PM..