Adding tags in between sentences with awk


 
# 1  
Old 02-04-2015

Hi,
I need an awk script to modify the following file. It is 2-column and tab-separated.

Code:
Hi PP
my VBD
name DT
is NN
. SENT

Her PP
name VBD
is DT
the NN
same WRT
. SENT

The desired output is:

Code:
<s>
Hi PP -
my VBD -
name DT -
is NN -
. SENT .
</s>
<s>
Her PP -
name VBD -
is DT -
the NN -
same WRT -
. SENT -
</s>

I tried to use the following awk
Code:
awk '{print $1 "\t" $2 "\t" "-"}'

but I cannot figure out how to insert the <s> and </s> tags around each sentence.

Any suggestions?

Last edited by owwow14; 02-05-2015 at 05:19 AM.. Reason: updated code snippet
# 2  
Old 02-04-2015
Your code snippet doesn't match the desired output:
- where are the <TAB>s?
- why are there no hyphens in the second paragraph?
- if there's a dot after the first SENT, where is it after the second?

So we have no chance of inferring the task from the information given.

Please give us a precise specification of what you want to get done.
# 3  
Old 02-05-2015
Quote:
Originally Posted by owwow14
Hi,
I need an awk script to modify the following file. It is 2-column and tab-separated.

Code:
Hi PP
my VBD
name DT
is NN
. SENT
 
Her PP
name VBD
is DT
the NN
same WRT
. SENT

Code:
<s>
Hi PP -
my VBD -
name DT -
is NN -
. SENT .
</s>
<s>
Her PP
name VBD
is DT
the NN
same WRT
. SENT
</s>

I tried to use the following awk
Code:
awk '{print $1 "\t" $2 "\t" "-"}'

but I cannot figure out how to insert the <s> and </s> tags around each sentence.
Any suggestions?
Hello owwow14,

Could you please try the following and let me know if it helps?
Code:
awk -vs="<s>" -vs1="</s>" 'function add_tags(A){if(A==1){$0=s ORS $0};if(A==2){$0=$0 ORS s1}}($0 ~ /^$/){next} (NR==1 || j==1){add_tags(1);j=0} ($0==". SENT"){add_tags(2);j=1} 1'  Input_file

Output will be as follows.
Code:
<s>
Hi PP
my VBD
name DT
is NN
. SENT
</s>
<s>
Her PP
name VBD
is DT
the NN
same WRT
. SENT
</s>

A multi-line form of the same solution is as follows.
Code:
awk -vs="<s>" -vs1="</s>" '
function add_tags(A) {
    if (A == 1) {
        $0 = s ORS $0
    }
    if (A == 2) {
        $0 = $0 ORS s1
    }
}
($0 ~ /^$/) {
    next
}
(NR == 1 || j == 1) {
    add_tags(1)
    j = 0
}
($0 == ". SENT") {
    add_tags(2)
    j = 1
}
1
' Input_file

Thanks,
R. Singh
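
A later reply in this thread reports `awk: invalid -v option` for the command above. One plausible cause (an assumption; the thread never confirms it) is that the awk in use rejects an assignment glued directly onto `-v`, so the sketch below writes it as `-v name=value` with a space. This is not R. Singh's script: the variable names `otag` and `ctag` and the sample input are made up for illustration, and a sentence is closed when the second field is `SENT` rather than by comparing the whole line with `". SENT"`, so that tab-separated input also matches.

```shell
# Hypothetical sample input: two tab-separated sentences split by an empty line.
input=$(printf 'Hi\tPP\n.\tSENT\n\nHer\tPP\n.\tSENT\n')

# Note the space after each -v; otag and ctag are illustrative names.
result=$(printf '%s\n' "$input" | awk -v otag='<s>' -v ctag='</s>' '
    !NF          { next }                         # skip empty or whitespace-only lines
    !inside      { print otag; inside = 1 }       # first token of a sentence: open the tag
                 { print }                        # pass the token line through unchanged
    $2 == "SENT" { print ctag; inside = 0 }       # sentence-final token: close the tag
')

printf '%s\n' "$result"
```

If the input has DOS line endings, the second field would be `SENT` followed by a carriage return and the comparison would fail, so the file would need to be cleaned first.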
# 4  
Old 02-05-2015
Like RudiC says, there are inconsistencies in your specification.

To produce output like in the first half of your sample input/output, try:
Code:
awk '{$1=$1; print "<s>\n" $0 "\t.\n</s>"}' RS=  FS='\n' OFS='\t-\n' file

If it is like the lower half, try:
Code:
awk '{$1=$1; print "<s>\n" $0 "\n</s>"}' RS=  FS='\n' OFS='\n' file
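
To make the mechanics of the two one-liners above easier to follow, here is a minimal sketch on a made-up tab-separated sample. It assumes every line, including the last one of each sentence, should get `-` in the third column (as in the original poster's own attempt), and it passes the variables with `-v` instead of as trailing operands; both choices are assumptions, not part of the posted solution.

```shell
# Hypothetical sample input: two tab-separated sentences split by one empty line.
input=$(printf 'Hi\tPP\nmy\tVBD\n.\tSENT\n\nHer\tPP\n.\tSENT\n')

# RS set to the empty string selects paragraph mode: each block of lines
# separated by empty lines becomes one record.
# FS="\n" turns every line of the block into a field.
# $1=$1 forces awk to rebuild $0 with OFS, so "\t-\n" is placed between lines;
# the print then appends the last "\t-" and wraps the block in the tags.
result=$(printf '%s\n' "$input" |
    awk -v RS= -v FS='\n' -v OFS='\t-\n' '{ $1 = $1; print "<s>\n" $0 "\t-\n</s>" }')

printf '%s\n' "$result"
```

The approach only works when the separator lines are truly empty; a line holding a space or a carriage return does not end a paragraph.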

# 5  
Old 02-05-2015
Hi,
I updated the code snippet so that I hope the desired output is clearer.
@RavinderSingh13 your code gives me the following error:
Code:
awk: invalid -v option

@RudiC and @Scrutinizer I hope that the updated desired output answers some of your questions.
@Scrutinizer I tried your code too, but the output it gives me is not correct. Here is an example. As you can see, there are "-" characters on the blank lines, and the <s> and </s> tags envelop the entire text rather than each individual sentence.

Code:
<s>
Hi PP	-
my VBD	-
name DT	-
is NN	-
. SENT	-
 	-
Her PP	-
name VBD	-
is DT	-
the NN	-
same WRT	-
. SENT	.
</s>

Again, here would be the example of the desired output:

Code:
<s>
Hi PP	-
my VBD	-
name DT	-
is NN	-
. SENT	-
</s>
<s>
Her PP	-
name VBD	-
is DT	-
the NN	-
same WRT	-
. SENT	.
</s>

# 6  
Old 02-05-2015
Hi, that shows that the empty lines in the input files contain some characters. Try this instead:
Code:
awk '!NF{$0=x}1' file |  awk '{$1=$1; print "<s>\n" $0 "\t.\n</s>"}' RS=  FS='\n' OFS='\t-\n'

There is still some ambiguity. In the first half there is a trailing dot, in the second half there is a trailing dash.
Also, your samples appear not to be TAB-delimited, contrary to what you say in the description.
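
The effect of the first awk in that pipeline can be checked by counting paragraph-mode records before and after it. The sketch below uses a made-up input whose separator line holds a single space; what the poster's real file contains is not known, so this is only an illustration of the diagnosis.

```shell
# Hypothetical input: the separator line holds one space instead of being empty.
input=$(printf 'Hi\tPP\n.\tSENT\n \nHer\tPP\n.\tSENT\n')

# Paragraph mode only splits on truly empty lines, so the whole input
# is read as a single record here.
raw_count=$(printf '%s\n' "$input" | awk -v RS= 'END { print NR }')

# Blanking every line that has no fields restores the sentence boundary.
fixed_count=$(printf '%s\n' "$input" |
    awk '!NF { $0 = "" } 1' |
    awk -v RS= 'END { print NR }')

printf 'records before: %s, after: %s\n' "$raw_count" "$fixed_count"
```

On this sample the count should go from one record to two, which matches the symptom of the tags wrapping the entire text.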

Last edited by Scrutinizer; 02-05-2015 at 07:42 AM..
# 7  
Old 02-05-2015
Looks like the input file sample has DOS <CR> char line terminators...
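
If that guess is right, stripping the carriage return inside awk avoids a separate conversion step. The following is a sketch under that assumption, on made-up CRLF input; it is a line-by-line alternative to the paragraph-mode solution above, closes a sentence at each separator line, and puts `-` in the third column of every token line, which is one reading of the ambiguous specification.

```shell
# Hypothetical sample input with DOS (CRLF) line endings.
input=$(printf 'Hi\tPP\r\n.\tSENT\r\n\r\nHer\tPP\r\n.\tSENT\r\n')

result=$(printf '%s\n' "$input" | awk -v OFS='\t' '
    { sub(/\r$/, "") }               # drop a trailing carriage return, if present
    !NF {                            # separator line: close the open sentence
        if (inside) { print "</s>"; inside = 0 }
        next
    }
    !inside { print "<s>"; inside = 1 }   # first token of a sentence
    { print $1, $2, "-" }                 # token, tag, and the extra column
    END { if (inside) print "</s>" }      # close the last sentence
')

printf '%s\n' "$result"
```

Because `sub` modifies `$0`, awk re-splits the fields, so a line that held only a carriage return ends up with `NF` equal to zero and is treated as a separator.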