Contextual search and replace in a tagged file


 
# 1  
Old 11-04-2014

Dear all,
I have a large Hindi training file tagged for parts of speech. When I tagged the file, I inadvertently classified pronouns and adjectives as a single category, which has resulted in ambiguity.
An example from English will make this clear.
Code:
This is his.
This is his book.

The tagged file gives the following output

Code:
This_DT is_VBZ his_PRP
This_DT is_VBZ his_PRP book_NN

Here "his" is tagged in both cases as _PRP.

Since the data is voluminous (800,000 tags), I would like to do a conditional, contextual search and replace. Luckily, in Hindi a word tagged _PRP that is immediately followed by a word tagged _NN is always an adjective; otherwise it is a pronoun.
What I am looking for is an awk or perl script which can do the job. This would mean the following steps.
Code:
1. Find the tag _PRP
2. Look at the next tag
3. If the next tag is _NN, replace _PRP by _ADJP, otherwise do nothing.

The structure of the tagged data is as under:
Code:
WORD followed by _ followed by TAG<SPACE>followed by WORD followed by _ followed by TAG

Below is a sample set in Hindi for testing:
Code:
1a. उसकी_PRP किताब_NN 
1b. यह_DMD उसकी_PRP है_VM 
2a.रामने_NN मेरी_PRP किताब_NN ली_VM 
2b.रामने_NN मेरी_PRP ली_VM

Since in 1a. and 2a. _PRP is followed by _NN, it should be replaced by an _ADJP tag.
Since in 1b. and 2b. the condition does not hold, _PRP should be left as it is.
I have never attempted a contextual search of this type in awk or perl, and all my attempts have ended in disaster. Please help, since the data is voluminous and cannot be retagged manually.
I work in a Windows environment.
# 2  
Old 11-04-2014
Would this
Code:
awk '{for (i=1; i<NF; i++) if ($i ~ /_PRP$/ && $(i+1) ~ /_NN$/) sub (/PRP$/,"ADJP", $i)} 1' file
1a. उसकी_ADJP किताब_NN
1b. यह_DMD उसकी_PRP है_VM 
2a.रामने_NN मेरी_ADJP किताब_NN ली_VM
2b.रामने_NN मेरी_PRP ली_VM

do what you need?
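In case it helps to see the logic spelled out, here is the same one-liner as a commented script file. This is only a sketch: the file name retag.awk and the helper function retag are made up for illustration, and the English sample lines come from post #1. Keeping the program in a file also sidesteps quoting trouble, since the Windows cmd.exe prompt does not treat single quotes as quoting characters the way a Unix shell does. On Windows you would create retag.awk in a text editor and run awk -f retag.awk file.

```shell
# Write the awk program to a file (hypothetical name: retag.awk).
cat > retag.awk <<'EOF'
{
    # Walk every field except the last, which has no following tag.
    for (i = 1; i < NF; i++) {
        # Is the current word tagged _PRP and the next word tagged _NN?
        if ($i ~ /_PRP$/ && $(i + 1) ~ /_NN$/) {
            sub(/_PRP$/, "_ADJP", $i)   # retag the pronoun as an adjective
        }
    }
    print   # print every line, changed or not
}
EOF

# Hypothetical helper so the script can be used on files or on stdin.
retag() { awk -f retag.awk "$@"; }

# Try it on the English sample from post #1.
printf '%s\n' 'This_DT is_VBZ his_PRP' 'This_DT is_VBZ his_PRP book_NN' | retag
```

Two things to keep in mind. Whenever awk changes a field it rebuilds the line with single spaces between fields, so changed lines lose any trailing or repeated blanks (as in the 1a. and 2a. output above). And because the pattern is anchored with $, only a tag that is exactly _NN counts as the following noun; if your tagset has other noun tags that should also trigger the change, the second pattern would need widening.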
This User Gave Thanks to RudiC For This Post:
# 3  
Old 11-04-2014
Many thanks. It really meets my needs. I have studied the script and can see how I can do a contextual Search and Replace using AWK.






