Stemming of words that contained affixes by using shell script


 
# 1  
Old 05-02-2016

I am just learning shell scripting and need your expertise. I would like to stem words by first matching the root words in root_words.txt against the words in affix_words.txt. Every character of a word in affix_words.txt should then be replaced by "I", except that the first character after the root word becomes "B" and the last character before the root word becomes "E".

root_words.txt:

Code:
read
like
.....

affix_words.txt:

Code:
reading
unlikely
.....

The expected output is:

Code:
r e a d i n g<TAB>I I I I B I I
u n l i k e l y<TAB>I E I I I I B I
.....


Last edited by paranrat; 05-02-2016 at 07:14 AM.. Reason: Changed icode to code tags.
# 2  
Old 05-02-2016
Not quite clear. Do you want to print every input word in the affix file followed by a <TAB> and the same word with every character replaced by an upper-case I, except that the last character BEFORE a match with the root file is replaced by E and the first character AFTER a match is replaced by B, with a space after every single character?
# 3  
Old 05-02-2016
Yes, that's right, RudiC.
# 4  
Old 05-02-2016
Try
Code:
awk '
NR==FNR         {SP=SP DL $1
                 DL = "|"
                 next
                }
match ($0, SP)  {T = $0
                 gsub (/./, "I", T)
                 T = substr (T, 1, RSTART-2) (RSTART>1?"E":"") substr (T, RSTART, RLENGTH) (RSTART+RLENGTH<=length?"B":"") substr (T, RSTART+RLENGTH+1)
                 T = $0 "\t" T
                 $0 = ""
                 for (i=1; i<=length(T); i++) $0 = $0 substr (T, i, 1) " "
                }
1
' root_words.txt affix_words.txt
r e a d i n g    I I I I B I I
u n l i k e l y          I E I I I I B I

# 5  
Old 05-02-2016
Thank you so much, RudiC, and sorry for disturbing you. Could you explain the code above a little?

Last edited by paranrat; 05-02-2016 at 07:26 AM..
# 6  
Old 05-02-2016
No need to apologize. Welcome.
Code:
awk '
NR==FNR         {SP=SP DL $1                    # collect root words into a search pattern built from "alternate expressions".
                 DL = "|"                       # make the delimiter the infix alternate operator
                 next
                }
match ($0, SP)  {T = $0                         # if any root word found in affix, create a working variable T from input line
                 gsub (/./, "I", T)             # make T consist of all "I"s

                 T = substr (T, 1, RSTART-2) (RSTART>1?"E":"") substr (T, RSTART, RLENGTH) (RSTART+RLENGTH<=length?"B":"") substr (T, RSTART+RLENGTH+1)
                                                # replace the "I" before SP with "E", and after SP with "B"
                                                # (<= rather than <, so a one-character suffix still gets its "B")

                 T = $0 "\t" T                  # combine input line with result pattern
                 $0 = ""                        
                 for (i=1; i<=length(T); i++) $0 = $0 substr (T, i, 1) " "
                                                # intersperse spaces
                }
1                                               # default action: print $0
' file1 file2
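
To see what the collected pattern and match() actually deliver, here is a minimal sketch. It assumes only the two sample root words from the first post and passes the finished pattern in with -v instead of reading it from a file; the helper name show_match is made up for illustration.

```shell
# Show where match() finds a root word inside each affixed word.
# SP is the same "alternate expression" the script above builds from root_words.txt.
show_match() {
    awk -v SP="read|like" '
    match($0, SP) {
        # RSTART  = position of the first matched character
        # RLENGTH = length of the matched text
        printf "%s RSTART=%d RLENGTH=%d root=%s\n", $0, RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
    }'
}

printf '%s\n' reading unlikely | show_match
# reading RSTART=1 RLENGTH=4 root=read
# unlikely RSTART=3 RLENGTH=4 root=like
```

Everything before position RSTART is a prefix and everything from position RSTART+RLENGTH onward is a suffix, which is why those two values decide where the "E" and the "B" go.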

# 7  
Old 05-03-2016
How can I make the output look like this?

Code:
r<TAB>I
e     I
a     I
d     I
i     B
n     I
g     I
$     $   # put a "$" sign at the end of each word

.     .
.     .
.     .

I tried changing your code a bit:
Code:
for (i=1; i<=length(T); i++) $0 = $0 substr (T, i, 1) "\n"

but the output became like this:

Code:
r
e
a
d
i
n
g

I
I
I
I
B
I
I

.
.
.
