Search and replace of strings with space between words


 
# 1  
Old 04-05-2016

Dear all,
I have gone through the existing search-and-replace requests, but none of them meets my particular need. I have a huge file in which all Unicode characters are stored as their names; a sample is given below. I want to replace strings in that file using mappings from another file, master.dic. One peculiarity is that the strings in the master dictionary have spaces between the words and may also have a space at the end. The other is that some of the strings map to null, that is, to an empty replacement.

What I am looking for is a Perl or awk script that can do the search and replace on a file of around 100,000 individual string sets, each of which can contain up to 6 or 7 names.
Some samples are given below.
Code:
Master.dic: Only sample rules are given
telugu letter=
a telugu vowel sign uu=u
a telugu vowel sign ii=i
telugu sign anusvara=n
a telugu vowel sign e=e
a telugu vowel sign o=o
a telugu vowel sign aa=a
a telugu sign virama=
vowel sign vocalic r=ri
a telugu vowel sign ii=i
telugu vowel sign=

Code:
Input file: The file on which the operation is to be carried out
telugu letter aa
telugu letter aa  telugu letter ii telugu sign anusvara  telugu letter dda telugu letter la
telugu letter aa  telugu letter ii telugu sign anusvara telugu letter dda
telugu letter aa  telugu letter ii telugu sign anusvara telugu letter dda telugu sign virama telugu letter la
telugu letter aa  telugu letter ii telugu letter aa
telugu letter aa  telugu letter ii telugu letter aa  telugu letter sha telugu vowel sign o telugu letter ka telugu sign virama
telugu letter aa  telugu letter ii telugu letter ka
telugu letter aa  telugu letter ii telugu letter ka telugu vowel sign aa
telugu letter aa  telugu letter ii telugu letter dda telugu sign virama telugu letter ra telugu vowel sign uu telugu letter sa telugu sign virama
telugu letter aa  telugu letter ii telugu letter ta  telugu letter ga telugu vowel sign o telugu letter na telugu vowel sign ii
telugu letter aa  telugu letter ii telugu letter ta telugu vowel sign aa
telugu letter aa  telugu letter ii telugu letter ta telugu vowel sign aa telugu letter ra  telugu letter va telugu vowel sign e telugu letter na telugu vowel sign ii
telugu letter aa  telugu letter ii telugu letter ta telugu vowel sign ii
telugu letter aa  telugu letter ii telugu letter ta telugu vowel sign uu
telugu letter aa  telugu letter ii telugu letter ta telugu vowel sign e
telugu letter aa  telugu letter ii telugu letter ta telugu vowel sign e telugu sign anusvara
telugu letter aa  telugu letter ii telugu letter ta telugu sign virama
telugu letter aa  telugu letter ii telugu letter ta telugu sign virama telugu letter ya
telugu letter aa  telugu letter ii telugu letter ta telugu sign virama telugu letter la
telugu letter aa  telugu letter ii telugu letter ta telugu sign virama telugu letter la telugu vowel sign aa
telugu letter aa  telugu letter ii telugu letter dha
telugu letter aa  telugu letter ii telugu letter na
telugu letter aa  telugu letter ii telugu letter na telugu vowel sign aa
telugu letter aa  telugu letter ii telugu letter na telugu sign virama
telugu letter aa  telugu letter ii telugu letter ma
telugu letter aa  telugu letter ii telugu letter ra  telugu letter ma telugu vowel sign e telugu letter sha telugu sign virama
telugu letter aa  telugu letter ii telugu letter ra  telugu letter va telugu vowel sign ii telugu sign anusvara  telugu letter dda telugu letter ra telugu sign virama
telugu letter aa  telugu letter ii telugu letter la

Code:
Expected output
aa
aa ii n dda la
aa ii n dda
aa ii n ddla
aa ii aa
aa ii aa sho ka
aa ii ka
aa ii kaa
aa ii ddruu sa
aa ii ta go nii
aa ii taa
aa ii taa ra ve nii
aa ii tii
aa ii tuu
aa ii te
aa ii te n
aa ii ta
aa ii tya
aa ii tla
aa ii tlaa
aa ii dha
aa ii na
aa ii naa
aa ii na
aa ii ma
aa ii ra me sha
aa ii ra vii n dda ra
aa ii la

Many thanks for your help. I work in a Windows environment, so a Perl or awk script would be of help.
# 2  
Old 04-05-2016
Try:
Code:
awk '
NR==FNR {sub(" *$", ""); w[$1]=$2; next; }
{for (i in w) gsub(i, w[i]); sub("^ *", ""); sub(" *$", ""); gsub("  *", " ")}
1
' FS="=" Master.dic infile
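
A caveat about the loop above: awk does not specify the order in which `for (i in w)` visits the keys, and some keys in Master.dic contain others (for example, "telugu vowel sign" is part of "a telugu vowel sign e"). If the shorter key is replaced first, the longer one can no longer match. The following is only a sketch of one way around that, not the script from this thread: it keeps the rules in numbered arrays and applies the longest keys first. It assumes a POSIX shell and awk; the file names dict.txt and in.txt, the arrays key and val, and the flag sorted are made up for the illustration.

```shell
tmp=$(mktemp -d)
# Three rules taken from the sample Master.dic; the second key contains the first.
printf 'telugu vowel sign=\na telugu vowel sign e=e\ntelugu letter=\n' > "$tmp/dict.txt"
# One line taken from the sample input file.
printf 'telugu letter aa  telugu letter ii telugu letter ta telugu vowel sign e\n' > "$tmp/in.txt"

out=$(awk '
BEGIN {FS="="}
NR==FNR {                      # first file: collect the rules in input order
    sub(" *$", "")             # drop trailing spaces, as in the script above
    key[++n]=$1; val[n]=$2
    next
}
!sorted {                      # first line of the second file: order the rules
    for (a=2; a<=n; a++)       # insertion sort, longest key first
        for (b=a; b>1 && length(key[b])>length(key[b-1]); b--) {
            t=key[b]; key[b]=key[b-1]; key[b-1]=t
            t=val[b]; val[b]=val[b-1]; val[b-1]=t
        }
    sorted=1
}
{
    for (k=1; k<=n; k++) gsub(key[k], val[k])         # apply rules in sorted order
    sub("^ *", ""); sub(" *$", ""); gsub("  *", " ")  # tidy up the spaces
    print
}
' "$tmp/dict.txt" "$tmp/in.txt")

printf '%s\n' "$out"    # prints: aa ii te
rm -rf "$tmp"
```

With only sample rules available, this cannot be checked against the whole expected output; it only shows that a fixed, longest-first order makes the result independent of how a given awk happens to walk the array.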

# 3  
Old 04-05-2016
Hello,
Thank you for responding. The awk script works well but gives some surprising results, mainly repetition of the strings.
I used the script you provided; the only change I made was to put the field separator in the script itself:
Code:
#Awk script to replace long strings in Input file by mappers in master.dic
#Syntax gawk32 -f snr.gk master.dic infile>infileout
FS="="
NR==FNR {sub(" *$", ""); w[$1]=$2; next; }
{for (i in w) gsub(i, w[i]); sub("^ *", ""); sub(" *$", ""); gsub("  *", " ")}
1

However, on applying it to the example given, I got the following output:
Code:
telugu letter=
a telugu vowel sign uu=u
a telugu vowel sign ii=i
telugu sign anusvara=n
a telugu vowel sign e=e
a telugu vowel sign o=o
a telugu vowel sign aa=a
a telugu sign virama=
vowel sign vocalic r=ri
a telugu vowel sign ii=i
telugu vowel sign=
telugu letter aa
aa
telugu letter aa  telugu letter ii telugu sign anusvara  telugu letter dda telugu letter la
aa ii n dda la
telugu letter aa  telugu letter ii telugu sign anusvara telugu letter dda
aa ii n dda
telugu letter aa  telugu letter ii telugu sign anusvara telugu letter dda telugu sign virama telugu letter la
aa ii n dd la
telugu letter aa  telugu letter ii telugu letter aa
aa ii aa
telugu letter aa  telugu letter ii telugu letter aa  telugu letter sha telugu vowel sign o telugu letter ka telugu sign virama
aa ii aa sho k
telugu letter aa  telugu letter ii telugu letter ka
aa ii ka
telugu letter aa  telugu letter ii telugu letter ka telugu vowel sign aa
aa ii ka aa
telugu letter aa  telugu letter ii telugu letter dda telugu sign virama telugu letter ra telugu vowel sign uu telugu letter sa telugu sign virama
aa ii dd ra uu s
telugu letter aa  telugu letter ii telugu letter ta  telugu letter ga telugu vowel sign o telugu letter na telugu vowel sign ii
aa ii ta go ni
telugu letter aa  telugu letter ii telugu letter ta telugu vowel sign aa
aa ii ta aa
telugu letter aa  telugu letter ii telugu letter ta telugu vowel sign aa telugu letter ra  telugu letter va telugu vowel sign e telugu letter na telugu vowel sign ii
aa ii ta aa ra ve ni
telugu letter aa  telugu letter ii telugu letter ta telugu vowel sign ii
aa ii ti
telugu letter aa  telugu letter ii telugu letter ta telugu vowel sign uu
aa ii ta uu
telugu letter aa  telugu letter ii telugu letter ta telugu vowel sign e
aa ii te
telugu letter aa  telugu letter ii telugu letter ta telugu vowel sign e telugu sign anusvara
aa ii te n
telugu letter aa  telugu letter ii telugu letter ta telugu sign virama
aa ii t
telugu letter aa  telugu letter ii telugu letter ta telugu sign virama telugu letter ya
aa ii t ya
telugu letter aa  telugu letter ii telugu letter ta telugu sign virama telugu letter la
aa ii t la
telugu letter aa  telugu letter ii telugu letter ta telugu sign virama telugu letter la telugu vowel sign aa
aa ii t la aa
telugu letter aa  telugu letter ii telugu letter dha
aa ii dha
telugu letter aa  telugu letter ii telugu letter na
aa ii na
telugu letter aa  telugu letter ii telugu letter na telugu vowel sign aa
aa ii na aa
telugu letter aa  telugu letter ii telugu letter na telugu sign virama
aa ii n
telugu letter aa  telugu letter ii telugu letter ma
aa ii ma
telugu letter aa  telugu letter ii telugu letter ra  telugu letter ma telugu vowel sign e telugu letter sha telugu sign virama
aa ii ra me sh
telugu letter aa  telugu letter ii telugu letter ra  telugu letter va telugu vowel sign ii telugu sign anusvara  telugu letter dda telugu letter ra telugu sign virama
aa ii ra vi n dda r
telugu letter aa  telugu letter ii telugu letter la
aa ii la

As you can see, the initial string is still retained and is followed by the modified string.
Any suggestions, please? Thanks a lot for the script.
# 4  
Old 04-05-2016
Remove FS from the body of the script. Use it like this:
Code:
NR==FNR {FS="="; sub(" *$", ""); w[$1]=$2; next; }
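
Why the stray FS line duplicated the output: at the top level of an awk script, a bare expression is a pattern, and a pattern without an action prints the current record. The assignment FS="=" evaluates to the non-empty string "=", which counts as true, so every line of both files is printed unchanged before the other rules run. The sketch below reproduces this on tiny inputs. It assumes a POSIX shell and awk; the file names dict.txt and in.txt are made up for the illustration, and the fixed variant sets FS in a BEGIN block, which is a common alternative to setting it inside the first rule.

```shell
tmp=$(mktemp -d)
# Hypothetical sample files, much smaller than the ones in the thread.
printf 'telugu letter=\ntelugu sign anusvara=n\n' > "$tmp/dict.txt"
printf 'telugu letter aa telugu sign anusvara\n' > "$tmp/in.txt"

# Buggy: the bare FS="=" line is an always-true pattern, so each
# record is printed once, unchanged, by the default action.
buggy=$(awk '
FS="="
NR==FNR {sub(" *$", ""); w[$1]=$2; next}
{for (i in w) gsub(i, w[i]); sub("^ *", ""); sub(" *$", ""); gsub("  *", " ")}
1
' "$tmp/dict.txt" "$tmp/in.txt")

# Fixed: FS is set in a BEGIN block, before any record is read.
fixed=$(awk '
BEGIN {FS="="}
NR==FNR {sub(" *$", ""); w[$1]=$2; next}
{for (i in w) gsub(i, w[i]); sub("^ *", ""); sub(" *$", ""); gsub("  *", " ")}
1
' "$tmp/dict.txt" "$tmp/in.txt")

printf '%s\n' "$buggy"
echo ---
printf '%s\n' "$fixed"
rm -rf "$tmp"
```

The buggy run echoes the two dictionary lines and the input line before the converted line, which is the pattern seen in post #3; the fixed run prints only the converted line. Passing FS="=" on the command line before the file names, as in post #2, has the same effect as the BEGIN block here.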

# 5  
Old 04-05-2016
Thanks a lot. It worked and I got the desired result. I should have thought of removing FS from the body of the script.
Many thanks.