UNIX for Beginners Questions & Answers

awk Replace Multiple patterns within a list_file with One in target_file

awk, awk -f

    #1  
Old 12-19-2017
mpvphd mpvphd is offline
Registered User
 
Join Date: Dec 2017
Last Activity: 19 December 2017, 4:02 PM EST
Posts: 3
Thanks: 1
Thanked 0 Times in 0 Posts

I'm facing a problem.

1) I have a list_file intended to be used for in-place replacement, like this:

Code:
  Replacement pattern ; Matching patterns

    EXTRACT ___________________
    toto ; tutu | tata | tonton  | titi 
    bobo ; bibi | baba | bubu | bebe 
    etc. 14000 lines !!!
    _____________________________



2) I have a target file in which I want to replace those patterns:

Code:
EXTRACT INPUT _______________
    hello my name is bob and I am a Titi and I like bubu
    _____________________________


I want it to become

Code:
EXTRACT OUTPUT ______________
    hello my name is bob and I am a toto and I like bobo
    _____________________________

Currently I am using awk to try to achieve this, with this command:

Code:
   awk -F';' 'NR==FNR{A[$1]=$2; next} IGNORECASE = 1 {for(i in A) gsub(/A[i]/,i)}1’ simplifier_FR.txt text.txt

Sadly, awk doesn't seem to understand the pipe "|" character as an OR operator. I have also tried to achieve this with sed, but that runs very slowly even though it works.

Does anyone have a better idea?
Thanks
M
    #2  
Old 12-19-2017
MadeInGermany MadeInGermany is offline Forum Staff  
Moderator
 
Join Date: May 2012
Last Activity: 17 July 2018, 9:24 PM EDT
Location: Simplicity
Posts: 4,156
Thanks: 365
Thanked 1,419 Times in 1,275 Posts
awk DOES understand a | character in a RE because it actually takes ERE, just like GNU sed with the -r option.
But a standard sed does NOT.
Your awk code has several bugs.
Is this homework/coursework?
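As a quick check of the point about ERE alternation, here is a minimal sketch (the function name demo_alternation is made up for illustration) that uses | inside a constant regex on one of the sample word lists:

```shell
# demo_alternation is a hypothetical name used only for this illustration.
demo_alternation() {
    echo 'I like bubu and bibi' |
        awk '{ gsub(/bibi|baba|bubu|bebe/, "bobo") } 1'
}
demo_alternation    # prints: I like bobo and bobo
```

Note that there are no blanks around the | characters here; inside a regex a blank is an ordinary character that has to match a blank in the input.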
    #3  
Old 12-19-2017
mpvphd
I am trying to send a regex with pipes to do a
'pattern OR pattern OR ...'
with 'pattern | pattern | ...'

for example with one replacement :
Code:
echo 'toto; tutu | tata | tonton | titi ' | awk '{gsub(/ tutu | tata | tonton | titi /," toto ")}1'
gives 
toto; toto | toto | toto | toto

with
Code:
awk -F';' 'NR==FNR{A[$1]=$2; next} IGNORECASE = 1 {for(i in A) gsub(/A[i]/,i)}1’

I expect this to:
1) build an array A with $2 as the content and $1 as the key,
so for the first line
$2 =' tutu | tata | tonton | titi '
$1 = ' toto '
2) replace with gsub(/$2/,$1)}1
so for the first line
awk 'IGNORECASE = 1 {gsub(/ tutu | tata | tonton | titi /," toto ")}1

I am currently looking at the -f option.
Is that a good idea?
I am thinking about doing
Code:
BEGIN
{replacing command 1}
{replacing command 2}
etc.
END

What could I do?
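For reference, a program file for awk -f might look like the sketch below. This is only an illustration: the function name demo_program_file and the temporary file are assumptions, and the two rules are copied from the sample list. BEGIN and END are not needed as wrappers, because BEGIN and END blocks run once before and after the input, while plain { } rules run for every input line.

```shell
# demo_program_file is a hypothetical name; the program is written to a temporary file.
demo_program_file() {
    prog=$(mktemp) || return 1
    cat > "$prog" <<'EOF'
# One rule per replacement; every rule runs on each input line.
{ gsub(/tutu|tata|tonton|titi/, "toto") }
{ gsub(/bibi|baba|bubu|bebe/, "bobo") }
{ print }
EOF
    echo 'I am a titi and I like bubu' | awk -f "$prog"
    rm -f "$prog"
}
demo_program_file    # prints: I am a toto and I like bobo
```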

Last edited by mpvphd; 12-19-2017 at 02:49 PM..
    #4  
Old 12-19-2017
MadeInGermany
Yes, your idea with an ERE and pipe-OR works.
The main bug in your awk code: an ERE is written in / / (or in " ") only when it is a constant, not when it is held in a variable! /A[i]/ is a constant that matches the literal text "Ai"; to use the stored pattern, pass A[i] without the slashes.
Then, the input words have spaces around them. How does it find the last word on a line when there is no trailing space?
Then, you use the assignment IGNORECASE = 1 as a condition. Fortunately it is always true, so the following { block } is run. Better to have no condition and set the variable once in the BEGIN block!
Attempt to fix the bugs (untested):
Code:
awk -F';' 'BEGIN { IGNORECASE = 1 } NR==FNR { A[$1] = $2; next } { x = (" " $0 " "); for (i in A) gsub(A[i], i, x); sub(/^ /, "", x); sub(/ $/, "", x); print x }'
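The difference between a constant regex and a regex held in a variable can be seen in a small sketch like this one (the function name demo_dynamic_regex and the test line 'Ai tutu' are made up for illustration):

```shell
# demo_dynamic_regex is a hypothetical name used only for this illustration.
demo_dynamic_regex() {
    echo 'Ai tutu' | awk '
    BEGIN { A["toto"] = "tutu|tata" }
    {
        lit = $0
        dyn = $0
        for (i in A) {
            gsub(/A[i]/, i, lit)   # constant regex: the letter A, then one character from the set [i]
            gsub(A[i], i, dyn)     # dynamic regex: the string stored in A[i]
        }
        print "constant: " lit
        print "dynamic:  " dyn
    }'
}
demo_dynamic_regex
# prints:
# constant: toto tutu
# dynamic:  Ai toto
```

Only the second form looks up the array element, so only the second form replaces tutu.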


Last edited by MadeInGermany; 12-19-2017 at 03:50 PM.. Reason: Fixed a wrong ' character
    #5  
Old 12-19-2017
mpvphd
Thank you, that works.
The problem came from my awk version, but thanks for your answer!
    #6  
Old 12-19-2017
Don Cragun Don Cragun is offline Forum Staff  
Administrator
 
Join Date: Jul 2012
Last Activity: 17 July 2018, 7:22 PM EDT
Location: San Jose, CA, USA
Posts: 11,407
Thanks: 649
Thanked 3,970 Times in 3,393 Posts
I don't know what you mean about the problem being the version of awk you were using when there were so many logic errors in your code. But, if you have it working now, congratulations.

Note, however, that in addition to the corrections MadeInGermany already listed, you also need to be absolutely sure that your first input file has exactly one <space> character before and after each word you're searching for as possible text to be replaced. For example, with the sample data you provided, no changes would be made to the following lines of text:
Code:
The word tonton in this text will not be changed to toto because there aren't
two <space> characters following any occurrence of tonton in this sentence, but
there is one <space> before tonton and two <space>s after tonton in your sample
simplifier_FR.txt file.

You might also want to note that if there are any punctuation characters before or after any of the words you want to replace, the code you're using won't find and/or replace them.
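One possible way around both limitations is sketched below. It is not the code from this thread but a variant under stated assumptions: replace_words is a hypothetical helper name, words are assumed to consist of ASCII letters, digits and underscores (so accented letters count as delimiters here), and case is ignored with tolower() rather than IGNORECASE, which is a gawk extension. The list entries are trimmed when they are read, and a word only matches when the characters on both sides of it are not word characters.

```shell
# replace_words is a hypothetical helper name.
# Usage: replace_words LIST_FILE TARGET_FILE
replace_words() {
    awk -F';' '
    NR == FNR {                                  # first file: the list
        key = $1
        pat = tolower($2)
        gsub(/^[ \t]+|[ \t]+$/, "", key)         # trim the replacement word
        gsub(/[ \t]*\|[ \t]*/, "|", pat)         # drop blanks around each |
        gsub(/^[ \t]+|[ \t]+$/, "", pat)         # trim both ends of the list
        if (key != "" && pat != "")              # skip lines without a rule
            re[key] = "[^A-Za-z0-9_](" pat ")[^A-Za-z0-9_]"
        next
    }
    {                                            # second file: the text
        line = " " $0 " "                        # pad so every word has two neighbours
        for (key in re) {
            out = ""
            while (match(tolower(line), re[key])) {
                out = out substr(line, 1, RSTART) key      # keep the leading delimiter
                line = substr(line, RSTART + RLENGTH - 1)  # resume at the trailing one
            }
            line = out line
        }
        print substr(line, 2, length(line) - 2)  # remove the padding again
    }' "$1" "$2"
}
```

It would be called as replace_words simplifier_FR.txt text.txt. With the two sample rules, a line such as "tonton, (bibi) titillate" should come out as "toto, (bobo) titillate". The order of for (key in re) is unspecified, so a replacement word that also appears as a search word in another rule may be replaced again.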