awk Replace Multiple patterns within a list_file with One in target_file

awk, awk -f

# 1  
Old 12-19-2017

I'm facing a problem.

1) I have a list_file intended to be used for in-place replacement, like this:

  Replacement pattern ; Matching patterns

    EXTRACT ___________________
    toto ; tutu | tata | tonton  | titi 
    bobo ; bibi | baba | bubu | bebe 
    etc. 14000 lines !!!

2) I have a target file in which I want to replace those patterns.

EXTRACT INPUT _______________
    hello my name is bob and I am a Titi and I like bubu

I want it to become

EXTRACT OUTPUT ______________
    hello my name is bob and I am a toto and I like bobo

Currently I am trying to achieve this with awk, using this command:

   awk -F';' 'NR==FNR{A[$1]=$2; next} IGNORECASE = 1 {for(i in A) gsub(/A[i]/,i)}1' simplifier_FR.txt text.txt

Sadly, awk doesn't seem to understand the pipe character "|" as an OR indicator. I have also tried to achieve this with sed, but that option runs very slowly even though it works.

Does anyone have a better idea?
# 2  
Old 12-19-2017
awk DOES understand a | character in a RE, because it takes extended regular expressions (ERE), just like GNU sed with the -r option.
But a standard sed, which takes basic regular expressions (BRE) by default, does NOT.
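A quick way to check this, as a minimal sketch. The patterns come from the first post; the test sentence and the variable names are made up for illustration, and the -E flag is assumed to be available (GNU and BSD sed accept it, very old seds may not).

```shell
# awk regular expressions are ERE, so | means "or" without any escaping.
awk_out=$(echo 'I like bubu and baba' | awk '{ gsub(/bibi|baba|bubu|bebe/, "bobo") } 1')
echo "$awk_out"

# sed defaults to BRE, where an unescaped | is a literal pipe character,
# so this leaves the line unchanged.
bre_out=$(echo 'I like bubu and baba' | sed 's/bibi|baba|bubu|bebe/bobo/g')
echo "$bre_out"

# With ERE enabled (-E; GNU sed also accepts -r), alternation works.
ere_out=$(echo 'I like bubu and baba' | sed -E 's/bibi|baba|bubu|bebe/bobo/g')
echo "$ere_out"
```

The first and third commands should replace both words; the second should print the input as it was.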
Your awk code has several bugs.
Is this homework/coursework?
# 3  
Old 12-19-2017
I am trying to pass a regex with pipes to express
'pattern OR pattern OR ...'
as 'pattern | pattern | ...'

For example, with one replacement:
echo 'toto; tutu | tata | tonton | titi ' | awk '{gsub(/ tutu | tata | tonton | titi /," toto ")}1'
toto; toto | toto | toto | toto

awk -F';' 'NR==FNR{A[$1]=$2; next} IGNORECASE = 1 {for(i in A) gsub(/A[i]/,i)}1'

I expect it to:
1) fill an array A with $2 as the content and $1 as the key,
so for the first line
$2 = ' tutu | tata | tonton | titi '
$1 = ' toto '
2) replace with gsub(/$2/,$1)}1,
so for the first line
awk 'IGNORECASE = 1 {gsub(/ tutu | tata | tonton | titi /," toto ")}1'

Currently I am looking at the -f option.
Is that a good idea?
I am thinking about doing
{replacing command 1}
{replacing command 2}

What could I do?

Last edited by mpvphd; 12-19-2017 at 02:49 PM..
# 4  
Old 12-19-2017
Yes, your idea with an ERE and pipe-OR works.
The main bug in your awk code: an ERE goes between / / (or in " ") only when it is a constant, not when it is held in a variable! /A[i]/ is a constant that matches the literal text "Ai"; to use the pattern stored in the array, write gsub(A[i], i).
Then, the words in the list have spaces around them. How does it find the last word on a line when there is no trailing space?
Then, you use the assignment IGNORECASE = 1 as a condition. Fortunately it is always true, so the following { block } is run. Better to have no condition and set the variable once in the BEGIN block!
Attempt to fix the bugs (untested):
awk -F';' 'BEGIN { IGNORECASE = 1 } NR==FNR { A[$1] = $2; next } { x = (" " $0 " "); for (i in A) gsub(A[i], i, x); sub(/^ /, "", x); sub(/ $/, "", x); print x }' simplifier_FR.txt text.txt
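To see the difference between a regex constant and a regex held in a variable, here is a small sketch. The input line and the one-entry array are made up for illustration; only the pattern list is taken from the thread.

```shell
# /A[i]/ is a regex constant: the letter "A" followed by the bracket
# expression [i], so it only matches the literal text "Ai".
const_out=$(echo 'Ai likes bubu' | awk 'BEGIN { A["bobo"] = "bibi|baba|bubu|bebe" }
    { for (i in A) gsub(/A[i]/, i) } 1')
echo "$const_out"

# A[i] without slashes is a dynamic regex: awk uses the string stored in
# the array element as the ERE.
dyn_out=$(echo 'Ai likes bubu' | awk 'BEGIN { A["bobo"] = "bibi|baba|bubu|bebe" }
    { for (i in A) gsub(A[i], i) } 1')
echo "$dyn_out"
```

The first command rewrites "Ai" and leaves "bubu" alone; the second does the opposite, which is what the replacement list needs.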

Last edited by MadeInGermany; 12-19-2017 at 03:50 PM.. Reason: Fixed a wrong ' character
# 5  
Old 12-19-2017
Thank you, that works.
The problem came from my awk version, but thanks for your answer!
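For readers on another awk: IGNORECASE is a gawk extension, and other awks treat it as an ordinary variable with no effect on matching. Whether that was the version problem here is not stated in the thread. A portable sketch, assuming ASCII text and patterns written in lower case; the function name ci_gsub and the test sentence are made up for illustration.

```shell
ci_out=$(echo 'I am a Titi and I like BUBU' | awk '
# Replace every match of ERE "re" (written in lower case) in "s" with
# "rep", ignoring case, without relying on the gawk-only IGNORECASE.
function ci_gsub(re, rep, s,    low, out) {
    out = ""
    low = tolower(s)
    while (low != "" && match(low, re) && RLENGTH > 0) {
        out = out substr(s, 1, RSTART - 1) rep
        s   = substr(s,   RSTART + RLENGTH)
        low = substr(low, RSTART + RLENGTH)
    }
    return out s
}
{
    $0 = ci_gsub("tutu|tata|tonton|titi", "toto", $0)
    $0 = ci_gsub("bibi|baba|bubu|bebe", "bobo", $0)
    print
}')
echo "$ci_out"
```

The match is done on a lower-cased copy while the output is rebuilt from the original line, so unmatched text keeps its case. Patterns anchored with ^ would not behave correctly in this loop, and like plain gsub it also matches inside longer words.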
# 6  
Old 12-19-2017
I don't know what you mean about the problem being the version of awk you were using when there were so many logic errors in your code. But, if you have it working now, congratulations.

Note, however, that in addition to the corrections MadeInGermany already listed, you also need to be absolutely sure that your first input file has exactly one <space> character before and after each word you're searching for as possible text to be replaced. For example, with the sample data you provided, no changes would be made to the following lines of text:
The word tonton in this text will not be changed to toto because there aren't
two <space> characters following any occurrence of tonton in this sentence, but
there is one <space> before tonton and two <space>s after tonton in your sample
simplifier_FR.txt file.

You might also want to note that if there are any punctuation characters before or after any of the words you want to replace, the code you're using won't find and/or replace them.
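One possible way around both the spacing and the punctuation issue is to stop building regexes from the list and look words up in an array instead. This is only a sketch under assumptions: every pattern is a single word made of ASCII letters, digits or underscores (accented letters would need a wider character set), and the temporary files, the sample sentence and the names trim, map, list_file and target_file are made up for illustration.

```shell
# Hypothetical sample files, built from the data in the first post.
list_file=$(mktemp)
target_file=$(mktemp)
printf '%s\n' 'toto ; tutu | tata | tonton  | titi ' \
              'bobo ; bibi | baba | bubu | bebe ' > "$list_file"
printf '%s\n' 'hello my name is bob and I am a Titi, and I like bubu.' > "$target_file"

lookup_out=$(awk -F';' '
function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
# First file: map every lower-cased matching word to its replacement.
NR == FNR {
    n = split($2, words, "[|]")
    for (k = 1; k <= n; k++) {
        w = tolower(trim(words[k]))
        if (w != "") map[w] = trim($1)
    }
    next
}
# Second file: walk the line one word at a time and look each word up.
{
    line = $0; out = ""
    while (match(line, /[A-Za-z0-9_]+/)) {
        word = substr(line, RSTART, RLENGTH)
        key  = tolower(word)
        out  = out substr(line, 1, RSTART - 1) ((key in map) ? map[key] : word)
        line = substr(line, RSTART + RLENGTH)
    }
    print out line
}' "$list_file" "$target_file")
echo "$lookup_out"
rm -f "$list_file" "$target_file"
```

Because each word costs one array lookup, the work per line no longer grows with the number of list entries, which may matter with 14000 of them. Whole words are compared, so "bob" is left alone and the extra spaces around "tonton" in the list do no harm.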
Unix & Linux Forums Content Copyright©1993-2018. All Rights Reserved.