awk Replace Multiple patterns within a list_file with One in target_file


 
UNIX for Beginners Questions & Answers
# 1  
12-19-2017
Linux awk Replace Multiple patterns within a list_file with One in target_file

I'm facing a problem.

1) I have a list_file intended to be used for in-place replacement, like this:

Code:
  Replacement pattern ; Matching patterns

    EXTRACT ___________________
    toto ; tutu | tata | tonton  | titi 
    bobo ; bibi | baba | bubu | bebe 
    etc. 14000 lines !!!
    _____________________________



2) I have a target file in which I want to replace those patterns:

Code:
EXTRACT INPUT _______________
    hello my name is bob and I am a Titi and I like bubu
    _____________________________


I want it to become:

Code:
EXTRACT OUTPUT ______________
    hello my name is bob and I am a toto and I like bobo
    _____________________________

Currently I am trying to achieve this with awk, using this command:

Code:
   awk -F';' 'NR==FNR{A[$1]=$2; next} IGNORECASE = 1 {for(i in A) gsub(/A[i]/,i)}1' simplifier_FR.txt text.txt

Sadly, awk doesn't seem to understand the pipe « | » character as an OR operator... I have also tried to achieve this with sed, but that is very slow, even though it works.

Does anyone have a better idea?
Thanks
M
# 2  
12-19-2017
awk DOES understand a | character in an RE, because it actually uses EREs (extended regular expressions), just like GNU sed with the -r option.
But a standard sed does NOT.
Your awk code has several bugs.
Is this homework/coursework?
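A minimal sketch of that point, assuming any POSIX awk (gawk, mawk or BWK awk): the hypothetical function `alternation_demo` applies `|` alternation once in a regex constant and once in a regex held in a variable passed with `-v`. Unlike the patterns in this thread, these carry no surrounding spaces, so they would also match inside longer words.

```shell
# Hypothetical helper: shows that awk's EREs accept "|" as OR,
# both in a /regex constant/ and in a dynamic regex held in a variable.
alternation_demo() {
    printf '%s\n' "$1" | awk -v re='bibi|baba|bubu|bebe' '{
        gsub(/tutu|tata|tonton|titi/, "toto")   # regex constant between slashes
        gsub(re, "bobo")                        # dynamic regex: a variable, no slashes
        print
    }'
}
```

For example, `alternation_demo 'I am a titi and I like bubu'` prints `I am a toto and I like bobo`.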
# 3  
12-19-2017
I am trying to pass a regex with pipes to express
'pattern OR pattern OR ...'
as 'pattern | pattern | ...'

For example, with one replacement:
Code:
echo 'toto; tutu | tata | tonton | titi ' | awk '{gsub(/ tutu | tata | tonton | titi /," toto ")}1'
gives 
toto; toto | toto | toto | toto

with
Code:
awk -F';' 'NR==FNR{A[$1]=$2; next} IGNORECASE = 1 {for(i in A) gsub(/A[i]/,i)}1'

I expect it to:
1) fill an array A with $1 as the key and $2 as the value,
so for the first line:
$2 = ' tutu | tata | tonton | titi '
$1 = ' toto '
2) replace with gsub(/$2/,$1)}1,
so for the first line:
awk 'IGNORECASE = 1 {gsub(/ tutu | tata | tonton | titi /," toto ")}1'
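One way to check those expectations is to print the fields awk actually sees. The hypothetical `show_fields` helper below wraps `$1` and `$2` in brackets so the spaces around `;` stay visible. With `-F';'` awk does not trim them, so for a line such as `toto ; tutu | titi ` the key is `toto ` (trailing space, no leading space) unless the list line itself starts with a space.

```shell
# Hypothetical helper: print $1 and $2 in brackets to reveal the
# leading/trailing spaces that -F';' keeps in each field.
show_fields() {
    printf '%s\n' "$1" | awk -F';' '{ printf "[%s] [%s]\n", $1, $2 }'
}
```

For example, `show_fields 'toto ; tutu | titi '` prints `[toto ] [ tutu | titi ]`.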

Actually, I am looking into the -f option.
Is that a good idea?
I am thinking about doing:
Code:
BEGIN
{replacing command 1}
{replacing command 2}
etc.
END

What could I do?
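A minimal sketch of the -f idea, under stated assumptions: the hypothetical `make_program` function turns each `key ; a | b | c` line of the list file into one gsub() call and appends `1` so every line is printed. BEGIN and END are not needed for this, and in awk each of them would need its own `{ }` block anyway. The sketch assumes the list file holds only `key ; alternatives` data lines, that the words contain no regex metacharacters, `/`, `&` or `"`, and it is case-sensitive with no word-boundary check.

```shell
# Hypothetical generator: list file -> awk program with one gsub() per line.
# Assumes words contain no regex metacharacters, "/", "&" or double quotes.
make_program() {
    awk -F';' 'NF >= 2 {
        key = $1
        gsub(/^[ \t]+|[ \t]+$/, "", key)       # trim the replacement word
        alts = $2
        gsub(/^[ \t]+|[ \t]+$/, "", alts)      # trim the whole alternative list
        gsub(/[ \t]*[|][ \t]*/, "|", alts)     # drop spaces around each "|"
        printf "{ gsub(/%s/, \"%s\") }\n", alts, key
    }
    END { print "1" }' "$1"                    # the final "1" prints every line
}
```

Usage would be something like `make_program simplifier_FR.txt > simplify.awk` followed by `awk -f simplify.awk text.txt`. With 14000 list lines, every input line goes through 14000 gsub() calls, and a word produced by one line can be rewritten again by a later line, so the order of the list file matters.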

Last edited by mpvphd; 12-19-2017 at 03:49 PM..
# 4  
12-19-2017
Yes, your idea of an ERE with pipe-OR works.
The main bug in your awk code is that an ERE goes in / / (or in " ") only when it is a constant, not when it is in a variable!
Next, the input words have spaces around them. How does it find the last word on a line when there is no trailing space?
Next, you use the assignment IGNORECASE = 1 as a condition. Fortunately it is always true, so the following { block } is run. Better to have no condition and set the variable once at the BEGINning!
Attempt to fix the bugs (untested):
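The first bug is easy to reproduce. In this hypothetical demo, `/A[i]/` is a regex constant meaning "the letter A followed by the letter i", so it only matches the literal text `Ai`; writing `A[i]` without slashes uses the value of the array element as a dynamic regex.

```shell
# Hypothetical demo: inside slashes, A[i] is NOT the array element.
# /A[i]/ is the regex "A followed by i", i.e. it matches the literal text "Ai".
slash_bug_demo() {
    printf '%s\n' "$1" | awk '
    BEGIN { A["toto"] = "titi" }
    {
        line1 = $0; line2 = $0
        for (i in A) {
            gsub(/A[i]/, i, line1)   # regex constant: matches "Ai"
            gsub(A[i], i, line2)     # dynamic regex: uses the value "titi"
        }
        print line1
        print line2
    }'
}
```

Running `slash_bug_demo 'Ai titi'` prints `toto titi` (the buggy form) and then `Ai toto` (the intended form).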
Code:
awk -F';' 'BEGIN { IGNORECASE = 1 } NR==FNR { A[$1] = $2; next } { x = (" " $0 " "); for (i in A) gsub(A[i], i, x); sub(/^ /, "", x); sub(/ $/, "", x); print x }' simplifier_FR.txt text.txt


Last edited by MadeInGermany; 12-19-2017 at 04:50 PM.. Reason: Fixed a wrong ' character
# 5  
12-19-2017
Thank you, that works.
The problem came from my awk version, but thanks for your answer!
# 6  
12-19-2017
I don't know what you mean about the problem being the version of awk you were using when there were so many logic errors in your code. But, if you have it working now, congratulations.

Note, however, that in addition to the corrections MadeInGermany already listed, you also need to be absolutely sure that your first input file has exactly one <space> character before and after each word you're searching for as possible text to be replaced. For example, with the sample data you provided, no changes would be made to the following lines of text:
Code:
The word tonton in this text will not be changed to toto because there aren't
two <space> characters following any occurrence of tonton in this sentence, but
there is one <space> before tonton and two <space>s after tonton in your sample
simplifier_FR.txt file.

You might also want to note that if there are any punctuation characters before or after any of the words you want to replace, the code you're using won't find and/or replace them.
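One way around both problems (spacing in the list file and punctuation in the text) is a different technique from the gsub() loop used in this thread: split the list file into single words, store them in an array keyed by their lowercase form, then look up each word of the text. The sketch below is hypothetical; the function name `simplify` and the set of word delimiters are assumptions. It uses tolower() instead of IGNORECASE, since IGNORECASE is a GNU awk extension that other awks treat as an ordinary variable, and it does one array lookup per word instead of one gsub() per list line. It only handles single-word patterns, treats space, tab and `.,;:!?()"` as delimiters, and tolower() may not fold accented letters in every awk or locale.

```shell
# Hypothetical helper: simplify LIST_FILE TEXT_FILE
# LIST_FILE lines look like "toto ; tutu | tata | tonton | titi".
# Every whole word of TEXT_FILE found on a right-hand side is replaced
# by the left-hand word; the lookup is case-insensitive via tolower().
simplify() {
    awk -F';' '
    NR == FNR {
        if (NF < 2) next                        # skip lines without ";"
        key = $1
        gsub(/^[ \t]+|[ \t]+$/, "", key)        # trim the replacement word
        n = split($2, alt, /[|]/)               # split on a literal "|"
        for (j = 1; j <= n; j++) {
            w = alt[j]
            gsub(/^[ \t]+|[ \t]+$/, "", w)      # trim each alternative
            if (w != "") map[tolower(w)] = key
        }
        next
    }
    {
        out = ""; rest = $0
        # Walk through the line one word at a time; a "word" is any run of
        # characters other than space, tab and the listed punctuation.
        while (match(rest, /[^ \t.,;:!?()"]+/)) {
            word = substr(rest, RSTART, RLENGTH)
            lw = tolower(word)
            out = out substr(rest, 1, RSTART - 1) ((lw in map) ? map[lw] : word)
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest                          # keep any trailing delimiters
    }' "$1" "$2"
}
```

Usage would be `simplify simplifier_FR.txt text.txt`. With the sample data, `I am a Titi and I like bubu` becomes `I am a toto and I like bobo`, and `Tonton, bubu.` becomes `toto, bobo.`, regardless of how many spaces surround each word in the list file. Words that are not in the list keep their original case.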


10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Grep multiple patterns(file) and replace whole line

I am able to grep multiple patterns which stored in a files. However, how could we replace the whole line with either the pattern or new string? For example: pattern_file: *Info in the () is not part of the pattern file. They are the intended name to replace the whole line after the pattern... (5 Replies)
Discussion started by: wxboo
5 Replies

2. Shell Programming and Scripting

Check multiple patterns in awk

I need to check if 2 values exists in the file and if they are equal print 0. output.txt: ------------ 1 2 3 4 5 6 Inputs: a=1 b=2 My pattern matching code works but I am trying to set a counter if both the pattern matches which does not work.If the count > 0,then I want to... (3 Replies)
Discussion started by: kannan13
3 Replies

3. Shell Programming and Scripting

Replacing multiple line patterns with awk

Hi forum, Can you please help me understand how to look for and replace the below pattern (containing line breaks) and return a new result? Rules: Must match the 3 line pattern and return a 1 line result. I have found solutions with sed, but it seems that sed installed in my system is... (5 Replies)
Discussion started by: demmel
5 Replies

4. Shell Programming and Scripting

Replace multiple patterns together with retaining the text in between

Hi Team I have the following text in one of the file j1738-abc-system_id(in.value1)-2838 G566-deF-system_id(in.value2)-7489 I want to remove system_id(...) combination completely The output should look like this j1738-abc-in.value1-2838 G566-deF-in.value2-7489 Any help is appreciated... (4 Replies)
Discussion started by: Thierry Henry
4 Replies

5. Shell Programming and Scripting

Search and replace multiple patterns in a particular column only - efficient script

Hi Bigshots, I have a pattern file with two columns. I have another data file. If column 1 in the pattern file appears as the 4th column in the data file, I need to replace it (4th column of data file) with column 2 of the pattern file. If the pattern is found in any other column, it should not... (6 Replies)
Discussion started by: ss112233
6 Replies

6. Shell Programming and Scripting

Multiple patterns for awk script

Hi, I'm getting stuck when supplying multiple patterns for the below code: awk -F, ' .. .. if ($0 ~ pattern) { .. .. } .. .. ' pattern='$ROW' input_file for the same code I'm trying to supply multiple patterns as given below: awk -F, ' .. .. if( ($0 ~ pattern) && ($0 ~... (6 Replies)
Discussion started by: penqueen
6 Replies

7. Shell Programming and Scripting

Searching multiple patterns using awk

Hello, I have the following input file: qh1adm 20130710111201 : tp import all QH1 u6 -Dsourcesystems=BFI,EBJ qh1adm 20130711151154 : tp import all QH1 u6 -Dsourcesystems=BFI,EBJ qx1adm 20130711151154 : tp count QX1 u6 -Dsourcesystems=B17,E17,EE7 qh1adm 20130711151155 : tp import all... (7 Replies)
Discussion started by: kcboy
7 Replies

8. Shell Programming and Scripting

[Solved] HP-UX awk sub multiple patterns

Hi, I am using sub to remove blank spaces and one pattern(=>) from the input string. It works fine when I am using two sub functions for the same. However it is giving error while I am trying to remove both spaces and pattern using one single sub function. Working: $ echo " OK => " |awk... (2 Replies)
Discussion started by: sai_2507
2 Replies

9. UNIX for Dummies Questions & Answers

replace multiple patterns in a string/filename

This should be somewhat simple, but I need some help with this one. I have a bunch of files with tags on the end like so... Filename {tag1}.ext Filename2 {tag1} {tag2}.ext I want to hold in a variable just the filename with all the " {tag}" removed. The tag can be anything so I'm looking... (4 Replies)
Discussion started by: kerppz
4 Replies

10. UNIX for Dummies Questions & Answers

AWK: Multiple patterns per line

Hi there, We have been given a bit of coursework using awk on html pages. Without giving too much away and risking the wrath of the plagiarism checks, I can say we need to deal with certain html elements. There may be several of these elements on one line. My question is, if there are more... (1 Reply)
Discussion started by: Plavixo
1 Reply