Multi-Line Search and Replace


 
# 1  
12-21-2010

There appear to be several threads that touch on what I'm trying to do, but none of them is generic enough.

What I need to do is search through many (poorly coded) HTML files and make changes. The catch is that my search string may sit on one line or be split across several.

For example, many files contain outdated links, so I want to search for:
Code:
 <A HREF="/BAD/LINK"> LINK TEXT </A>

and make some change. If it were always on one line, I could just do a sed substitution and be done with it.

The problem is that in the (poorly coded) HTML files the pattern might be found like this:
Code:
 text text text <A
HREF="/BAD/LINK"> LINK
TEXT </A> text text text

The line breaks fall in completely random places.

I need a way to search for a specific pattern of text that may span one or more lines and then make my substitution.

I'm not married to any one tool. I've given up on sed and searched for potential Python or Perl solutions, but nothing I've found is generic enough to handle all the different ways these files are written.
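One generic approach, sketched here in Python under the assumption that line breaks only ever appear where the search text already has whitespace: read the whole file into one string, split the literal search text on whitespace, escape each piece with `re.escape`, and join the pieces with `\s+` so that any run of spaces or newlines matches. The helpers `flexible_pattern`, `replace_multiline` and `replace_in_file` are hypothetical names, not part of any library.

```python
import re
from pathlib import Path


def flexible_pattern(literal: str) -> re.Pattern:
    """Compile `literal` so that every run of whitespace in it matches any run
    of whitespace (spaces, tabs, newlines) in the text; all other characters
    match literally, ignoring case."""
    pieces = (re.escape(word) for word in literal.split())
    return re.compile(r"\s+".join(pieces), re.IGNORECASE)


def replace_multiline(text: str, literal: str, replacement: str) -> str:
    """Replace every occurrence of `literal`, wherever the line breaks fall."""
    # A function replacement stops re.sub from interpreting backslashes
    # or group references inside `replacement`.
    return flexible_pattern(literal).sub(lambda _m: replacement, text)


def replace_in_file(path: str, literal: str, replacement: str) -> None:
    """Hypothetical helper: rewrite one file in place."""
    p = Path(path)
    p.write_text(replace_multiline(p.read_text(), literal, replacement))


sample = 'text text text <A\nHREF="/BAD/LINK"> LINK\nTEXT </A> text text text'
print(replace_multiline(sample, '<A HREF="/BAD/LINK"> LINK TEXT </A>', "NEW LINK"))
# prints: text text text NEW LINK text text text
```

This sketch does not catch a break placed where the search text has no space at all (for example between `HREF=` and the quoted URL); that case would need a looser pattern that allows optional whitespace around `=` and between tags.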
# 2  
12-21-2010
Please provide a bigger sample of the input file and the expected output.

For example, you can use this to join the whole file into one line, insert a # after every </tr> tag, and then translate each # into a \n.
Instead of #, use a character that does not already occur in the file you want to parse (µ, %, ...).

Code:
tr '\n' ' ' <tst | sed 's|</tr>|</tr>#|g' | tr '#' '\n'

Code:
tr '\n' ' ' <tst | sed 's|</A>|</A>#|g' | tr '#' '\n'

Replace "tst" with the name of your input file.
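The same join-then-split idea can be sketched in Python. Because Python can insert a real newline directly, no placeholder character is needed; the function name `one_record_per_line` is made up for illustration.

```python
def one_record_per_line(text: str, closing_tag: str = "</A>") -> str:
    """Join all lines, then start a new line after every closing tag.

    Mirrors the tr | sed | tr pipeline above, but inserts real newlines
    directly, so no placeholder character is needed. Like sed, the tag is
    matched case-sensitively.
    """
    joined = text.replace("\n", " ")                        # tr '\n' ' '
    return joined.replace(closing_tag, closing_tag + "\n")  # sed 's|</A>|</A>#|g' | tr '#' '\n'


print(one_record_per_line('text <A\nHREF="/BAD/LINK"> LINK\nTEXT </A> more\ntext'))
# prints two lines:
# text <A HREF="/BAD/LINK"> LINK TEXT </A>
#  more text
```

After this step each link sits on a single line, so an ordinary line-by-line substitution (sed in the shell, or `re.sub` in Python) can be applied to the result.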

Last edited by ctsgnb; 12-21-2010 at 06:46 PM.
# 3  
12-22-2010
Code:
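perl -i -pe '
BEGIN { undef $/ }  $bak = $_;   # slurp the whole file; search a copy so edits to $_ cannot disturb the loop
while ($bak =~ m/<A\s+HREF.*?<\/A>/gs) {
    $x = $y = $&;                # $& holds the text of the current match
    $y =~ s/\s+/ /g;             # collapse whitespace runs (including newlines) to one space
    $_ =~ s/\Q$x\E/$y/g;         # \Q..\E escapes regex metacharacters in the matched text
}
'  temp.txt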
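For comparison, here is a Python sketch of what this one-liner does: it collapses every run of whitespace inside each `<A HREF ...>...</A>` link to a single space. Passing a function to `re.sub` handles each match exactly once, so no separate copy of the text (the `$bak` variable in the Perl version) is needed. The name `collapse_links` is hypothetical.

```python
import re

# Same pattern as the Perl one-liner: re.DOTALL plays the role of /s,
# so .*? may run across line breaks. Like the original, it is case-sensitive.
LINK = re.compile(r"<A\s+HREF.*?</A>", re.DOTALL)


def collapse_links(text: str) -> str:
    """Put every <A HREF...>...</A> link on a single line."""
    return LINK.sub(lambda m: re.sub(r"\s+", " ", m.group(0)), text)


print(collapse_links('text text <A\nHREF="/BAD/LINK"> LINK\nTEXT </A> text'))
# prints: text text <A HREF="/BAD/LINK"> LINK TEXT </A> text
```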

This User Gave Thanks to k_manimuthu For This Post:
# 4  
01-01-2011
Thank you. I was able to make a few changes and turn it into a script.
One of the problems I was dealing with was broken links pointing to files in a nonexistent directory named refrnc, so I wanted to find every hyperlink to a file in that directory and remove the tag while keeping the link text.

So, HTML that looked like:
Code:
....... <a href="../refrnc/file.htm"><center><i>LINK TEXT</center></i></a> ........

Or, since it could span multiple lines, it could have been:
Code:
....... <a href="../refrnc/file.htm"><center><i>
LINK TEXT </center></i></a> ........

OR
Code:
.......<a href=
"../refrnc/file.htm"><center><i>LINK TEXT</center>
</i></a>.........

And I wanted to change it to:
Code:
 ....... <center><i>LINK TEXT</center></i>........

The following script makes these changes case-insensitively, even when a link spans multiple lines.

Code:
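#!/usr/bin/perl -pi
# -p: run the code below on each input record and print it; -i: edit the files in place
BEGIN { undef $/; }                      # slurp mode: each file is read as a single record
$bak = $_;                               # search a copy, so edits to $_ cannot disturb the m//g loop
while ($bak =~ m#<a\s+href[^>]*/refrnc/[^>]*>.*?</a>#sig) {
        $x = $y = $&;                    # $& is the text matched by the regex above
        $y =~ s/\s+/ /g;                 # turn line breaks and runs of spaces into one space
        $y =~ s#<a\s+href[^>]*>##i;      # drop the opening <a ...> tag
        $y =~ s#</a>##i;                 # drop the closing </a> tag
        $_ =~ s/\Q$x\E/$y/g;             # \Q..\E makes $x match literally
}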

There are still a few things I don't understand:
1) What is $&?
2) What is the undef $/ doing?
3) Is it really necessary to set $bak=$_? Couldn't I have used $_ in the while loop and skipped the step of setting $bak?

Does anything in there look like really bad Perl, or could anything be done better?
# 5  
01-01-2011
From perlvar: undefining the input record separator, $/, causes <> to read an entire file at a time. Also from perlvar: $& is the string matched by the last successful pattern match, here the regex <a\s+href.*?/refrnc/.*?>.*?</a>.
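If it helps to see the same two ideas in Python (a rough analogy, not a description of Perl internals): reading a whole file with `read()` plays the role of slurp mode, and `match.group(0)` holds the whole matched text, like `$&`. The sample string and the name `whole_match` are illustrative only.

```python
import re


def whole_match(text: str):
    """Return the full text of the first refrnc link, or None.

    `text` is assumed to hold an entire file, e.g. from open(path).read(),
    the Python counterpart of Perl's undef $/ slurp mode.
    """
    m = re.search(r"<a\s+href.*?/refrnc/.*?>.*?</a>", text, re.S | re.I)
    return m.group(0) if m else None  # group(0) is the analogue of $&


page = '<a href=\n"../refrnc/file.htm"><center><i>LINK TEXT</center>\n</i></a> tail'
print(repr(whole_match(page)))
```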

I also wonder about all the work being done in that loop. Can you point to the references you based it on? In any case, since you only want to remove the offending "<a>" and "</a>", a single substitution will do:
Code:
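#!/usr/bin/perl -pi

BEGIN { undef $/; }    # slurp mode: read each file as a single record

# Keep only the link text ($1). [^>]* keeps the match inside a single <a ...> tag,
# /s lets .*? cross line breaks, /i ignores case, /g handles every link in the file.
s{<a\s+href[^>]*/refrnc/[^>]*>(.*?)</a>}{$1}sig;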
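A Python sketch of the same single substitution, checked against the three variants from the previous post. It assumes, as the Perl version does, that link text never contains a nested `</a>`. Here `[^>]*` is used instead of `.*?` inside the opening tag so that the match cannot start at an earlier, unrelated link and run on into a refrnc one. The name `unlink_refrnc` is hypothetical.

```python
import re

# <a href ... /refrnc/ ... > then the link text (group 1) then </a>.
# [^>]* keeps the first part inside a single opening tag; re.S lets the
# link text span lines and re.I ignores case, like Perl's /s and /i.
REFRNC_LINK = re.compile(r"<a\s+href[^>]*/refrnc/[^>]*>(.*?)</a>", re.S | re.I)


def unlink_refrnc(html: str) -> str:
    """Remove <a> tags that point into refrnc/ but keep their text."""
    return REFRNC_LINK.sub(r"\1", html)


variants = [
    '....... <a href="../refrnc/file.htm"><center><i>LINK TEXT</center></i></a> ........',
    '....... <a href="../refrnc/file.htm"><center><i>\nLINK TEXT </center></i></a> ........',
    '.......<a href=\n"../refrnc/file.htm"><center><i>LINK TEXT</center>\n</i></a>.........',
]
for v in variants:
    print(repr(unlink_refrnc(v)))
```

Unlike the script in post #4, this (like the Perl substitution above) leaves any line breaks inside the link text alone; browsers render them as spaces anyway.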

This User Gave Thanks to m.d.ludwig For This Post:
# 6  
01-02-2011
I just took what k_manimuthu posted and hacked around a bit.

Really, my goal is to learn enough that I can apply this to different cases as customer requests come up. I've been able to handle almost everything with Bash/sed/awk. I don't know Perl, but perhaps it's time to take my O'Reilly Perl programming book off the shelf and finally learn it.

BTW, is there a way to do this in Python? If I'm going to commit to learning a new scripting language I think I'd prefer to learn Python.
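Python's standard `re` module can do this. Below is a minimal sketch of a command-line script that behaves roughly like `perl -pi` in slurp mode: it reads each file named on the command line as one string, applies the refrnc substitution, and writes the file back only if something changed. The `.bak` backup copy and the latin-1 byte round trip are assumptions added for safety with old HTML, and `fix_file` is a hypothetical name.

```python
#!/usr/bin/env python3
import re
import shutil
import sys
from pathlib import Path

# Link into refrnc/ (any case, may span lines); group 1 is the link text.
REFRNC_LINK = re.compile(r"<a\s+href[^>]*/refrnc/[^>]*>(.*?)</a>", re.S | re.I)


def fix_file(path: Path, backup_suffix: str = ".bak") -> int:
    """Strip refrnc links from one file in place; return how many were removed."""
    # Decoding bytes as latin-1 maps every byte to one character, so odd bytes
    # and CRLF line endings in old HTML survive the round trip unchanged.
    text = path.read_bytes().decode("latin-1")
    new_text, count = REFRNC_LINK.subn(r"\1", text)
    if count:
        if backup_suffix:
            shutil.copy2(path, path.with_name(path.name + backup_suffix))
        path.write_bytes(new_text.encode("latin-1"))
    return count


if __name__ == "__main__":
    for name in sys.argv[1:]:
        n = fix_file(Path(name))
        print(f"{name}: {n} link(s) removed")
```

Saved under a name of your choosing (say `fix_refrnc.py`, a hypothetical name), it would be run as `python3 fix_refrnc.py *.htm`. Swapping in a different compiled pattern and replacement adapts it to other customer requests.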

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Multi-line regex for search and replace

I have a text file like the one below: a.txt Server=abc Run=1 Time=120.123 Tables=10 Sessions=16 Time=380.123 Version=1.1 Jobs=5 Server=abc Run=2 Time=160.123 Tables=15 Sessions=16 Time=400.258 Version=2.0 (1 Reply)
Discussion started by: sol_nov

2. Shell Programming and Scripting

Replace multi-line strings or numbers

Hi, I have no experience in Unix, so any help would be appreciated. I have the following text 235543 123 45654 199 225 578 45654 199 225. I need to find this sequence in a file: 45654 199 225 (22 Replies)
Discussion started by: khaled79

3. Shell Programming and Scripting

sed to replace a line with multi lines from a var

I am trying to find a line in a file ("Replace_Flag") and replace it with a variable which holds a multi-line file. myVar=`cat myfile` sed -e 's/Replace_Flag/'$myVar'/' /pathto/test.file myfile: cat dog boy girl mouse house test.file: football hockey Replace_Flag baseball ... (4 Replies)
Discussion started by: bblondin

4. Shell Programming and Scripting

SED - insert space at the beginning of line and multi replace command

Hi, I am trying to use sed to replace the line matching a pattern using the command sed 'pattern c\ new line ' <file1 >file 2. I have two questions: 1. How do I insert a blank space at the beginning of the new line? 2. How do I use this command to execute multiple commands using the -e... (5 Replies)
Discussion started by: piynik

5. Shell Programming and Scripting

Search for multi-line strings in a file

Hello, I need to search for multi-line strings (with spaces in between, and quoted) in file1 and replace that text with a fixed string globally in file1. The string to search for is in file2. The file is big, with some 20K records, so speed and efficiency are required. file1: (where srch & rplc will... (7 Replies)
Discussion started by: Hiano

6. Shell Programming and Scripting

Global search and replace multi line file

Hello, I need to search for multi-line strings (with spaces in between, and quoted) in file1 and replace that text with a fixed string globally in file1. The string to search for is in file2. The file is big, with some 20K records, so speed and efficiency are required. file1: (where srch & rplc... (0 Replies)
Discussion started by: Hiano

7. Shell Programming and Scripting

perl search and replace - search in first line and replace in 2nd line

Dear All, I want to search for a particular string and replace the value on the next line. The following is the test file. The search string is tmp,??? ,10:1 "???" may contain any 3 characters; it should remain the same, and the next line should be replaced with ,10:50 tmp,123 --- if match tmp,??? then... (3 Replies)
Discussion started by: arvindng

8. Shell Programming and Scripting

multi line multirecord find and replace

Hello, I am looking for a script that performs some find-and-replace tasks and inserts a line as well. I did some programming 10 years ago, so it is causing me a little grief. The file consists of 2500 records. I will show you a sample consisting of two records below and what needs... (3 Replies)
Discussion started by: cdc01

9. Shell Programming and Scripting

Perl: Search for string on line then search and replace text

Hi All, I have a file in which I need to find a pattern match on a line, search that line for a text pattern, and replace that text. An example of 4 lines in my file is: 1. MatchText_randomNumberOfText moreData ReplaceMe moreData 2. MatchText_randomNumberOfText moreData moreData... (4 Replies)
Discussion started by: Crypto

10. Shell Programming and Scripting

Search and replace multi-line text in files

Hello, I need to search for multi-line text in a file exfile1 and replace that text with other text. The text to search for is in exfile2 and the replacement text is in exfile3. I work with the Korn shell under AIX and need to do this with a lot of files. (the file type is PostScript and they need... (10 Replies)
Discussion started by: marz