Html parsing - get line after specific string till a point


 
# 1  
06-11-2011

Hi all,
This may sound complex. I want to search an entire HTML file for the string
"<td class="contentheading" width="100%">" (it occurs 5 times and I need every occurrence), take the line that follows it, but only up to the point where "</td>" appears, and remove the \t (tab) characters.

Thanks for any help. This sounds challenging, doesn't it?
# 2  
06-11-2011
Post sample input and desired output please.
# 3  
06-11-2011
Example input:
Code:
<p class="day">11</p><p class="year">2011</p></td><td class="spacer_right" style="width:100%;">
<table class="contentpaneopen">
<tr>
        <td class="contentheading" width="100%">
                    Hello this is what I want            </td>
    
    
        <td align="right" width="100%" class="buttonheading">

Desired output:
Code:
Hello this is what I want

Also, keep in mind that there may be more than one match for the string "<td class="contentheading" width="100%">", and I need all of them.
# 4  
06-11-2011
Try:
Code:
perl -ln0e 'while(/<td class="contentheading" width="100%">\n(.*)/g){$x=$1;$x=~s/\t//g;$x=~s/<\/td>//;print $x}' file


Last edited by bartus11; 06-11-2011 at 05:20 PM. Reason: Updated to search for multiple occurrences...
# 5  
06-11-2011
Thanks, it works perfectly, but it only finds the first match. As I mentioned, there can be more than one match.
For example, the file could be:
Code:
<p class="day">11</p><p class="year">2011</p></td><td class="spacer_right" style="width:100%;"> <table 
class="contentpaneopen"> <tr>         <td class="contentheading" width="100%">                     
             Hello this is what I want           </td>                   <td align="right" width="100%" class="buttonheading">
<p class="day">11</p><p class="year">2011</p></td><td class="spacer_right" style="width:100%;"> <table 
class="contentpaneopen"> <tr>         <td class="contentheading" width="100%">                     
               I also need this!!!            </td>           
        <td align="right" width="100%" class="buttonheading">

And the desired output would be:
Code:
Hello this is what I want
I also need this!!!

whereas your solution outputs only:
Code:
Hello this is what I want

# 6  
06-11-2011
Check out the updated code.
# 7  
06-11-2011
Exactly!
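
Where Perl is not available, the same "line after the marker" idea can be sketched in awk. This is an alternative that was not posted in the thread. It assumes, as in the sample input, that the wanted text sits entirely on the line following the opening tag. The name extract_headings_awk is hypothetical.

```shell
#!/bin/sh
# extract_headings_awk [FILE...]
# Hypothetical helper: for every line containing the opening tag,
# print the following line up to "</td>", without tabs or outer spaces.
extract_headings_awk() {
    awk '
        grab {                            # previous line held the opening tag
            line = $0
            sub(/<\/td>.*/, "", line)     # cut at the closing tag
            gsub(/\t/, "", line)          # remove tabs
            gsub(/^ +| +$/, "", line)     # trim leading and trailing spaces
            print line
            grab = 0
        }
        /<td class="contentheading" width="100%">/ { grab = 1 }
    ' "$@"
}

# Usage with a tiny made-up sample on standard input:
printf '\t<td class="contentheading" width="100%%">\n\t\tI also need this!!!\t</td>\n' |
    extract_headings_awk
# prints: I also need this!!!
```

The rule that prints comes before the rule that sets the flag, so the line holding the opening tag is never printed itself; only the line after it is.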