sed to parse html
# 1  
Old 10-18-2010

Hello,
I have an HTML file like this:

<html>
...
...
...
<table>
.......
......
</table>
<table name = "hi">
......
.....
...
</table>
<h1> Welcome </h1>
.......
......
</html>

I only need to extract the text between <table name = "hi"> and its corresponding </table>, and delete the rest. How do I do that?

I can find the <table name = "hi"> line and delete the lines before it, but I cannot find the corresponding </table>, because there can be multiple </table> tags.

Please help.


Thanks,
Prasanna
# 2  
Old 10-18-2010
Try:
Code:
sed '/<table name = "hi">/,/<\/table>/!d' infile
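
For anyone who finds `!d` hard to read, here is a minimal sketch of an equivalent form that uses `-n` with `p` to print only the range. The file name sample.html and its contents are hypothetical, made up to mirror the question. Both forms keep the opening and closing table lines themselves.

```shell
# Build a small sample file (hypothetical contents, mirroring the question).
cat > sample.html <<'EOF'
<html>
<table>
first
</table>
<table name = "hi">
wanted
</table>
<h1> Welcome </h1>
</html>
EOF

# Form suggested above: delete every line that is NOT inside the range.
kept_by_delete=$(sed '/<table name = "hi">/,/<\/table>/!d' sample.html)

# Equivalent form: suppress default output and print only the range.
kept_by_print=$(sed -n '/<table name = "hi">/,/<\/table>/p' sample.html)

printf '%s\n' "$kept_by_print"
```

The start pattern must match the tag exactly as written in the file, including the spaces around `=`.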

# 3  
Old 10-18-2010
Thanks a lot.

Is the regex not greedy? Would it not match the last </table> that it sees, if there are other table tags below our </table>?


Thanks,
Prasanna
# 4  
Old 10-18-2010
Might as well make it tab-separated text, too, so you can import it into Excel or Access.
# 5  
Old 10-18-2010
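Hi, no, it is not greedy. A sed address range ends at the first line after the start line that matches the end pattern, so it stops at the first </table> it reaches.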
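
A quick way to check this is a small sketch with a second table placed after the one we want. The file name after.html and its contents are hypothetical.

```shell
# Hypothetical sample: another table follows the one we want.
cat > after.html <<'EOF'
<table name = "hi">
wanted
</table>
<table>
unwanted
</table>
EOF

# The range closes at the first line matching the end pattern,
# so the second table is not included in the output.
range_out=$(sed -n '/<table name = "hi">/,/<\/table>/p' after.html)

printf '%s\n' "$range_out"
```

Note that this works line by line and assumes the tags sit on their own lines. A table nested inside the target table would end the range early at the inner </table>.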
# 6  
Old 10-18-2010
Thanks a lot.
# 7  
Old 10-18-2010
You can import HTML directly, but it is amazingly slow!

---------- Post updated at 04:49 PM ---------- Previous update was at 04:48 PM ----------

Beware: some old versions of Access do not handle CSV properly, so tab-separated text is the winner!