sed to parse html


 
# 8  
Old 10-18-2010
@Scrutinizer

What happens if the tables are nested?

Code:
<table name = "hi">
......
<table name = "hi">
...
</table>
.....
...
</table>
.....

# 9  
Old 10-18-2010
It would output:
Code:
<table name = "hi">
......
<table name = "hi">
...
</table>

# 10  
Old 10-18-2010
So you would miss the remainder of the outer table:
Code:
.....
...
</table>
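
One way to keep that tail is to count nesting depth instead of stopping at the first `</table>`. The following is only a minimal Ruby sketch, not something posted in this thread. It assumes lowercase tags and at most one opening or closing table tag per line, as in the sample above; the method name `extract_tables` is hypothetical.

```ruby
# Hypothetical helper: collect every line belonging to a <table name = "hi">
# block, tracking nesting depth so an inner </table> does not end the match.
# Assumes at most one <table ...> or </table> tag per line.
def extract_tables(lines)
  depth = 0
  out = []
  lines.each do |line|
    if depth.zero?
      # Only start capturing at the table we are interested in.
      next unless line =~ /<table\s+name\s*=\s*"hi"\s*>/
      depth = 1
    elsif line =~ /<table\b/
      depth += 1   # a nested table, whatever its attributes
    elsif line =~ %r{</table>}
      depth -= 1   # closes the innermost open table
    end
    out << line
  end
  out
end

sample = <<~HTML.lines(chomp: true)
  <table name = "hi">
  ......
  <table name = "hi">
  ...
  </table>
  .....
  ...
  </table>
  .....
HTML

puts extract_tables(sample)
```

With the nested sample, this keeps everything up to and including the outer `</table>` and drops only the final line after it. Markup that puts several tags on one line would need a real HTML parser instead.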

# 11  
Old 10-18-2010
Code:
#!/usr/bin/env ruby  -Ku
# Print every <table> element whose name attribute is "hi".
require 'hpricot'

file = ARGV[0]
doc = File.open(file) { |f| Hpricot(f) }   # parse the HTML file
(doc/"table").each do |x|
  print "====> #{x}\n" if x.get_attribute("name") == "hi"
end

Code:
$ cat file
<html>
...
...
...
<table>
.......
......
</table>
<table name = "hi">
text inside hi
</table>
<h1> Welcome </h1>
.......
......
</html>
<table name = "hi">
some more text inside hi
</table>


$ ruby test.rb file
====> <table name="hi">
text inside hi
</table>
====> <table name="hi">
some more text inside hi
</table>

# 12  
Old 10-19-2010
awk and sed range patterns also print every matching range, not just the first:
Code:
$ awk '/<table name = "hi">/,/<\/table>/' infile
<table name = "hi">
text inside hi
</table>
<table name = "hi">
some more text inside hi
</table>

Code:
$ sed -n  '/<table name = "hi">/,/<\/table>/p' infile
<table name = "hi">
text inside hi
</table>
<table name = "hi">
some more text inside hi
</table>


Last edited by Scrutinizer; 10-19-2010 at 03:43 AM..
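
The range behaviour shown above can be modelled as an on/off switch that re-arms after each closing line, which is why both tables are printed. The Ruby sketch below only illustrates that idea and is not how awk or sed is implemented; the method name `range_lines` is hypothetical and the input is a shortened copy of the sample file.

```ruby
# Hypothetical model of an awk/sed-style range: output switches on at a line
# matching start_re and off again after the next line matching end_re.
def range_lines(lines, start_re, end_re)
  inside = false
  lines.select do |line|
    if inside
      inside = false if line =~ end_re   # the end line itself is kept
      true
    elsif line =~ start_re
      # awk tests the end pattern on the start line too; sed does not.
      # This sketch follows sed and only checks later lines.
      inside = true
    else
      false
    end
  end
end

input = <<~HTML.lines(chomp: true)
  <table>
  .......
  </table>
  <table name = "hi">
  text inside hi
  </table>
  <h1> Welcome </h1>
  <table name = "hi">
  some more text inside hi
  </table>
HTML

puts range_lines(input, /<table name = "hi">/, %r{</table>})
```

Because the switch turns off at the first `</table>` it sees, the same model also explains why a nested table cuts the outer one short.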