HTML to XML parser

# 1  
11-08-2011

Hello forum, I am having problems writing a bash script.

I am trying to extract some information from a web page and format it as XMLTV. This is the web page: Programación de las cadenas etb1, etb2, etb3, canal vasco y etb sat | EITB Televisión (the EITB television schedule for the channels etb1, etb2, etb3, Canal Vasco and ETB Sat).

I want to get something like this:

Code:
<programme start="20111107131000 +0100">
<title lang="es">Elefanteen egunkariak</title>
</programme>

From this:

Code:
<p class="hora">13:10</p>
<h2 class="titulo">
Elefanteen egunkariak - <span class="titulo_emision">Ama galtzean hasten da dena</span> <span class="ico"></span>
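Because each programme's time and title sit on predictable lines, awk (normally installed alongside sed) can pair them up. This is a minimal sketch under several assumptions: the function name html_to_xmltv is hypothetical, the date is passed in as YYYYMMDD, the +0100 offset is copied from the example above and hard-coded, and the title is taken to be the text before " - " on the line after <h2 class="titulo">. It does not decode HTML entities, and a valid XMLTV file would also need a channel attribute on each programme and a <tv> root element, which the sketch leaves out.

```shell
# Hypothetical helper: read the schedule HTML on stdin and print one
# XMLTV <programme> element per time/title pair.
# Assumptions: $1 is the date as YYYYMMDD, the UTC offset is a fixed
# +0100, and the title is the text before " - " on the line that
# follows <h2 class="titulo">. HTML entities are not decoded.
html_to_xmltv() {
    local day="$1"
    awk -v day="$day" '
        # Start time, e.g. <p class="hora">13:10</p> becomes 1310
        /<p class="hora">/ {
            t = $0
            sub(/.*<p class="hora">/, "", t)
            sub(/<\/p>.*/, "", t)
            gsub(/:/, "", t)
            next
        }
        # The title is on the line after the <h2> opening tag
        /<h2 class="titulo">/ { want_title = 1; next }
        want_title {
            want_title = 0
            title = $0
            sub(/^[ \t]+/, "", title)        # drop indentation
            sub(/ - <span.*/, "", title)     # drop the episode title span
            sub(/[ \t\r]+$/, "", title)      # drop trailing blanks and CR
            printf "<programme start=\"%s%s00 +0100\">\n", day, t
            printf "  <title lang=\"es\">%s</title>\n", title
            print "</programme>"
        }
    '
}
```

For example: wget -qO- "http://www.eitb.com/es/television/programacion/" | html_to_xmltv 20111107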

I tried wget and sed, but I cannot get the desired format.

Is there any other tool to do this?

I started with this script:

Code:
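#!/bin/bash
# Download the EITB schedule page and clean up its whitespace.

url="http://www.eitb.com/es/television/programacion/"
file1="etbsat1.txt"   # raw page as downloaded
file2="etbsat2.txt"   # cleaned working copy

# -f: no error if the files do not exist yet
rm -f "$file1" "$file2"

wget "$url" -O "$file1"
cp "$file1" "$file2"

# Indentation: strip leading spaces and tabs from every line.
# (sed works one line at a time, so a literal newline cannot be
# matched inside a plain s/// command.)
sed -i 's/^[[:space:]]*//' "$file2"

# Line endings: strip the Windows carriage return (^M); \r is GNU sed syntax
sed -i 's/\r$//' "$file2"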

The problem is that I cannot delete the line break after the "<h2 class="titulo">" line.
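Assuming the goal is to pull the title up onto the same line as the <h2 class="titulo"> tag, sed can do it with its N command, which appends the next input line to the pattern space so that the newline between the two lines can then be deleted. A minimal sketch; the function name join_titulo is hypothetical:

```shell
# Hypothetical helper: read HTML on stdin and join each
# <h2 class="titulo"> line with the line that follows it.
join_titulo() {
    # N appends the next line to the pattern space; the s command then
    # deletes the embedded newline plus the indentation after it.
    sed '/<h2 class="titulo">/{
N
s/\n[[:space:]]*//
}'
}
```

For example: join_titulo < etbsat1.txt > etbsat2.txt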

Thanks for your help and best regards.
# 2  
11-23-2011
I don't think you are going to get very far with shell scripting. It is far better to use Perl, which has extensive libraries for dealing with markup languages like HTML (for example HTML::TreeBuilder or XML::LibXML).
Also, the sed command that strips the ^M (carriage return) characters suggests that you are working with a file that has Windows (CRLF) line endings. You can use fromdos (or dos2unix) to achieve the same.
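If fromdos is not installed, the POSIX tr utility can strip the carriage returns as well. A minimal sketch; strip_cr is a hypothetical helper name, and note that tr -d '\r' removes every CR, not only those at line ends:

```shell
# Hypothetical helper: delete carriage returns (the ^M characters) from
# stdin. Unlike sed 's/\r$//', this removes every CR, not just those at
# line ends.
strip_cr() {
    tr -d '\r'
}
```

For example: strip_cr < etbsat1.txt > etbsat2.txt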

5 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Help required in Building an XML using SAX Parser in perl

I want to use a SAX parser for my application, as I have 5 lakh (500,000) data items. I have an XML file like this: <Nodes> <Node> <NodeName>Company</NodeName> <File>employee_details.csv</File> <data>employee_data.txt</data> <Node> <NodeName>dummy</NodeName> ... (8 Replies)
Discussion started by: vanitham

2. UNIX for Advanced & Expert Users

XML parser to generate Tuxedo UD files

Hi, my requirement is this: I have an XML file which needs to be converted to Tuxedo UD files (param name and param value). Does anybody have a sample Perl XML parser script for this? (0 Replies)
Discussion started by: guruprasadpr

3. Shell Programming and Scripting

xml-parser with perl

Hello, I want to write an XML parser in Perl and I use the library XML::LibXML. I have a problem with the method getElementsByTagName. If there is an empty tag, the getElementsByTagName method returns a NodeList of length zero. How can I check whether this is a NodeList of length zero? I... (1 Reply)
Discussion started by: trek

4. Shell Programming and Scripting

Perl XML::Parser help

I am very new to XML. Actually, I have an Excel file that I am trying to read with Perl on a Linux machine. I don't have a module for reading Excel files, so I have to convert the Excel file to XML to be able to read it. I can read the file and everything is OK except...the Number style is being dropped... (0 Replies)
Discussion started by: vincaStar

5. Shell Programming and Scripting

xml parser in perl

Hi all, I want to read an XML file in Perl and I am using XML::Simple for this. I cannot work out how to read the following file (the XML file has been removed for some reason). (1 Reply)
Discussion started by: zedex