html-to-csv


 
# 1  
04-24-2012

Dear all,

I need to convert some HTML output with nonstandard tags into a CSV file.

Here is the input file:

Code:
<table id=tabela BORDER=1 CELLSPACING=0 CELLPADDING=0 slcolor=#ffffcc dragcolor='gray' img='false' col='1' rowTotal='1' height=100% habilita_primeira='1' rowHead='0' align=center style='behavior:url(/weblince/js/tableact.htc);'>
<tr>
<TH WIDTH="45" ALIGN="CENTER" CLASS="tab_th2"> <font class="font_titulo3">&nbsp;COS&nbsp;</font> </TH>
<TH WIDTH="65" ALIGN="CENTER" CLASS="tab_th2"> <font class="font_titulo3">Localidade</font> </TH>
<TH WIDTH="52" ALIGN="CENTER" CLASS="tab_th2"> <font class="font_titulo3">&nbsp;Agrup.&nbsp;</font> </TH>
<TH WIDTH="92" ALIGN="CENTER" CLASS="tab_th2"> <font class="font_titulo3">&nbsp;Executor&nbsp;</font> </TH>
<TH WIDTH="72" ALIGN="CENTER" CLASS="tab_th2"> <font class="font_titulo3">&nbsp;Num&nbspBA&nbsp;</font> </TH>
<TH WIDTH="72" ALIGN="CENTER" CLASS="tab_th2"> <font class="font_titulo3">&nbsp;&Aacute;rea&nbsp;T&eacute;c.&nbsp;</font> </TH>
<TH WIDTH="72" ALIGN="CENTER" CLASS="tab_th2"> <font class="font_titulo3">&nbsp;Ocorrência&nbsp;</font> </TH>
<TH WIDTH="112" ALIGN="CENTER" CLASS="tab_th2"> <font class="font_titulo3">&nbsp;&nbsp;Promessa&nbsp;&nbsp;</font> </TH>
<TH WIDTH="204" ALIGN="CENTER" CLASS="tab_th2"> <font class="font_titulo3">&nbsp;Tipo&nbsp;Produto&nbsp;</font> </TH>
<TH WIDTH="72" ALIGN="CENTER" CLASS="tab_th2"> <font class="font_titulo3">&nbsp;Velocidade&nbsp;CCTO&nbsp;</font> </TH>
<TH WIDTH="82" ALIGN="CENTER" CLASS="tab_th2"> <font class="font_titulo3">&nbsp;Atividade&nbsp;</font> </TH>
</tr></thead><tbody>
<TD ALIGN="center" CLASS="tab_linha1"><font class="font_tab_linha"><nobr>&nbsp;ACEA1&nbsp;</font></TD>
<TD ALIGN="center" CLASS="tab_linha1"><font class="font_tab_linha"><nobr>&nbsp;ACLD&nbsp;</font></TD>
<TD WIDTH="56" ALIGN="center" CLASS="tab_linha1"><font class="font_tab_linha"><nobr>&nbsp;DAD&nbsp;</font></TD>
<TD WIDTH="96" ALIGN="CENTER" CLASS="tab_linha1"><font class="font_tab_linha"><nobr>&nbsp;TR115654&nbsp;</font></TD>
<TD WIDTH="76" ALIGN="CENTER" CLASS="tab_linha1"><font class="font_tab_linha"><nobr><a href=# onclick=popUp('/cgi-bin/DetalhesBAs.pl?UsErLoGiN=TR076260&bas=139206614&tipo=todos','detalhesControleServicos')>&nbsp;139206614&nbsp;</a></font></TD>
<TD WIDTH="76" ALIGN="CENTER" CLASS="tab_linha1"><font class="font_tab_linha"><nobr>&nbsp;CDE&nbsp;</font></TD>
<TD WIDTH="76" ALIGN="CENTER" CLASS="tab_linha1"><font class="font_tab_linha"><nobr>&nbsp;DFIL&nbsp;</font></TD>
<TD WIDTH="116" ALIGN="CENTER" CLASS="tab_linha1"><font class="font_tab_linha"><nobr>&nbsp;23/04/2012 21:00&nbsp;</font></TD>
<TD WIDTH="208" ALIGN="CENTER" CLASS="tab_linha1"><font class="font_tab_linha"><nobr>&nbsp;&nbsp;</font></TD>
<TD WIDTH="76" ALIGN="CENTER" CLASS="tab_linha1"><font class="font_tab_linha"><nobr>&nbsp;&nbsp;</font></TD>
<TD WIDTH="86" ALIGN="CENTER" CLASS="tab_linha1"><font class="font_tab_linha"><nobr>&nbsp;REPLD&nbsp;</font></TD>
</tr> <tr>
<TD WIDTH="26" ALIGN="center" CLASS="tab_linha2"><font class="font_tab_linha"><nobr>&nbsp;AC&nbsp;</font></TD>
<TD WIDTH="56" ALIGN="center" CLASS="tab_linha2"><font class="font_tab_linha"><nobr>&nbsp;EBLI&nbsp;</font></TD>
<TD WIDTH="66" ALIGN="center" CLASS="tab_linha2"><font class="font_tab_linha"><nobr>&nbsp;Cel. 6884080416&nbsp;</font></TD>
<TD WIDTH="86" ALIGN="CENTER" CLASS="tab_linha2"><font class="font_tab_linha"><nobr>&nbsp;-&nbsp;</font></TD>
<TD WIDTH="66" ALIGN="CENTER" CLASS="tab_linha2"><font class="font_tab_linha"><nobr>&nbsp;P&nbsp;</font></TD>
<TD ALIGN="CENTER" CLASS="tab_linha2"><font class="font_tab_linha"><nobr>&nbsp;A&nbsp;</font></TD>
<TD WIDTH="46" ALIGN="CENTER" CLASS="tab_linha2"><font class="font_tab_linha"><nobr>&nbsp;LPU&nbsp;</font></TD>
<TD WIDTH="116" ALIGN="CENTER" CLASS="tab_linha2"><font class="font_tab_linha"><nobr>&nbsp;13/04/2012 17:35&nbsp;</font></TD>
<TD WIDTH="288" ALIGN="CENTER" CLASS="tab_linha2"><font class="font_tab_linha"><nobr>&nbsp;&nbsp;</font></TD>
<TD WIDTH="96" ALIGN="CENTER" CLASS="tab_linha2"><font class="font_tab_linha"><nobr>&nbsp; &nbsp;</font></TD>
<TD ALIGN="CENTER" CLASS="tab_linha2"><font class="font_tab_linha"><nobr>&nbsp;9AC&nbsp;</font></TD>
<TD ALIGN="center" CLASS="tab_linha1"><font class="font_tab_linha"><nobr>&nbsp;ACEA1&nbsp;</font></TD>
<TD ALIGN="center" CLASS="tab_linha1"><font class="font_tab_linha"><nobr>&nbsp;BLI&nbsp;</font></TD>
<TD WIDTH="56" ALIGN="center" CLASS="tab_linha1"><font class="font_tab_linha"><nobr>&nbsp;DAD&nbsp;</font></TD>
<TD WIDTH="96" ALIGN="CENTER" CLASS="tab_linha1"><font class="font_tab_linha"><nobr>&nbsp;TR086022&nbsp;</font></TD>
<TD WIDTH="76" ALIGN="CENTER" CLASS="tab_linha1"><font class="font_tab_linha"><nobr><a href=# onclick=popUp('/cgi-bin/DetalhesBAs.pl?UsErLoGiN=TR076260&bas=138172042&tipo=todos','detalhesControleServicos')>&nbsp;138172042&nbsp;</a></font></TD>
<TD WIDTH="76" ALIGN="CENTER" CLASS="tab_linha1"><font class="font_tab_linha"><nobr>&nbsp;CDE&nbsp;</font></TD>
<TD WIDTH="76" ALIGN="CENTER" CLASS="tab_linha1"><font class="font_tab_linha"><nobr>&nbsp;P&nbsp;</font></TD>
<TD WIDTH="116" ALIGN="CENTER" CLASS="tab_linha1"><font class="font_tab_linha"><nobr>&nbsp;12/05/2012 17:00&nbsp;</font></TD>
<TD WIDTH="208" ALIGN="CENTER" CLASS="tab_linha1"><font class="font_tab_linha"><nobr>&nbsp;&nbsp;</font></TD>
<TD WIDTH="76" ALIGN="CENTER" CLASS="tab_linha1"><font class="font_tab_linha"><nobr>&nbsp;&nbsp;</font></TD>
<TD WIDTH="86" ALIGN="CENTER" CLASS="tab_linha1"><font class="font_tab_linha"><nobr>&nbsp;REPLD&nbsp;</font></TD>
</tr> <tr>
<TD WIDTH="26" ALIGN="center" CLASS="tab_linha2"><font class="font_tab_linha"><nobr>&nbsp;AC&nbsp;</font></TD>
<TD WIDTH="56" ALIGN="center" CLASS="tab_linha2"><font class="font_tab_linha"><nobr>&nbsp;EBLI&nbsp;</font></TD>
<TD WIDTH="66" ALIGN="center" CLASS="tab_linha2"><font class="font_tab_linha"><nobr>&nbsp;Cel. 6884080416&nbsp;</font></TD>
<TD WIDTH="86" ALIGN="CENTER" CLASS="tab_linha2"><font class="font_tab_linha"><nobr>&nbsp;-&nbsp;</font></TD>
<TD WIDTH="66" ALIGN="CENTER" CLASS="tab_linha2"><font class="font_tab_linha"><nobr>&nbsp;P&nbsp;</font></TD>
<TD ALIGN="CENTER" CLASS="tab_linha2"><font class="font_tab_linha"><nobr>&nbsp;A&nbsp;</font></TD>
<TD WIDTH="46" ALIGN="CENTER" CLASS="tab_linha2"><font class="font_tab_linha"><nobr>&nbsp;LPU&nbsp;</font></TD>
<TD WIDTH="116" ALIGN="CENTER" CLASS="tab_linha2"><font class="font_tab_linha"><nobr>&nbsp;02/04/2012 14:00&nbsp;</font></TD>
<TD WIDTH="288" ALIGN="CENTER" CLASS="tab_linha2"><font class="font_tab_linha"><nobr>&nbsp;&nbsp;</font></TD>
<TD WIDTH="96" ALIGN="CENTER" CLASS="tab_linha2"><font class="font_tab_linha"><nobr>&nbsp; &nbsp;</font></TD>
<TD ALIGN="CENTER" CLASS="tab_linha2"><font class="font_tab_linha"><nobr>&nbsp;9AC&nbsp;</font></TD>
<TD ALIGN="center" CLASS="tab_linha1"><font class="font_tab_linha"><nobr>&nbsp;ACEA1&nbsp;</font></TD>
<TD ALIGN="center" CLASS="tab_linha1"><font class="font_tab_linha"><nobr>&nbsp;FJO&nbsp;</font></TD>
<TD WIDTH="56" ALIGN="center" CLASS="tab_linha1"><font class="font_tab_linha"><nobr>&nbsp;DAD&nbsp;</font></TD>
<TD WIDTH="96" ALIGN="CENTER" CLASS="tab_linha1"><font class="font_tab_linha"><nobr>&nbsp;TR086022&nbsp;</font></TD>
<TD WIDTH="76" ALIGN="CENTER" CLASS="tab_linha1"><font class="font_tab_linha"><nobr><a href=# onclick=popUp('/cgi-bin/DetalhesBAs.pl?UsErLoGiN=TR076260&bas=138800021&tipo=todos','detalhesControleServicos')>&nbsp;138800021&nbsp;</a></font></TD>
<TD WIDTH="76" ALIGN="CENTER" CLASS="tab_linha1"><font class="font_tab_linha"><nobr>&nbsp;CDE&nbsp;</font></TD>
<TD WIDTH="76" ALIGN="CENTER" CLASS="tab_linha1"><font class="font_tab_linha"><nobr>&nbsp;P&nbsp;</font></TD>
<TD WIDTH="116" ALIGN="CENTER" CLASS="tab_linha1"><font class="font_tab_linha"><nobr>&nbsp;26/04/2012 17:00&nbsp;</font></TD>
<TD WIDTH="208" ALIGN="CENTER" CLASS="tab_linha1"><font class="font_tab_linha"><nobr>&nbsp;&nbsp;</font></TD>
<TD WIDTH="76" ALIGN="CENTER" CLASS="tab_linha1"><font class="font_tab_linha"><nobr>&nbsp;&nbsp;</font></TD>
<TD WIDTH="86" ALIGN="CENTER" CLASS="tab_linha1"><font class="font_tab_linha"><nobr>&nbsp;REPLD&nbsp;</font></TD>
</tr>
</table>
</tr>
&nbsp;&nbsp<LI><font class="font_tab_linha">encontra todos os executores que atuaram nos bilhetes selecionados</font>
</TD></TR></TABLE>
<body>
<INPUT TYPE = 'hidden' NAME= 'cod_area_tec' VALUE=CDE>
<INPUT TYPE = 'hidden' NAME= 'cod_cos' VALUE=DFEB1>
<INPUT TYPE = 'hidden' NAME= 'cod_cos' VALUE=GOEA1>
<INPUT TYPE = 'hidden' NAME= 'cod_cos' VALUE=GOEM1>
<INPUT TYPE = 'hidden' NAME= 'cod_cos' VALUE=GOET1>
<INPUT TYPE = 'hidden' NAME= 'cod_cos' VALUE=MSEB1>
<INPUT TYPE = 'hidden' NAME= 'cod_cos' VALUE=MTEA1>
<INPUT TYPE = 'hidden' NAME= 'cod_cos' VALUE=GOGNA>
<INPUT TYPE = 'hidden' NAME= 'cod_cos' VALUE=GOSAT>
<INPUT TYPE = 'hidden' NAME= 'cod_cos' VALUE=MSCPE>
<INPUT TYPE = 'hidden' NAME= 'cod_cos' VALUE=MSI01>
<INPUT TYPE = 'hidden' NAME= 'cod_cos' VALUE=DFSAT>
<INPUT TYPE = 'hidden' NAME= 'cod_cos' VALUE=ACRBO>
<INPUT TYPE = 'hidden' NAME= 'cod_cos' VALUE=MTCBA>
<INPUT TYPE = 'hidden' NAME= 'cod_cos' VALUE=DFI01>
<INPUT TYPE = 'hidden' NAME= 'cod_cos' VALUE=MTI01>
<INPUT TYPE = 'hidden' NAME= 'cod_cos' VALUE=ACI01>
<INPUT TYPE = 'hidden' NAME= 'cod_cos' VALUE=ROD01>
<INPUT TYPE = 'hidden' NAME= 'cod_cos' VALUE=ACERR>
<INPUT TYPE = 'hidden' NAME= 'cod_cos' VALUE=ROERR>
<INPUT TYPE = 'hidden' NAME= 'cod_cos' VALUE=MTERR>
<INPUT TYPE = 'hidden' NAME= 'cod_cos' VALUE=ROSAT>
<INPUT TYPE = 'hidden' NAME= 'cod_cos' VALUE=ALCA3>
<input type=hidden name=userLogin value='TR076260'>
<input type=hidden name=preventiva value='P'>
<input type=hidden name=cr value=''>
<input type=hidden name=nivel value='3'>
<input type=hidden name=uf_e value=''>
<input type=hidden name=estacao_e value=''>
<input type=hidden name=cod_ramif_area_tecnica_p value=''>
 
</html>

Moderator's Comments:
Code tags for code and data samples, please.

Last edited by Corona688; 04-24-2012 at 03:54 PM.
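Much of what makes this markup awkward is in the cell text itself: values are padded with `&nbsp;`, accented letters are written as entities (`&Aacute;`, `&eacute;`), and one header (`&nbsp;Num&nbspBA&nbsp;`) contains an `&nbsp` without its semicolon. Here is a minimal Python 3 sketch of cleaning one cell's text once the tags are gone. It relies on the standard library's `html.unescape`, and `clean_cell` is a hypothetical helper name:

```python
import html


def clean_cell(raw):
    """Turn the raw text of one table cell into a plain value.

    html.unescape follows the HTML5 rules, so it decodes &Aacute; and
    &eacute;, and also legacy references written without the trailing
    semicolon, such as the "&nbspBA" in the sample header.
    """
    text = html.unescape(raw)
    # &nbsp; decodes to U+00A0 (no-break space); treat it as a normal space.
    text = text.replace("\xa0", " ")
    # Collapse runs of whitespace and trim both ends.
    return " ".join(text.split())
```

For example, `clean_cell("&nbsp;Num&nbspBA&nbsp;")` gives `"Num BA"`, and a cell holding only `&nbsp; &nbsp;` becomes an empty string.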
# 2  
04-24-2012
And what output data would you want for this input? Which bits should be selected for CSV columns?
# 3  
04-24-2012
Friends,
sorry, I forgot to specify the output I need.

I need a file with the data from the first table: the <th> tags hold the header, and the values are between <font> and </font>. Example:

HTML Code:
 <TH WIDTH="45" ALIGN="CENTER" CLASS="tab_th2"> <font class="font_titulo3">&nbsp;COS&nbsp;</font> </TH>
 <TH WIDTH="65" ALIGN="CENTER" CLASS="tab_th2"> <font class="font_titulo3">Localidade</font> </TH>
 <TH WIDTH="52" ALIGN="CENTER" CLASS="tab_th2"> <font class="font_titulo3">&nbsp;Agrup.&nbsp;</font> </TH>
 <TH WIDTH="92" ALIGN="CENTER" CLASS="tab_th2"> <font class="font_titulo3">&nbsp;Executor&nbsp;</font> </TH>
 <TH WIDTH="72" ALIGN="CENTER" CLASS="tab_th2"> <font class="font_titulo3">&nbsp;Num&nbspBA&nbsp;</font> </TH>
 <TH WIDTH="72" ALIGN="CENTER" CLASS="tab_th2"> <font class="font_titulo3">&nbsp;&Aacute;rea&nbsp;T&eacute;c.&nbsp;</font> </TH>
 <TH WIDTH="72" ALIGN="CENTER" CLASS="tab_th2"> <font class="font_titulo3">&nbsp;Ocorrência&nbsp;</font> </TH>
 <TH WIDTH="112" ALIGN="CENTER" CLASS="tab_th2"> <font class="font_titulo3">&nbsp;&nbsp;Promessa&nbsp;&nbsp;</font> </TH>
 <TH WIDTH="204" ALIGN="CENTER" CLASS="tab_th2"> <font class="font_titulo3">&nbsp;Tipo&nbsp;Produto&nbsp;</font> </TH>
 <TH WIDTH="72" ALIGN="CENTER" CLASS="tab_th2"> <font class="font_titulo3">&nbsp;Velocidade&nbsp;CCTO&nbsp;</font> </TH>
 <TH WIDTH="82" ALIGN="CENTER" CLASS="tab_th2"> <font class="font_titulo3">&nbsp;Atividade&nbsp;</font> </TH>
Output:
COS;Localidade;Agrup;Executor;Num BA;Area Tec;Ocorrência;Promessa;Tipo Produto;Velocidade CCTO;Atividade
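Here is a minimal Python 3 sketch of the conversion described above. It uses only the standard library's `html.parser` and `csv` modules. It assumes that the first `<table>` in the file is the one wanted, that every data row has exactly as many `<td>` cells as there are `<th>` headers, and that `;` is the delimiter, as in the example output. `FirstTableParser` and `table_to_csv` are hypothetical names. The sketch rebuilds rows by counting cells rather than by following `<tr>`, because in the sample the first data row has no opening `<tr>` and two rows run together with no `</tr> <tr>` between them.

```python
import csv
import io
from html.parser import HTMLParser


class FirstTableParser(HTMLParser):
    """Collect <th> and <td> cell texts from the first <table> in a document."""

    def __init__(self):
        # convert_charrefs=True decodes &nbsp;, &Aacute; etc. before handle_data.
        super().__init__(convert_charrefs=True)
        self.headers = []    # text of each <th>, in order
        self.cells = []      # text of each <td>, in order
        self._buf = None     # text pieces of the cell being read, or None
        self._target = None  # list that receives the current cell
        self._depth = 0      # <table> nesting depth
        self._done = False   # True once the first table has closed

    def _flush(self):
        # Store the current cell with &nbsp; (U+00A0) turned into spaces and
        # whitespace runs collapsed.
        if self._buf is not None:
            text = "".join(self._buf).replace("\xa0", " ")
            self._target.append(" ".join(text.split()))
            self._buf = None

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "table":
            self._depth += 1
        elif self._depth and tag in ("th", "td"):
            self._flush()  # tolerate a missing </th> or </td>
            self._buf = []
            self._target = self.headers if tag == "th" else self.cells

    def handle_endtag(self, tag):
        if self._done:
            return
        if tag in ("th", "td"):
            self._flush()
        elif tag == "table" and self._depth:
            self._flush()
            self._depth -= 1
            self._done = self._depth == 0

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)


def table_to_csv(html_text, delimiter=";"):
    """Return the first table of html_text as delimiter-separated text."""
    parser = FirstTableParser()
    parser.feed(html_text)
    parser.close()
    width = len(parser.headers)
    if width == 0:
        return ""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(parser.headers)
    # Rebuild rows by cutting the <td> list into header-sized chunks, since the
    # sample's <tr> tags do not reliably mark where rows start and end.
    for start in range(0, len(parser.cells), width):
        writer.writerow(parser.cells[start:start + width])
    return out.getvalue()
```

On the sample input this should produce the header line `COS;Localidade;Agrup.;Executor;Num BA;Área Téc.;Ocorrência;Promessa;Tipo Produto;Velocidade CCTO;Atividade`, followed by one line per data row. The sketch differs from the hand-typed example in one respect: it keeps the trailing periods and accents (`Agrup.`, `Área Téc.`), so strip those afterwards if plain ASCII is needed. The thread does not state the file's character encoding. Read the file with whatever encoding it actually uses before passing the text to `table_to_csv`.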
 

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Converting csv to html format

Below is the code I have. How can I convert the data in the CSV into 3 HTML tables instead of 1 table? Attached is the format I am getting. (1 Reply)
Discussion started by: archana25

2. UNIX for Dummies Questions & Answers

HTML to CSV

Hi, I have a webpage with tables and I want to save it as CSV. If I open it in Calc and export it to CSV, the file is correctly separated. How can I do the same with awk? I'm attaching the webpage to convert to CSV. (1 Reply)
Discussion started by: faka

3. Shell Programming and Scripting

Help needed in csv to html

Hi, below is the code I have, but it prints the entire CSV line in one column. I want to print 10 comma-separated fields in 10 columns. Almost there, maybe a tweak you guys can help with. cat reports/file.csv |awk -v border=1 -v width=10 -v bgcolor=black -v fgcolor=white ' BEGIN {... (1 Reply)
Discussion started by: jakSun8

4. Shell Programming and Scripting

html to csv conversion

Thanks for allowing me to join your forum. I have an HTML file with three columns ------------ Last visit date, URL and link. How can I convert it into CSV so that I can import it into a database? The machine is Linux... I did a little googling and got the idea that there are ways for... (2 Replies)
Discussion started by: certteam

5. UNIX for Dummies Questions & Answers

convert csv to html file

Hi all, I am new to this forum and not sure where to post this query, so I posted it here. I need your help with the following ------------ I am using shell scripting and trying to convert a CSV file to an HTML file... example.csv --------------- Name Country Age Sex Andy India 25 ... (4 Replies)
Discussion started by: sumithra

6. Shell Programming and Scripting

Parsing: How to go from HTML to CSV?

Dear all, I have to parse a large amount of html files, which I would like to transform into comma separated values. The html-files have the following structure: <tag1> CATEGORY_1 <tag2><tag3> HEADER_1 <tag4> <tag5> paragraph_1 <tag6> <tag5> paragraph_2 <tag6> <tag3>HEADER_2... (2 Replies)
Discussion started by: docdudetheman

7. Shell Programming and Scripting

HTML to csv

Hi! Could you please let me know how an HTML file can be converted to CSV? I am looking for a script that could do that. Please find the example below: <HTML><BODY><TABLE> <TR><TD>Parent CR</TD><TD>ChildCR</TD><TD>Title</TD><TD>Description</TD></TR> </TABLE></BODY></HTML>... (3 Replies)
Discussion started by: ganga.dharan

8. Shell Programming and Scripting

HTML table to CSV

Hi! I have HTML tables from which I want to generate graphs, but to create graphs I need the file in CSV format. Can anyone please help me convert my HTML table file to CSV format? Thanks in advance. (2 Replies)
Discussion started by: i_priyank

9. UNIX for Dummies Questions & Answers

Converting HTML to CSV

Hi, I need to convert a relatively large html file (1.5megs) into CSV under Unix. How would I be able to do this? Much thanks. (3 Replies)
Discussion started by: Jexel