RudiC, I'm not dropping it, because I need to get other texts out of the HTML, but for the example's sake, yes, that would make it more optimized.
I have 5 more texts that I'm matching, and I'm writing the output to a CSV file.
The HTML I'm parsing is built very poorly.
Since I need this all on one line or else the CSV file will break (just realized this), I had to get rid of the newlines: tr -d "\n\r"
I'm removing the extra whitespace at the beginning and end: awk '{$1=$1};1'
Also, for CSV proofing, I'm replacing the commas with semicolons, because CSV will interpret commas as the end of a column: tr ',' ';'
So this makes me wonder if that one sed could do all of this on its own.
But I'm happy now because this works.
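On the question of doing it all in one command: here is a minimal sketch using a single awk call rather than sed, since joining lines is more awkward in portable sed. The function name flatten_for_csv is made up for illustration, and the sketch assumes the whole input should end up as one output line, as in the three-command pipeline above.

```shell
# flatten_for_csv (hypothetical name): read text on stdin and emit one
# CSV-safe line, mirroring tr -d "\n\r" | awk '{$1=$1};1' | tr ',' ';'
flatten_for_csv() {
    awk '
        {
            gsub(/\r/, "")      # drop carriage returns
            buf = buf $0        # join lines with no separator, like tr -d "\n"
        }
        END {
            gsub(/^[ \t]+|[ \t]+$/, "", buf)   # trim both ends
            gsub(/[ \t]+/, " ", buf)           # squeeze inner whitespace runs
            gsub(/,/, ";", buf)                # commas would split CSV columns
            print buf
        }
    '
}
```

It would be used as a filter, for example `some_command | flatten_for_csv`. Like the original pipeline, it joins lines without inserting a space, so words at line ends run together unless the input already has whitespace there.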
I have an HTML file called myfile. If I simply run "cat myfile.html" in UNIX, it shows all the HTML tags like <a href=r/26><img src="http://www>. But I want to extract only the text part.
The same problem happens with the "type" command in MS-DOS.
I know you can do it by opening it in Internet Explorer,... (4 Replies)
Hi friends,
I have a small question.
How can we use HTML tags in shell scripting?
code :
echo "<html>"
echo "<body>"
echo " welcome to peace world "
echo "</body>"
echo "</html>"
output displayed like this:
<html>
<body>
welcome to peace world
</body>
</html> (5 Replies)
Hi all,
I have an HTML file, something similar to this.
<tr class="evenrow">
<td class="data">added</td><td class="data">xyz@abc.com</td>
<td class="data">filename.sql</td><td class="modifications-data">08/25/2009 07:58:40</td><td class="data">Added TK prof script</td>
</tr>
<tr... (1 Reply)
Hi!
I have a bunch of HTML files, which I want to parse to CSV files. Every page has a table in it, and I need to parse each row into a csv record.
With awk and sed, I managed to put every table row in separate lines. So my file looks like this:
<TR> .... </TR>
<TR> .... </TR>
...One... (1 Reply)
Guys,
I have a little script that I got off the internet and that I use in Squid to block ads.
I used that script with Linux, but now I have moved my servers to FreeBSD. I have a steep learning curve there, but it is fun. Back to the script issue.
The script used to work with Linux but... (15 Replies)
I have an XML tag like this:
<property name="agent" value="/var/tmp/root/eclipse" />
Is there a way, using awk, that I can get the value from the above tag? The output should be:
/var/tmp/root/eclipse
Help will be appreciated.
Regards,
Adi (6 Replies)
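A rough sketch of one way to do this in awk, assuming the attribute values are always double-quoted and the tag sits on a single line. The function name get_value_attr is invented for illustration. It splits each line on double quotes and prints the field that follows value=.

```shell
# get_value_attr (hypothetical name): print the value="..." attribute of
# each <property ...> tag found on stdin or in the named files.
get_value_attr() {
    awk -F'"' '
        /<property / {
            # With " as the separator, attribute names land in odd fields
            # and their values in the field right after them.
            for (i = 1; i < NF; i++)
                if ($i ~ /(^|[ \t])value=$/) { print $(i + 1); break }
        }
    ' "$@"
}
```

This is line-oriented pattern matching, not real XML parsing, so it will miss tags that span several lines or use single quotes.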
I want to print everything from the <fruits> tag to the </fruits> tag for blocks that have mango as a <fruit>. I also want both <fruits> and </fruits> in the output. Please help.
eg.
<fruits>
<fruit id="111">mango<fruit>
.
another 20 lines
.
</fruits> (3 Replies)
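One possible approach, sketched in awk: buffer each block from <fruits> to </fruits> and print it only if a mango line was seen inside. The function name print_mango_blocks is made up, and the sketch assumes the opening and closing tags each appear on their own line, as in the example.

```shell
# print_mango_blocks (hypothetical name): print whole <fruits>...</fruits>
# blocks, including both tags, that contain a <fruit> element with mango.
print_mango_blocks() {
    awk '
        /<fruits>/ { inblock = 1; buf = ""; hit = 0 }   # a new block starts
        inblock {
            buf = buf $0 "\n"                            # remember every line
            if ($0 ~ /<fruit( [^>]*)?>mango</) hit = 1   # block contains mango
        }
        /<\/fruits>/ && inblock {
            if (hit) printf "%s", buf                    # print matching blocks only
            inblock = 0
        }
    ' "$@"
}
```

The match only looks for mango followed by any tag, so it works on the sample line as posted.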
Hi Guys
Here is my Input :
<?xml version="1.0" encoding="UTF-8"?>
<xn:MeContext id="01736">
<xn:VsDataContainer id="01736">
<xn:attributes>
<xn:vsDataType>vsDataMeContext</xn:vsDataType>
... (12 Replies)
I want to clean an HTML file.
I am trying to remove the script part of the HTML, then remove the rest of the tags and the empty lines.
The code I am trying to use is the following:
sed '/<script/,/<\/script>/d' webpage.html | sed -e 's/<*>//g' | sed '/^\s*$/d' > output.txt
However, in this method, I can not... (10 Replies)
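A note on the command above: the expression s/<*>//g matches zero or more < characters followed by a >, so it deletes only the > characters and leaves the tag names behind. Below is a minimal sketch of a corrected version, assuming each tag opens and closes on the same line. The function name strip_html is invented for illustration, and [[:space:]] stands in for \s, which is a GNU extension.

```shell
# strip_html (hypothetical name): drop <script> blocks, remove remaining
# tags, and delete lines that end up empty. Reads stdin or the named files.
strip_html() {
    sed -e '/<script/,/<\/script>/d' \
        -e 's/<[^>]*>//g' \
        -e '/^[[:space:]]*$/d' "$@"
}
```

Usage would look like `strip_html webpage.html > output.txt`. The script deletion works on whole lines, so any text sharing a line with <script> or </script> is removed too.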
Discussion started by: YuhuiFeng
LEARN ABOUT DEBIAN
xlhtml
xlhtml(1) General Commands Manual xlhtml(1)
NAME
xlhtml - A program for converting Microsoft Excel Files .xls
SYNOPSIS
xlhtml [-a] [-asc] [-csv] [-xml] [-bcNNNNNN] [-bi/path] [-c] [-dp] [-v] [-fw] [-m] [-nc] [-nh] [-tcNNNNNN] [-te] [-xc:N-N] [-xp:N] [-xr:N-N] FILE
DESCRIPTION
This manual page explains the xlhtml program. The program xlhtml is used to convert Microsoft Excel Spreadsheet files into either html or
tab delimited ASCII. The program can be interfaced with helper scripts for viewing email attachments. Most use of this program is through
the helper scripts and one would probably rarely resort to using the command line interface.
OPTIONS
-a aggressively optimize html by removing </TR> </TD> or VALIGN="bottom". Some older browsers may not display properly in this mode.
-asc Ascii out of -dp and extraction data (-xc, -xp, -xr)
-csv Output in Comma Separated Values of -dp and extraction data (-xc, -xp, -xr)
-xml Output in XML of -dp and extraction data (-xc, -xp, -xr)
-bc Override the background color. e.g. -bc808080 for gray
-bi Use background image. e.g. -bi/home/httpd/icon/tar.gif
-c Centers the tables horizontally
-dp Dump page count and max columns and rows per page
-v Prints program version
-fw suppress formula warnings about accuracy
-m No encoding for multibyte
-nc tells it not to colorize the output.
-nh Suppress header and body tags in html output
-tc Override the text color. e.g. -tcFF0000 for red
-te Trims empty rows & columns at the edges of a worksheet
-xc Columns (separated by a dash) for extraction (zero based)
-xp Page for extraction (zero based), one page only
-xr Rows (separated by a dash) to be extracted (zero based)
An example of the extraction command line is: xlhtml -fw -asc -xp:0 -xr:2-6 -xc:0-1 Test.xls
The extraction output is: Formatted output of cells by column left to right, columns separated by a tab, end of row is: 0x0A, end of file:
AUTHOR
Steve Grubb, Charles N Wyble
xlhtml May 15, 2002 xlhtml(1)