Turn HTML data into delimited text


 
# 1  
Old 11-21-2008

I have a file I've already partially pruned with grep that has data like:

<a href="MasterDetailResults.asp?textfield=a&Application=3D Home Architect 4">3D Home Architect 4</a> </td>
Approved </td>
--
<a href="MasterDetailResults.asp?textfield=a&Application=3d Home Architect 6">3d Home Architect 6</a> </td>
Not Approved </td>
--
<a href="MasterDetailResults.asp?textfield=a&Application=A to Zap">A to Zap</a> </td>
Approved </td>
--

except much, much more of it ;-)

I want to get the application name (e.g. 3D Home Architect 4) and the status (e.g. Approved or Not Approved) and turn it into this:

3D Home Architect 4|Approved
3d Home Architect 6|Not Approved
A to Zap|Approved
etc.

for use as a searchable database or for import into Excel.

I want to use bash scripting with sed or gawk to do this in the smallest number of lines (number of lines is not critical, of course ;-)

Thanks in advance for your help.
# 2  
Old 11-21-2008
Try this:

Code:
awk -F"\"" '/Application=/{sub(".*a&","",$2);s=$2;getline;FS=" ";$0=$0;print s"|"$1}' file

# 3  
Old 11-21-2008
Thanks Franklin52, that's a start. I got:
Application=3D Home Architect 4|Approved
Application=3d|Not
Application=A|Approved
when I ran it. I'll keep on working on it.
# 4  
Old 11-21-2008
Hi,

try

Code:
sed -n '/Application/{N;s/.*Application=\([^"]*\).*\n\(.*\)<.*/\1 | \2/p}' file

If your sed doesn't support \n, you have to write

Code:
sed -n '/Application/{N;s/.*Application=\([^"]*\).*\
\(.*\)<.*/\1 | \2/p}' file

instead.

HTH Chris
# 5  
Old 11-22-2008
Quote:
Originally Posted by macxcool
Thanks Franklin52, that's a start. I got:
Application=3D Home Architect 4|Approved
Application=3d|Not
Application=A|Approved
when I ran it. I'll keep on working on it.
This should work:
Code:
awk -F"\"" '
/Application=/{
  sub(".*=","",$2); s=$2
  getline; sub(" <.*","")
  print s "|" $0
}' file

# 6  
Old 11-23-2008
perl:

Code:
undef $/;
open FH,"<d:/a.txt";
$str=<FH>;
@arr=split("--",$str);
map {s/<a.*>(.*)<\/a>(.*)<\/td>\n(.*)<\/td>/$1|$3/} @arr;
print "@arr";
close FH;

# 7  
Old 11-24-2008
Thank you all for your solutions. I'm going to use Christoph Spohr's because I'm more comfortable with sed than I am with awk (although I know it's very powerful). I get an output with spaces after the pipe because there are spaces at the beginning of the line. How can I modify
Code:
sed -n '/Application/{N;s/.*Application=\([^"]*\).*\n\(.*\)<.*/\1 | \2/p}' file

to get rid of those spaces?
Also, what if my input file has another line between the two lines in question:
Code:
    <tr> 
      <td height="23" align="default" valign="top"> 
        <a href="MasterDetailResults.asp?textfield=a&Application=3D Home Architect 4">3D Home Architect 4</a> </td>
      <td align="default" valign="top"> 
        Approved </td>
    </tr>

Once again, I need: Application Name|Status as my output. I've been removing the
<td align="default" valign="top">
line with sed before finishing things off with the sed code above.
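
One possible way to handle both points in a single sed call, offered as a sketch rather than a tested solution for the real file: read two extra lines with N instead of one, let [[:space:]]* swallow the leading blanks of the status line, and leave the spaces out of the replacement. The file names sample.html and apps.txt are made up for the example, and it assumes exactly one line sits between the link line and the status line.

```shell
# Hypothetical sample file in the layout shown above.
cat > sample.html <<'EOF'
    <tr>
      <td height="23" align="default" valign="top">
        <a href="MasterDetailResults.asp?textfield=a&Application=3D Home Architect 4">3D Home Architect 4</a> </td>
      <td align="default" valign="top">
        Approved </td>
    </tr>
    <tr>
      <td height="23" align="default" valign="top">
        <a href="MasterDetailResults.asp?textfield=a&Application=3d Home Architect 6">3d Home Architect 6</a> </td>
      <td align="default" valign="top">
        Not Approved </td>
    </tr>
EOF

# On each line containing Application=:
#   N, N      append the next two lines (the <td ...> line and the status line)
#   group 1   the application name, taken from the href up to the closing quote
#   .*\n      skips everything up to the last newline, i.e. to the status line
#   group 2   the status with leading and trailing blanks left outside the group
sed -n '/Application=/{
N
N
s/.*Application=\([^"]*\)".*\n[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*<\/td>.*/\1|\2/p
}' sample.html > apps.txt

cat apps.txt
```

For the grep-pruned input from the first post, where the status line follows the link line directly, the same expression should work with just one N. If the number of lines in between varies, this fixed count will not fit and a different approach is needed.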