awk -- Extract data from html within multiple tags as reference


 
# 1  
Old 03-13-2013

Hi, I'm trying to extract some data from an HTML file. The problem is that the data is nested: several patterns have to match, one inside the other, before the text I want can be extracted.

https://www.unix.com/shell-programmin...tml-files.html

The thread above is a similar problem. The only difference is that I have to handle more tags: within the <td> tag I first need to find a <p> tag, and so on. I googled around a bit but couldn't find an example with multiple patterns. Maybe that's not the way to go?
Anyway, if anyone could tell me whether it's possible to expand those ranges to multiple ones, I would be very grateful.
# 2  
Old 03-13-2013
Can you post a few lines from your HTML file and the desired output in code tags?
# 3  
Old 03-13-2013
Code:
<div id="bodyContent">
--other html stuff
<p>
--wanted data--
</p>
</div>

I hope this helps to explain the problem.
# 4  
Old 03-13-2013
Try this:

Code:
$ cat a.txt
<div id="bodyContent">
--other html stuff
<p>
--wanted data--
</p>
</div>
<div id="bodyContent">
--other html stuff
<p>
--test message--
--wanted data--
</p>
</div>

$ awk -F"<p>|</p>" '{for(i=2;i<=NF;i+=2){print $i}}' RS="" a.txt

--wanted data--


--test message--
--wanted data--
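
A note on the one-liner above: it works because the sample has no blank lines, so RS="" hands the whole file to awk as a single record, and every even-numbered field is the text between <p> and </p>. The blank lines in the output are the newlines around that text. As far as I know, awk implementations can differ in how they split fields in that mode, so here is a rough variant that avoids RS="" altogether. It is only a sketch: the function name extract_p_blocks is made up for illustration, and it assumes plain <p> tags without attributes.

```shell
# Hypothetical helper (name is illustrative): print the text of every
# <p>...</p> span, without the surrounding blank lines.
extract_p_blocks() {
    awk '
        # Collect the whole input into one string.
        { text = text $0 "\n" }
        END {
            # Repeatedly cut out the next <p>...</p> span.
            while ((start = index(text, "<p>")) > 0) {
                text = substr(text, start + 3)       # skip past "<p>"
                stop = index(text, "</p>")
                if (stop == 0) break                 # unclosed <p>: give up
                body = substr(text, 1, stop - 1)
                gsub(/^\n+|\n+$/, "", body)          # trim leading/trailing newlines
                if (body != "") print body
                text = substr(text, stop + 4)        # skip past "</p>"
            }
        }
    ' "$@"
}

# Usage: extract_p_blocks a.txt
```

Like the one-liner, this does not check which <div> the <p> sits in.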

This User Gave Thanks to itkamaraj For This Post:
# 5  
Old 03-13-2013
Code:
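awk -F'[<>]' ' {
        # Splitting on < and > makes each tag name its own field
        for ( i = 1; i <= NF; i++ )
        {
                if ( $i == "p" )        # opening <p>: start printing
                        f = 1
                if ( $i == "/p" )       # closing </p>: stop printing
                        f = 0
                # print non-empty text while inside <p>...</p>
                if ( f && $i != "p" && $i != "" )
                        print $i
        }
} ' filename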
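
For the original question (only take a <p> when it sits inside a particular outer tag), the same flag idea can be extended to one flag per nesting level. The following is a minimal sketch, not a general HTML parser: it assumes every tag is on its own line as in the posted sample, that the outer tag is exactly <div id="bodyContent">, and that divs are not nested inside each other. The function name extract_p_in_div is made up for illustration.

```shell
# Hypothetical helper (name is illustrative): print lines that are inside
# a <p> which is itself inside <div id="bodyContent">.
extract_p_in_div() {
    awk '
        /<div id="bodyContent">/ { in_div = 1; next }            # enter outer tag
        /<\/div>/                { in_div = 0; in_p = 0; next }  # leave it, reset inner flag
        in_div && /<p>/          { in_p = 1; next }              # enter inner tag
        in_div && /<\/p>/        { in_p = 0; next }              # leave inner tag
        in_div && in_p           { print }                       # wanted data
    ' "$@"
}

# Usage: extract_p_in_div a.txt
```

A <p> outside the div is ignored because in_p can only be set while in_div is set. Adding a third level (for example <td>, then <div>, then <p>) would mean one more flag and one more pair of rules.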

This User Gave Thanks to Yoda For This Post:
# 6  
Old 03-13-2013
Thanks guys, this will definitely help me create a working script :)