Shell Programming and Scripting: post 302481313 by sbobotex on Friday, 17 December 2010, 06:51:51 AM
extract data with awk from html files

Hello everyone, I'm new to this forum and new to shell scripting.

My problem: I have some HTML files in a directory, and I would like to extract from them some data that lies between two particular lines.
Here's my situation:
Code:
 <td align="default"> oxidizability (mg / l):
 data_to_extract 
 </td>

This structure is repeated in all of these files.
How do I use awk to do this extraction and write the data into a .txt file?
Thank you all.

Moderator's Comments:
Mod Comment: Use code tags when posting code, data or logs to preserve formatting and enhance readability, thanks.

Last edited by zaxxon; 12-17-2010 at 07:53 AM. Reason: code tags
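A minimal awk sketch for the question above, assuming the layout shown in the sample: the label `oxidizability` sits on its own line, the value is on the following line(s), and a closing `</td>` ends the block (a stray space, as in `</ td>`, is tolerated). It prints every non-blank line found between those two markers, trimmed, one value per line. The function name `extract_oxidizability` is hypothetical, and if a file puts the label, value and closing tag all on one line, this version will not pick the value up.

```shell
# Sketch only: assumes the label, the value and the closing tag are on
# separate lines, as in the sample above. Function name is hypothetical.
extract_oxidizability() {
    awk '
        FNR == 1 { inside = 0 }                    # new file: reset state
        /oxidizability/ { inside = 1; next }       # start marker (the label line itself is skipped)
        inside && /<\/ *td>/ { inside = 0; next }  # end marker: </td> or </ td>
        inside {
            sub(/^[ \t]+/, "")                     # trim leading blanks
            sub(/[ \t\r]+$/, "")                   # trim trailing blanks and a DOS carriage return
            if ($0 != "") print                    # skip empty lines inside the block
        }
    ' "$@"
}
```

To process every file in the directory and collect the results, something like `extract_oxidizability *.html > oxidizability.txt` should do. If you also need to know which file each value came from, the `print` can be changed to print `FILENAME` in front of `$0`.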
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

extract data from html tables

Hi, I need to use Unix to extract data from several rows of a table coded in HTML. I know that rows within a table have the tags <tr> </tr>, so I thought that my first step should be to delete all of the other HTML code which is not contained within these tags. I could then use this method... (8 Replies)
Discussion started by: Streetrcr

2. Shell Programming and Scripting

SED to extract HTML text data, not quite right!

I am attempting to extract weather data from the following website, but for the Victoria area only: Text Forecasts - Environment Canada. I use this: sed -n "/Greater Victoria./,/Fraser Valley./p" But that sometimes does not get it all, and I think perhaps the website has more... (2 Replies)
Discussion started by: lagagnon

3. UNIX for Dummies Questions & Answers

AWK, extract data from multiple files

Hi, I'm using AWK to try to extract data from multiple files (*.txt). The script should look for a flag that occurs at a specific position in each file and it should return the data to the right of that flag. I should end up with one line for each file, each containing 3 columns:... (8 Replies)
Discussion started by: Liverpaul09

4. UNIX for Dummies Questions & Answers

Using AWK: Extract data from multiple files and output to multiple new files

Hi, I'd like to process multiple files. For example: file1.txt file2.txt file3.txt Each file contains several lines of data. I want to extract a piece of data and output it to a new file. file1.txt ----> newfile1.txt file2.txt ----> newfile2.txt file3.txt ----> newfile3.txt Here is... (3 Replies)
Discussion started by: Liverpaul09

5. Shell Programming and Scripting

Extract data with awk and write to several files

Hi! I have one file with data that looks like this: 1 data data data data 2 data data data data 3 data data data data . . . 1 data data data data 2 data data data data 3 data data data data . . . I would like to have awk to write each block to a separate file, like this: 1... (3 Replies)
Discussion started by: LinWin

6. Shell Programming and Scripting

extract complex data from html table rows

I have bash, awk, and sed available on my portable device. I need to extract 10 fields from each table row from a web page that looks like this: </tr> <tr> <td>28 Apr</td> <td><a... (6 Replies)
Discussion started by: rickgtx

7. Shell Programming and Scripting

awk -- Extract data from html within multiple tags as reference

Hi, I'm trying to get some data from an html file, but the problem is before it can extract the information I have multiple patterns that need to be passed through. https://www.unix.com/shell-programming-scripting/150711-extract-data-awk-html-files.html Is a similar problem. The only... (5 Replies)
Discussion started by: counfhou

8. Shell Programming and Scripting

Awk/sed HTML extract

I'm extracting text between table tags in HTML <th><a href="/wiki/Buick_LeSabre" title="Buick LeSabre">Buick LeSabre</a></th> using this: awk -F "</*th>" '/<\/*th>/ {print $2}' auto2 > auto3 then this (text between a href): sed -e 's/\(<*>\)//g' auto3 > auto4 How to shorten this into one... (8 Replies)
Discussion started by: p1ne

9. Shell Programming and Scripting

Compare 2 files and extract the data which is present in other file - awk is not working

file2 content f1 file2 content f1,1,2,3,4,5 f1,2,4,6,8,10 f10,1,2,3,4,5 f10,2,4,6,8,10 f5,1,2,3,4,5 f5,2,4,6,8,10 awk 'FNR==NR{a;next}; !($1 in a)' file2 file1 output f10,1,2,3,4,5 f10,2,4,6,8,10 f5,1,2,3,4,5 f5,2,4,6,8,10 awk 'FNR==NR{a;next}; ($1 in a)' file2 file1 output nothing... (4 Replies)
Discussion started by: gksenthilkumar

10. UNIX for Beginners Questions & Answers

awk to extract value after keyword in html

Using awk to extract value after a keyword in an html, and store in ts. The awk does execute but ts is empty. I use the tag as a delimiter and the keyword as a pattern, but there probably is a better way. Thank you :). file <html><head><title>xxxxxx xxxxx</title><style type="text/css"> ... (4 Replies)
Discussion started by: cmccabe
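Discussions 1 and 6 above both want the cells of HTML table rows. Instead of first deleting everything outside `<tr>` and `</tr>`, a single awk pass can do the job. This is a rough sketch, assuming lowercase, well-formed, non-nested tables with no `>` inside attribute values. It joins the input into one string, then for each `<tr>` row prints the text of its `<td>` cells separated by tabs, with nested tags such as `<a ...>` removed. Rows without `<td>` cells (for example header rows made of `<th>`) are skipped, and entities like `&amp;` are left undecoded. The function name `table_rows` is hypothetical; for messy real-world pages a proper HTML parser is the safer tool.

```shell
# Sketch only: assumes lowercase, well-formed, non-nested tables and no ">"
# inside attribute values. Function name is hypothetical.
table_rows() {
    awk '
        # Join all lines into one string so rows and cells that span
        # several lines can still be matched.
        { html = html $0 " " }
        END {
            while (match(html, /<tr[ >]/)) {
                html  = substr(html, RSTART)
                start = index(html, ">")            # end of the opening <tr ...> tag
                stop  = index(html, "</tr>")
                if (stop == 0) break                # unterminated row: stop
                row  = substr(html, start + 1, stop - start - 1)
                html = substr(html, stop + 5)       # continue after "</tr>"
                out = ""; ncell = 0
                while (match(row, /<td[ >]/)) {
                    row   = substr(row, RSTART)
                    start = index(row, ">")         # end of the opening <td ...> tag
                    stop  = index(row, "</td>")
                    if (stop == 0) break
                    cell = substr(row, start + 1, stop - start - 1)
                    row  = substr(row, stop + 5)
                    gsub(/<[^>]*>/, "", cell)       # drop nested tags such as <a href=...>
                    gsub(/^[ \t]+|[ \t]+$/, "", cell)
                    out = (ncell++ ? out "\t" : "") cell
                }
                if (ncell) print out                # skip rows with no <td> cells
            }
        }
    ' "$@"
}
```

Because the whole file is held in memory, this suits ordinary web pages rather than multi-gigabyte dumps.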
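For discussion 5, awk can open a new output file each time a line's first field is `1`. A minimal sketch, assuming every block begins with a line whose first field is exactly `1` (lines before the first such line are discarded) and using hypothetical output names `PREFIX1.txt`, `PREFIX2.txt`, and so on. The function name `split_blocks` is also hypothetical. Calling `close()` on the previous file keeps the number of open files small when there are many blocks.

```shell
# Sketch only: a block starts at each line whose first field is "1".
# Function name and output naming are hypothetical.
split_blocks() {
    # $1 = input file, $2 = prefix for the output files
    awk -v prefix="$2" '
        $1 == "1" {
            if (outfile != "") close(outfile)   # finish the previous block
            outfile = prefix (++n) ".txt"       # e.g. block1.txt, block2.txt, ...
        }
        outfile != "" { print > outfile }       # copy the line into the current block file
    ' "$1"
}
```

For example, `split_blocks data.txt block` would write `block1.txt`, `block2.txt`, and so on in the current directory.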
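The two-step awk and sed pipeline in discussion 8 can be folded into one awk command: keep the `-F '</*th>'` split, then strip the tags from `$2` with `gsub()`. Note that the quoted sed expression `s/\(<*>\)//g` matches zero or more `<` followed by `>`, so it deletes the `>` characters (plus any `<` directly before them) rather than whole tags, whereas `<[^>]*>` matches a complete tag. A sketch, assuming each `<th>` cell starts and ends on the same line and the opening tag is a bare `<th>` without attributes; the function name `th_text` is hypothetical.

```shell
# Sketch only: assumes <th>...</th> on one line, with a bare <th> opening tag.
# Function name is hypothetical.
th_text() {
    awk -F '</*th>' '
        /<\/*th>/ {
            text = $2                      # the part between <th> and </th>
            gsub(/<[^>]*>/, "", text)      # remove nested tags such as <a href=...>
            print text
        }
    ' "$@"
}
```

With the files from that discussion, `th_text auto2 > auto4` would replace both steps.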
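In the commands quoted in discussion 9, the first block `{a;next}` only mentions `a` and never stores a key (the usual idiom is `a[$1]`), and with awk's default whitespace field separator `$1` is the whole comma-separated line. A sketch of the standard two-file idiom, assuming the goal is to print the lines of file1 whose first comma-separated field appears in file2; the function name `keep_matching` is hypothetical.

```shell
# Sketch only: prints the lines of the data file whose first comma-separated
# field is listed in the key file. Function name is hypothetical.
keep_matching() {
    # $1 = key file (one key per line), $2 = data file
    awk -F ',' '
        FNR == NR { keys[$1]; next }   # first file: remember every key
        $1 in keys                     # second file: print lines whose key was seen
    ' "$1" "$2"
}
```

Replacing `$1 in keys` with `!($1 in keys)` prints the non-matching lines instead. If the key file is empty, `FNR == NR` stays true for the data file as well and nothing is printed, so the key file must have at least one line.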
libapache2-mod-perl2-2.0.7::docs::api::Apache2::URI(3pm)    User Contributed Perl Documentation    libapache2-mod-perl2-2.0.7::docs::api::Apache2::URI(3pm)

NAME
    Apache2::URI - Perl API for manipulating URIs

Synopsis
        use Apache2::URI ();

        $hostport = $r->construct_server();
        $hostport = $r->construct_server($hostname);
        $hostport = $r->construct_server($hostname, $port);
        $hostport = $r->construct_server($hostname, $port, $pool);

        $url = $r->construct_url();
        $url = $r->construct_url($rel_uri);
        $url = $r->construct_url($rel_uri, $pool);

        $parsed_uri = $r->parse_uri($uri);

        $parsed_uri = $r->parsed_uri();

        $url = join '%20', qw(one two three);
        Apache2::URI::unescape_url($url);

Description
    While "APR::URI" provides a generic API to dissect, adjust and put together any given URI string, "Apache2::URI" provides an API specific to Apache, by taking the information directly from the $r object. Therefore, when manipulating the URI of the current HTTP request, methods from both classes are usually used.

API
    "Apache2::URI" provides the following functions and methods:

  "construct_server"
    Construct a string made of hostname and port

        $hostport = $r->construct_server();
        $hostport = $r->construct_server($hostname);
        $hostport = $r->construct_server($hostname, $port);
        $hostport = $r->construct_server($hostname, $port, $pool);

    obj: $r ( "Apache2::RequestRec object" )
        The current request object
    opt arg1: $hostname ( string )
        The hostname of the server. If that argument is not passed, "$r->get_server_name" is used.
    opt arg2: $port ( string )
        The port the server is running on. If that argument is not passed, "$r->get_server_port" is used.
    opt arg3: $pool ( "APR::Pool object" )
        The pool to allocate the string from. If that argument is not passed, "$r->pool" is used.
    ret: $hostport ( string )
        The server's hostport string
    since: 2.0.00

    Examples:
    o   Assuming that:
            $r->get_server_name == "localhost";
            $r->get_server_port == 8001;
        The code:
            $hostport = $r->construct_server();
        returns a string:
            localhost:8001
    o   The following code sets the values explicitly:
            $hostport = $r->construct_server("my.example.com", 8888);
        and it returns a string:
            my.example.com:8888

  "construct_url"
    Build a fully qualified URL from the uri and information in the request rec:

        $url = $r->construct_url();
        $url = $r->construct_url($rel_uri);
        $url = $r->construct_url($rel_uri, $pool);

    obj: $r ( "Apache2::RequestRec object" )
        The current request object
    opt arg1: $rel_uri ( string )
        The path to the requested file (it may include a concatenation of path, query and fragment components). If that argument is not passed, "$r->uri" is used.
    opt arg2: $pool ( "APR::Pool object" )
        The pool to allocate the URL from. If that argument is not passed, "$r->pool" is used.
    ret: $url ( string )
        A fully qualified URL
    since: 2.0.00

    Examples:
    o   Assuming that the request was
            http://localhost.localdomain:8529/test?args
        The code:
            my $url = $r->construct_url;
        returns the string:
            http://localhost.localdomain:8529/test
        Notice that the query (args) component is not in the string. You need to append it manually if it's needed.
    o   Assuming that the request was
            http://localhost.localdomain:8529/test?args
        The code:
            my $rel_uri = "/foo/bar?tar";
            my $url = $r->construct_url($rel_uri);
        returns the string:
            http://localhost.localdomain:8529/foo/bar?tar

  "parse_uri"
    Break apart URI (affecting the current request's uri components)

        $r->parse_uri($uri);

    obj: $r ( "Apache2::RequestRec object" )
        The current request object
    arg1: $uri ( string )
        The uri to break apart
    ret: no return value
    warning: This method has several side-effects explained below
    since: 2.0.00

    This method call has the following side-effects:
    1. sets "$r->args" to the rest after '?' if such exists in the passed $uri, otherwise sets it to "undef".
    2. sets "$r->uri" to the passed $uri without the "$r->args" part.
    3. sets "$r->hostname" (if not set already) using the ("scheme://host:port") parts of the passed $uri.

  "parsed_uri"
    Get the current request's parsed uri object

        my $uri = $r->parsed_uri();

    obj: $r ( "Apache2::RequestRec object" )
        The current request object
    ret: $uri ( "APR::URI object" )
        The parsed uri
    since: 2.0.00

    This object is suitable for using with "APR::URI::rpath".

  "unescape_url"
    Unescape URLs

        Apache2::URI::unescape_url($url);

    obj: $url ( string )
        The URL to unescape
    ret: no return value
        The argument $url is now unescaped
    since: 2.0.00

    Example:
        my $url = join '%20', qw(one two three);
        Apache2::URI::unescape_url($url);
    $url now contains the string:
        "one two three";

See Also
    "APR::URI", mod_perl 2.0 documentation.

Copyright
    mod_perl 2.0 and its core modules are copyrighted under The Apache Software License, Version 2.0.

Authors
    The mod_perl development team and numerous contributors.
perl v5.14.2 2011-02-08 libapache2-mod-perl2-2.0.7::docs::api::Apache2::URI(3pm)
All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.