extract data with awk from html files

12-17-2010

Registered User

7, 0

Join Date: Dec 2010

Last Activity: 28 December 2010, 9:57 AM EST

Posts: 7

Thanks Given: 1

Thanked 0 Times in 0 Posts

Hello everyone, I'm new to this forum and new to shell scripting.

I have some HTML files in a directory, and I would like to extract from them some data that lies between two specific lines.
Here's my situation:

Code:

 <td align="default"> oxidizability (mg / l):
 data_to_extract 
 </td>

This structure is repeated in all of these files. How do I use awk to do this extraction and write the data to a .txt file?
Thank you all.

Moderator's Comments:

Use code tags when posting code, data or logs to preserve formatting and enhance readability, thanks.

Last edited by zaxxon; 12-17-2010 at 07:53 AM.. Reason: code tags

sbobotex

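For the task as first described (pull data_to_extract out of every HTML file in a directory into a .txt file), here is a minimal sketch. It assumes the files end in `.html`, that the value sits on its own line(s) between the `oxidizability (mg / l):` line and the closing `</td>`, and that one "file: value" line per block is acceptable output. The function name `extract_oxidizability` is hypothetical. `index()` does a literal substring match, so the parentheses and slash in the label need no escaping, and `FNR == 1` resets the state at the start of each file.

```shell
#!/bin/sh
# Hypothetical helper: print "file: value" for each oxidizability block
# found in the files given as arguments.
extract_oxidizability() {
  awk '
    FNR == 1 { p = 0 }                              # new input file: no block open yet
    p && /<\/ *td>/ { p = 0; next }                 # closing tag ends the block
    p && NF {                                       # non-blank line inside the block
      sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "")        # trim surrounding whitespace
      print FILENAME ": " $0
    }
    index($0, "oxidizability (mg / l):") { p = 1 }  # exact label opens a block
  ' "$@"
}
```

Run it from the directory holding the files as `extract_oxidizability *.html > file.txt`.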

12-17-2010

Registered User

7,747, 559

Join Date: Feb 2007

Last Activity: 20 April 2020, 11:28 AM EDT

Location: The Netherlands

Posts: 7,747

Thanks Given: 139

Thanked 559 Times in 520 Posts

Try this:

Code:

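awk '
p && /<\/ *td>/ {p=0}         # closing tag: stop printing
p                             # inside the block: print the line
/<td align="default">/ {p=1}  # opening line: print from the next line on
' htmlfile > file.txt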

Franklin52

12-17-2010

OK, thanks for the answer, but I need to customize the command.
I have a group of HTML files inside a directory, and each of them contains this structure:

Code:

<td align="default"> oxidizability (mg / l):
 data_to_extract 
 </td>

"data_to_extract" is the value that changes, while

Code:

<td align="default"> oxidizability (mg / l):

and

Code:

</td>

remain the same.

So, assuming I have 3 HTML files, the resulting file.txt should look something like this:

Code:

<td align="default"> oxidizability (mg / l):
 34
 </td> <td align="default"> oxidizability (mg / l):
 45 
 </td> <td align="default"> oxidizability (mg / l):
 56
 </td>

This is exactly what I need to do.

sbobotex

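To get the layout asked for here (each whole block, from the exact `oxidizability (mg / l):` line through its closing `</td>`, one file after another), a minimal sketch along the same lines follows. The function name `collect_oxidizability_blocks` is hypothetical, and it assumes the files end in `.html`. It prints each block line by line rather than joining `</td>` with the next opening line as in the sample above.

```shell
#!/bin/sh
# Hypothetical helper: print every complete oxidizability block from the
# given files, in order, including its opening and closing lines.
collect_oxidizability_blocks() {
  awk '
    FNR == 1 { p = 0 }                              # new input file: no block open yet
    index($0, "oxidizability (mg / l):") { p = 1 }  # exact label opens a block
    p { print }                                     # print every line of the block
    p && /<\/ *td>/ { p = 0 }                       # closing tag (already printed) ends it
  ' "$@"
}
```

Usage: `collect_oxidizability_blocks *.html > file.txt`. Because the label is matched with `index()`, other `<td align="default">` cells are skipped.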

12-17-2010

You could try something like:

Code:

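awk '
/<td align="default">/ {p=1; s=$0}        # opening tag: start a block, save the line
p && /<\/td>/ {print $0 FS s; s=""; p=0}  # closing tag: print it plus the saved line, end the block
p                                          # inside a block: print the line
' file >> newfile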

Franklin52

12-17-2010

Sorry, but it still doesn't work. I need to match exactly

Code:

<td align="default"> oxidizability (mg / l):

and not just

Code:

<td align="default">

sbobotex

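When matching the exact label with a regex instead of `index()`, the special characters in `oxidizability (mg / l):` have to be escaped: in an awk regex literal, `(` and `)` are grouping metacharacters, and an unescaped `/` ends the literal. A minimal sketch that prints only the matching opening lines, using a hypothetical function name `match_oxidizability_line`:

```shell
#!/bin/sh
# Hypothetical helper: print only lines containing the exact label.
# "(" and ")" are escaped so they match literally instead of grouping,
# and "/" is escaped so it does not terminate the /.../ regex literal.
match_oxidizability_line() {
  awk '/<td align="default"> oxidizability \(mg \/ l\):/' "$@"
}
```

The same escaped pattern could stand in for `/<td align="default">/` as the opening-line test in Franklin52's scripts above.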

12-17-2010

Registered User

2,977, 644

Join Date: Oct 2010

Last Activity: 14 September 2019, 1:15 PM EDT

Location: France

Posts: 2,977

Thanks Given: 88

Thanked 644 Times in 613 Posts

Please give a representative sample of input file and expected output file.

ctsgnb

12-20-2010

OK, I made some edits starting from your example, and now it works! You were very helpful, thank you very much!

sbobotex
