Advanced sed/awk help


 
# 1  
Old 04-01-2011

I have thousands of HTML files that look like this:
HTML Code:
....
....
....
    <!-- table horaire -->             <!-- table horaire -->
        <table border="0" cellspacing="0" cellpadding="0" class="tblHoraires" summary="Table des horaires de la ligne 12">
<tr>
<th scope="row" class="horaireColFill_even">05h</th>
<td class="horaireColFill_even">22</td>
<td class="horaireColFill_even">38</td>
<td class="horaireColFill_even">52</td>
<td class="horaireColEmpty_even">&nbsp;</td><td class="horaireColEmpty_even">&nbsp;</td><td class="horaireColEmpty_even">&nbsp;</td><td class="horaireColEmpty_even">&nbsp;</td></tr>
<tr>
<th scope="row" class="horaireColFill_odd">06h</th>

<td class="horaireColFill_odd">06</td>
<td class="horaireColFill_odd">19</td>
<td class="horaireColFill_odd">32</td>
<td class="horaireColFill_odd">44</td>
<td class="horaireColFill_odd">55</td>
<td class="horaireColEmpty_odd">&nbsp;</td><td class="horaireColEmpty_odd">&nbsp;</td></tr>
<tr>
<th scope="row" class="horaireColFill_even">07h</th>
<td class="horaireColFill_even">06</td>
<td class="horaireColFill_even">16</td>

<td class="horaireColFill_even">26</td>
<td class="horaireColFill_even">36</td>
<td class="horaireColFill_even">47</td>
<td class="horaireColFill_even">58</td>
<td class="horaireColEmpty_even">&nbsp;</td></tr>
<tr>
</table>
.....
.....
.....
I would like to extract the data from all of them so that the output looks like the following:

HTML Code:
Filename1#05h22#05h38#05h52#06h06#06h19#...etc....#00h49
Filename2#05h20#05h48#05h55#06h16#06h39#...etc....#00h19
etc
etc
Each entry is the text from a <th> (the hour) prepended to the text of one of the <td> cells in the same row (the minutes).
Would that be possible using sed/awk?
Thanks.
# 2  
Old 04-01-2011
Where do "filename1", "filename2", etc. come from? Are they located in the input files?
# 3  
Old 04-01-2011
Something along these lines:
nawk -f char.awk myFiles*
char.awk:
Code:
BEGIN {
  # field separator: either "<" or ">"
  FS = "[<>]"
}

# First line of a file? Print its FILENAME, preceded by ORS for every file after the first.
# %s rather than %c, so the empty separator for the first file prints nothing.
FNR == 1 { printf "%s%s", (NR == 1 ? "" : ORS), FILENAME }

# Second field contains "th scope="? Save the third field (the hour, e.g. "05h") in "h".
$2 ~ /th scope=/ { h = $3; next }

# Second field contains "td class=...ColFill"? Print "#", then "h", then the third field.
# An explicit format keeps any "%" in the data from being read as a format directive.
$2 ~ /td class=.*ColFill/ { printf "#%s%s", h, $3 }

END {
  # terminate the last output line
  printf "%s", ORS
}
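
To see why the script tests $2 and prints $3, it may help to look at how one line is split when FS is "[<>]". This is only a small sketch, assuming any POSIX awk; the sample line is copied from the question, and the loop is just for illustration.

```shell
# Split one line of the sample table on "<" or ">" and list every field.
line='<td class="horaireColFill_even">22</td>'
printf '%s\n' "$line" |
  awk -F'[<>]' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
```

The text before the first "<" is empty, so $1 is empty, $2 holds the tag with its attributes, and $3 holds the cell text (22 here).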


Last edited by vgersh99; 04-01-2011 at 04:29 PM..
# 4  
Old 04-01-2011
How about Perl?
parsehtml.pl
Code:
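#!/usr/bin/perl
use strict;
use warnings;

# Process each file named on the command line.
foreach my $file (@ARGV) {
    open(my $fh, '<', $file) or die "FAIL - $file: $!\n";
    print $file;
    my $th = '';
    while (my $line = <$fh>) {
        # Remember the hour from a <th> row header, e.g. "05h".
        if ($line =~ /^<th[^>]*>(.+?)<\/th>\s*$/) { $th = $1; }
        # Filled <td> cell: print "#", the hour, then the minutes.
        # \s*$ tolerates trailing whitespace such as a CR from DOS line endings.
        if ($line =~ /^<td[^>]*ColFill[^>]*>(.+?)<\/td>\s*$/) { print "#$th$1"; }
    }
    print "\n";
    close($fh);
}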

Invocation
Code:
perl parsehtml.pl myfiles_*.html

# 5  
Old 04-01-2011
Thank you for your support.

@vgersh99:
The awk script works fine, except that it does not display the filename.
Would you kindly comment the code or explain what it does in detail so I can make a few more changes to it?

@pravin27:
The Perl script does not return anything; it just prints a blank line. I believe it would probably be a simple adjustment, but I am not familiar with Perl at all, so I am not sure how to do it.
# 6  
Old 04-01-2011
Commented the code.
How do you call the script? As suggested?
What OS are you on? If on Solaris, use nawk or /usr/xpg4/bin/awk (instead of old/plain/broken awk).
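
With thousands of files, a glob such as myFiles* may run into the shell's argument-length limit. One possible way to call the script is through find, which batches the file names. The following is only a sketch: it assumes the files end in .html, and the temporary directory, the sample file ligne12.html and the compact copy of char.awk are made up for the demo.

```shell
# Hypothetical demo directory; in real use, cd to the directory holding the HTML files.
demo=$(mktemp -d)

# Compact copy of the awk script from this thread.
cat > "$demo/char.awk" <<'EOF'
BEGIN { FS = "[<>]" }
FNR == 1 { printf "%s%s", (NR == 1 ? "" : ORS), FILENAME }
$2 ~ /th scope=/ { h = $3; next }
$2 ~ /td class=.*ColFill/ { printf "#%s%s", h, $3 }
END { printf "%s", ORS }
EOF

# Small made-up input file in the same shape as the sample table.
cat > "$demo/ligne12.html" <<'EOF'
<tr>
<th scope="row" class="horaireColFill_even">05h</th>
<td class="horaireColFill_even">22</td>
<td class="horaireColFill_even">38</td>
<td class="horaireColEmpty_even">&nbsp;</td></tr>
EOF

# find passes the names in batches, so awk may run more than once on very
# large sets; each run still writes complete, newline-terminated lines.
out=$(cd "$demo" && find . -type f -name '*.html' -exec awk -f char.awk {} +)
printf '%s\n' "$out"

rm -rf "$demo"
```

For the demo file this prints ./ligne12.html#05h22#05h38. The leading "./" comes from find; awk reports FILENAME exactly as it was passed on the command line.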
# 7  
Old 04-01-2011
Thank you for the comments.
I am actually on Linux (Kubuntu). I have awk installed; I will try to get nawk and try with that. I am running the script from bash.
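
On Debian-based systems such as Kubuntu, awk is usually a symlink to an implementation such as mawk or gawk, so it may be worth checking which one actually runs before installing another. A quick check, assuming bash and GNU readlink (awk_path is just a name chosen for this example):

```shell
# Find the awk that the shell would run, then follow symlinks to the real binary.
awk_path=$(command -v awk)
readlink -f "$awk_path"
```

If the result is a modern implementation rather than an old system awk, the script above should not need nawk specifically.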