|
|||||||
| Forums | Search Forums | Register | Forum Rules | Man Pages | Albums | FAQ | Members | Calendar | Search | Today's Posts | Mark Forums Read |
| Shell Programming and Scripting Post questions about KSH, CSH, SH, BASH, PERL, PHP, SED, AWK and OTHER shell scripts and shell scripting languages here. |
|
|
|
Thread Tools | Search this Thread | Display Modes |
|
#1
|
||||
|
||||
|
Convert columns to row using awk
Hi I need to convert some columns form a html file to rows. I do manage to make it works without help (some proud )For some reason the offline status is not in bold, so I do need to remove the <b> tag from the other field to make this to work. All fields are not needed, so I test and select only one that I need. Is there a more simple way to do this or can this code be cleaned some? infile Code:
ohter blablabla
<TH TITLE="Different services during last 60s">CASC USERS</TH>
<TH colspan="3" class="centered">Action</TH>
</TR>
<TR class="online">
<TD class="usercol1"><SPAN TITLE="">master</SPAN></TD>
<TD class="usercol2"><b>online</b></TD>
<TD class="usercol3">148.31.202.211</TD>
<TD class="usercol4">40</TD>
<TD class="usercol5">4380</TD>
<TD class="usercol6">55</TD>
</TR>
<TR class="offline">
<TD class="usercol1"><SPAN TITLE="">madrid</SPAN></TD>
<TD class="usercol2">offline</TD>
<TD class="usercol3"></TD>
<TD class="usercol4">0</TD>
<TD class="usercol5">120</TD>
<TD class="usercol6">0</TD>
</TR>
<TR class="connected">
<TD class="usercol1"><SPAN TITLE="">london</SPAN></TD>
<TD class="usercol2"><b>connected</b></TD>
<TD class="usercol3">10.10.10.41</TD>
<TD class="usercol4">34</TD>
<TD class="usercol5">632</TD>
<TD class="usercol6">430</TD>
</TR>
</TABLE><BR>
more .....script Code:
sed 's/<b>//g' infile | awk -F"[<>]" '{if ($0~"TR class=") {a=1}};
{if ($0~"/TR" && a==1) {a=0; print ""}};
{if ($0~"col1\"") printf "%s",$5};
{if ($0~"col2\"") printf "%s%s",",",$3};
{if ($0~"col3\"") printf "%s%s",",",$3};
{if ($0~"col5\"") printf "%s%s",",",$3}'output Code:
master,online,148.31.202.211,4380 madrid,offline,,120 london,connected,10.10.10.41,632 |
| Sponsored Links | ||
|
|
#2
|
||||
|
||||
|
Note that awk and sed are not the right tools for parsing html. If html2text or lynx are not available, I would use Perl, Python or Ruby. Code:
% cat infile.html
<TH TITLE="Different services during last 60s">CASC USERS</TH>
<TH colspan="3" class="centered">Action</TH>
</TR>
<TR class="online">
<TD class="usercol1"><SPAN TITLE="">master</SPAN></TD>
<TD class="usercol2"><b>online</b></TD>
<TD class="usercol3">148.31.202.211</TD>
<TD class="usercol4">40</TD>
<TD class="usercol5">4380</TD>
<TD class="usercol6">55</TD>
</TR>
<TR class="offline">
<TD class="usercol1"><SPAN TITLE="">madrid</SPAN></TD>
<TD class="usercol2">offline</TD>
<TD class="usercol3"></TD>
<TD class="usercol4">0</TD>
<TD class="usercol5">120</TD>
<TD class="usercol6">0</TD>
</TR>
<TR class="connected">
<TD class="usercol1"><SPAN TITLE="">london</SPAN></TD>
<TD class="usercol2"><b>connected</b></TD>
<TD class="usercol3">10.10.10.41</TD>
<TD class="usercol4">34</TD>
<TD class="usercol5">632</TD>
<TD class="usercol6">430</TD>
</TR>
</TABLE><BR>
% lynx -dump infile.html
CASC USERS Action
master online 148.31.202.211 40 4380 55
madrid offline 0 120 0
london connected 10.10.10.41 34 632 430 |
| Sponsored Links | ||
|
|
#3
|
||||
|
||||
|
Thank you.
I know its not the best, but awk is what I know and its included in nearly all system and it works ![]() |
|
#4
|
||||
|
||||
|
OK, just as an exercise: Code:
awk 'END {
print rec
}
/col[1-3,5]/ { buildrec() }
/<TR class="[^"]*"> *$/ && length(rec) {
print rec
rec = x
}
func buildrec() {
if (match($0, />[^<]*<\//))
rec = length(rec) ? rec OFS substr($0, RSTART + 1, RLENGTH - 3) : \
substr($0, RSTART + 1, RLENGTH - 3)
}' OFS=, infile |
| Sponsored Links | |
|
|
#5
|
||||
|
||||
|
Works fine, thanks
It will take me some time to understand how this works... |
| Sponsored Links | |
|
|
#6
|
||||
|
||||
|
I'll try to explain the script. Code:
func buildrec() {
if (match($0, />[^<]*<\//))
rec = length(rec) ? rec OFS substr($0, RSTART + 1, RLENGTH - 3) : \
substr($0, RSTART + 1, RLENGTH - 3)buildrec is a user defined function that I used to avoid to repeat the same code for every match. The function doesn't require parameters as it directly modifies the global variable rec. The function code performs the following actions: - search for the pattern: an > followd by 0 or more occurrences of characters different than <, followed by the closing tag sequence </, using the following regular expression: >[^<]*<\/ - when match is found, the value is appended to the variable rec (short for record). RSTART and RLENGTH are automatically set by the match builtin function After that, the code is simple: Code:
/col[1-3,5]/ { buildrec() }
/<TR class="[^"]*"> *$/ && length(rec) {
print rec
rec = x
}When the records match the pattern represented by the regular expression col[1-3,5] , build the record - append the values. When the pattern <TR class="[^"]*"> *$ matches for a second time (rec is already build) - print the record and reset it: rec = x . x is an uninitialized variable, so I'm using it as a shortcut for "". |
| The Following User Says Thank You to radoulov For This Useful Post: | ||
Jotne (01-15-2013) | ||
| Sponsored Links | ||
|
![]() |
| Thread Tools | Search this Thread |
| Display Modes | |
More UNIX and Linux Forum Topics You Might Find Helpful
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| awk: convert column to row in a specific way | lucasvs | UNIX for Dummies Questions & Answers | 1 | 04-04-2012 02:18 AM |
| By using AWK can I convert matrice shaped data to a row ? | rpf | Shell Programming and Scripting | 14 | 10-05-2010 06:36 PM |
| Convert columns to single row | vasuarjula | Shell Programming and Scripting | 5 | 06-03-2010 01:01 AM |
| How to convert 2 column data into multiple columns based on a keyword in a row?? | ks_reddy | Shell Programming and Scripting | 1 | 02-16-2010 05:35 AM |
| convert matrix to row and columns | tintin72 | UNIX for Dummies Questions & Answers | 3 | 06-12-2009 07:17 AM |
|
|