Unix/Linux Go Back    


Shell Programming and Scripting Unix shell scripting - KSH, CSH, SH, BASH, PERL, PHP, SED, AWK and shell scripts and shell scripting languages here.

Convert columns to row using awk

Shell Programming and Scripting


Closed Linux or Unix Question    
 
Thread Tools Search this Thread Display Modes
    #1  
Old Unix and Linux 01-15-2013
Jotne's Unix or Linux Image
Jotne Jotne is offline
Registered User
 
Join Date: Dec 2010
Last Activity: 20 September 2014, 2:08 AM EDT
Posts: 1,040
Thanks: 62
Thanked 216 Times in 204 Posts
Convert columns to row using awk

Hi

I need to convert some columns form a html file to rows.
I do manage to make it works without help (some proud Linux )
For some reason the offline status is not in bold, so I do need to remove the <b> tag from the other field to make this to work. All fields are not needed, so I test and select only one that I need.

Is there a more simple way to do this or can this code be cleaned some?

infile

Code:
				ohter blablabla
                        <TH TITLE="Different services during last 60s">CASC USERS</TH>
                        <TH colspan="3" class="centered">Action</TH>
				</TR>
                <TR class="online">
                        <TD class="usercol1"><SPAN TITLE="">master</SPAN></TD>
                        <TD class="usercol2"><b>online</b></TD>
                        <TD class="usercol3">148.31.202.211</TD>
                        <TD class="usercol4">40</TD>
                        <TD class="usercol5">4380</TD>
                        <TD class="usercol6">55</TD>
                        </TR>						
                <TR class="offline">
                        <TD class="usercol1"><SPAN TITLE="">madrid</SPAN></TD>
                        <TD class="usercol2">offline</TD>
                        <TD class="usercol3"></TD>
                        <TD class="usercol4">0</TD>						
                        <TD class="usercol5">120</TD>						
                        <TD class="usercol6">0</TD>
                        </TR>
                <TR class="connected">
                        <TD class="usercol1"><SPAN TITLE="">london</SPAN></TD>
                        <TD class="usercol2"><b>connected</b></TD>
                        <TD class="usercol3">10.10.10.41</TD>
                        <TD class="usercol4">34</TD>
                        <TD class="usercol5">632</TD>
                        <TD class="usercol6">430</TD>
                        </TR>
				</TABLE><BR>
				more .....

script

Code:
sed  's/<b>//g' infile | awk -F"[<>]" '{if ($0~"TR class=") {a=1}};
{if ($0~"/TR" && a==1) {a=0; print ""}};
{if ($0~"col1\"") printf "%s",$5};
{if ($0~"col2\"") printf "%s%s",",",$3};
{if ($0~"col3\"") printf "%s%s",",",$3};
{if ($0~"col5\"") printf "%s%s",",",$3}'

output

Code:
master,online,148.31.202.211,4380
madrid,offline,,120
london,connected,10.10.10.41,632

Sponsored Links
    #2  
Old Unix and Linux 01-15-2013
radoulov's Unix or Linux Image
radoulov radoulov is offline Forum Staff  
Moderator
 
Join Date: Jan 2007
Last Activity: 23 April 2015, 3:05 PM EDT
Location: Варна, България / Milano, Italia
Posts: 5,690
Thanks: 184
Thanked 624 Times in 582 Posts
Note that awk and sed are not the right tools for parsing html.
If html2text or lynx are not available, I would use Perl, Python or Ruby.


Code:
% cat infile.html
                        <TH TITLE="Different services during last 60s">CASC USERS</TH>
                        <TH colspan="3" class="centered">Action</TH>
                                </TR>
                <TR class="online">
                        <TD class="usercol1"><SPAN TITLE="">master</SPAN></TD>
                        <TD class="usercol2"><b>online</b></TD>
                        <TD class="usercol3">148.31.202.211</TD>
                        <TD class="usercol4">40</TD>
                        <TD class="usercol5">4380</TD>
                        <TD class="usercol6">55</TD>
                        </TR>
                <TR class="offline">
                        <TD class="usercol1"><SPAN TITLE="">madrid</SPAN></TD>
                        <TD class="usercol2">offline</TD>
                        <TD class="usercol3"></TD>
                        <TD class="usercol4">0</TD>
                        <TD class="usercol5">120</TD>
                        <TD class="usercol6">0</TD>
                        </TR>
                <TR class="connected">
                        <TD class="usercol1"><SPAN TITLE="">london</SPAN></TD>
                        <TD class="usercol2"><b>connected</b></TD>
                        <TD class="usercol3">10.10.10.41</TD>
                        <TD class="usercol4">34</TD>
                        <TD class="usercol5">632</TD>
                        <TD class="usercol6">430</TD>
                        </TR>
                                </TABLE><BR>
% lynx -dump infile.html
    CASC USERS Action
   master online 148.31.202.211 40 4380 55
   madrid offline 0 120 0
   london connected 10.10.10.41 34 632 430

Sponsored Links
    #3  
Old Unix and Linux 01-15-2013
Jotne's Unix or Linux Image
Jotne Jotne is offline
Registered User
 
Join Date: Dec 2010
Last Activity: 20 September 2014, 2:08 AM EDT
Posts: 1,040
Thanks: 62
Thanked 216 Times in 204 Posts
Thank you.
I know its not the best, but awk is what I know and its included in nearly all system and it works Linux
    #4  
Old Unix and Linux 01-15-2013
radoulov's Unix or Linux Image
radoulov radoulov is offline Forum Staff  
Moderator
 
Join Date: Jan 2007
Last Activity: 23 April 2015, 3:05 PM EDT
Location: Варна, България / Milano, Italia
Posts: 5,690
Thanks: 184
Thanked 624 Times in 582 Posts
OK,
just as an exercise:


Code:
awk 'END {
  print rec
  }
  /col[1-3,5]/ { buildrec() }
  /<TR class="[^"]*"> *$/ && length(rec) {
    print rec
    rec = x
    }
func buildrec() {
  if (match($0, />[^<]*<\//))
    rec = length(rec) ? rec OFS substr($0, RSTART + 1, RLENGTH - 3) : \
      substr($0, RSTART + 1, RLENGTH - 3) 
  }' OFS=, infile

The Following 2 Users Say Thank You to radoulov For This Useful Post:
Jotne (01-15-2013), Scott (01-15-2013)
Sponsored Links
    #5  
Old Unix and Linux 01-15-2013
Jotne's Unix or Linux Image
Jotne Jotne is offline
Registered User
 
Join Date: Dec 2010
Last Activity: 20 September 2014, 2:08 AM EDT
Posts: 1,040
Thanks: 62
Thanked 216 Times in 204 Posts
Works fine, thanks Linux
It will take me some time to understand how this works...
Sponsored Links
    #6  
Old Unix and Linux 01-15-2013
radoulov's Unix or Linux Image
radoulov radoulov is offline Forum Staff  
Moderator
 
Join Date: Jan 2007
Last Activity: 23 April 2015, 3:05 PM EDT
Location: Варна, България / Milano, Italia
Posts: 5,690
Thanks: 184
Thanked 624 Times in 582 Posts
I'll try to explain the script.


Code:
func buildrec() {
  if (match($0, />[^<]*<\//))
    rec = length(rec) ? rec OFS substr($0, RSTART + 1, RLENGTH - 3) : \
      substr($0, RSTART + 1, RLENGTH - 3)

buildrec is a user defined function that I used to avoid to repeat the same code for every match.
The function doesn't require parameters as it directly modifies
the global variable rec.
The function code performs the following actions:
- search for the pattern: an > followd by 0 or more occurrences of characters different than <, followed by the closing tag sequence </,
using the following regular expression: >[^<]*<\/
- when match is found, the value is appended to the variable rec (short for record). RSTART and RLENGTH are automatically set by the match builtin function

After that, the code is simple:


Code:
  /col[1-3,5]/ { buildrec() }
  /<TR class="[^"]*"> *$/ && length(rec) {
    print rec
    rec = x
    }

When the records match the pattern represented by the regular expression col[1-3,5] , build the record - append the values.
When the pattern <TR class="[^"]*"> *$ matches for a second time (rec is already build) - print the record and reset it: rec = x .
x is an uninitialized variable, so I'm using it as a shortcut for "".
The Following User Says Thank You to radoulov For This Useful Post:
Jotne (01-15-2013)
Sponsored Links
Closed Linux or Unix Question

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Linux More UNIX and Linux Forum Topics You Might Find Helpful
Thread Thread Starter Forum Replies Last Post
awk: convert column to row in a specific way lucasvs UNIX for Dummies Questions & Answers 1 04-04-2012 02:18 AM
By using AWK can I convert matrice shaped data to a row ? rpf Shell Programming and Scripting 14 10-05-2010 06:36 PM
Convert columns to single row vasuarjula Shell Programming and Scripting 5 06-03-2010 01:01 AM
How to convert 2 column data into multiple columns based on a keyword in a row?? ks_reddy Shell Programming and Scripting 1 02-16-2010 05:35 AM
convert matrix to row and columns tintin72 UNIX for Dummies Questions & Answers 3 06-12-2009 07:17 AM



All times are GMT -4. The time now is 06:14 PM.