Convert columns to row using awk | Unix Linux Forums | Shell Programming and Scripting

  Go Back    


Shell Programming and Scripting Post questions about KSH, CSH, SH, BASH, PERL, PHP, SED, AWK and OTHER shell scripts and shell scripting languages here.

Convert columns to row using awk

Shell Programming and Scripting


Closed Thread    
 
Thread Tools Search this Thread Display Modes
    #1  
Old 01-15-2013
Jotne's Avatar
Jotne Jotne is offline
Registered User
 
Join Date: Dec 2010
Last Activity: 20 September 2014, 2:08 AM EDT
Posts: 1,040
Thanks: 62
Thanked 216 Times in 204 Posts
Convert columns to row using awk

Hi

I need to convert some columns form a html file to rows.
I do manage to make it works without help (some proud )
For some reason the offline status is not in bold, so I do need to remove the <b> tag from the other field to make this to work. All fields are not needed, so I test and select only one that I need.

Is there a more simple way to do this or can this code be cleaned some?

infile

Code:
				ohter blablabla
                        <TH TITLE="Different services during last 60s">CASC USERS</TH>
                        <TH colspan="3" class="centered">Action</TH>
				</TR>
                <TR class="online">
                        <TD class="usercol1"><SPAN TITLE="">master</SPAN></TD>
                        <TD class="usercol2"><b>online</b></TD>
                        <TD class="usercol3">148.31.202.211</TD>
                        <TD class="usercol4">40</TD>
                        <TD class="usercol5">4380</TD>
                        <TD class="usercol6">55</TD>
                        </TR>						
                <TR class="offline">
                        <TD class="usercol1"><SPAN TITLE="">madrid</SPAN></TD>
                        <TD class="usercol2">offline</TD>
                        <TD class="usercol3"></TD>
                        <TD class="usercol4">0</TD>						
                        <TD class="usercol5">120</TD>						
                        <TD class="usercol6">0</TD>
                        </TR>
                <TR class="connected">
                        <TD class="usercol1"><SPAN TITLE="">london</SPAN></TD>
                        <TD class="usercol2"><b>connected</b></TD>
                        <TD class="usercol3">10.10.10.41</TD>
                        <TD class="usercol4">34</TD>
                        <TD class="usercol5">632</TD>
                        <TD class="usercol6">430</TD>
                        </TR>
				</TABLE><BR>
				more .....

script

Code:
sed  's/<b>//g' infile | awk -F"[<>]" '{if ($0~"TR class=") {a=1}};
{if ($0~"/TR" && a==1) {a=0; print ""}};
{if ($0~"col1\"") printf "%s",$5};
{if ($0~"col2\"") printf "%s%s",",",$3};
{if ($0~"col3\"") printf "%s%s",",",$3};
{if ($0~"col5\"") printf "%s%s",",",$3}'

output

Code:
master,online,148.31.202.211,4380
madrid,offline,,120
london,connected,10.10.10.41,632

Sponsored Links
    #2  
Old 01-15-2013
radoulov's Avatar
radoulov radoulov is offline Forum Staff  
Moderator
 
Join Date: Jan 2007
Last Activity: 25 October 2014, 2:34 PM EDT
Location: Варна, България / Milano, Italia
Posts: 5,675
Thanks: 184
Thanked 620 Times in 578 Posts
Note that awk and sed are not the right tools for parsing html.
If html2text or lynx are not available, I would use Perl, Python or Ruby.


Code:
% cat infile.html
                        <TH TITLE="Different services during last 60s">CASC USERS</TH>
                        <TH colspan="3" class="centered">Action</TH>
                                </TR>
                <TR class="online">
                        <TD class="usercol1"><SPAN TITLE="">master</SPAN></TD>
                        <TD class="usercol2"><b>online</b></TD>
                        <TD class="usercol3">148.31.202.211</TD>
                        <TD class="usercol4">40</TD>
                        <TD class="usercol5">4380</TD>
                        <TD class="usercol6">55</TD>
                        </TR>
                <TR class="offline">
                        <TD class="usercol1"><SPAN TITLE="">madrid</SPAN></TD>
                        <TD class="usercol2">offline</TD>
                        <TD class="usercol3"></TD>
                        <TD class="usercol4">0</TD>
                        <TD class="usercol5">120</TD>
                        <TD class="usercol6">0</TD>
                        </TR>
                <TR class="connected">
                        <TD class="usercol1"><SPAN TITLE="">london</SPAN></TD>
                        <TD class="usercol2"><b>connected</b></TD>
                        <TD class="usercol3">10.10.10.41</TD>
                        <TD class="usercol4">34</TD>
                        <TD class="usercol5">632</TD>
                        <TD class="usercol6">430</TD>
                        </TR>
                                </TABLE><BR>
% lynx -dump infile.html
    CASC USERS Action
   master online 148.31.202.211 40 4380 55
   madrid offline 0 120 0
   london connected 10.10.10.41 34 632 430

Sponsored Links
    #3  
Old 01-15-2013
Jotne's Avatar
Jotne Jotne is offline
Registered User
 
Join Date: Dec 2010
Last Activity: 20 September 2014, 2:08 AM EDT
Posts: 1,040
Thanks: 62
Thanked 216 Times in 204 Posts
Thank you.
I know its not the best, but awk is what I know and its included in nearly all system and it works
    #4  
Old 01-15-2013
radoulov's Avatar
radoulov radoulov is offline Forum Staff  
Moderator
 
Join Date: Jan 2007
Last Activity: 25 October 2014, 2:34 PM EDT
Location: Варна, България / Milano, Italia
Posts: 5,675
Thanks: 184
Thanked 620 Times in 578 Posts
OK,
just as an exercise:


Code:
awk 'END {
  print rec
  }
  /col[1-3,5]/ { buildrec() }
  /<TR class="[^"]*"> *$/ && length(rec) {
    print rec
    rec = x
    }
func buildrec() {
  if (match($0, />[^<]*<\//))
    rec = length(rec) ? rec OFS substr($0, RSTART + 1, RLENGTH - 3) : \
      substr($0, RSTART + 1, RLENGTH - 3) 
  }' OFS=, infile

The Following 2 Users Say Thank You to radoulov For This Useful Post:
Jotne (01-15-2013), Scott (01-15-2013)
Sponsored Links
    #5  
Old 01-15-2013
Jotne's Avatar
Jotne Jotne is offline
Registered User
 
Join Date: Dec 2010
Last Activity: 20 September 2014, 2:08 AM EDT
Posts: 1,040
Thanks: 62
Thanked 216 Times in 204 Posts
Works fine, thanks
It will take me some time to understand how this works...
Sponsored Links
    #6  
Old 01-15-2013
radoulov's Avatar
radoulov radoulov is offline Forum Staff  
Moderator
 
Join Date: Jan 2007
Last Activity: 25 October 2014, 2:34 PM EDT
Location: Варна, България / Milano, Italia
Posts: 5,675
Thanks: 184
Thanked 620 Times in 578 Posts
I'll try to explain the script.


Code:
func buildrec() {
  if (match($0, />[^<]*<\//))
    rec = length(rec) ? rec OFS substr($0, RSTART + 1, RLENGTH - 3) : \
      substr($0, RSTART + 1, RLENGTH - 3)

buildrec is a user defined function that I used to avoid to repeat the same code for every match.
The function doesn't require parameters as it directly modifies
the global variable rec.
The function code performs the following actions:
- search for the pattern: an > followd by 0 or more occurrences of characters different than <, followed by the closing tag sequence </,
using the following regular expression: >[^<]*<\/
- when match is found, the value is appended to the variable rec (short for record). RSTART and RLENGTH are automatically set by the match builtin function

After that, the code is simple:


Code:
  /col[1-3,5]/ { buildrec() }
  /<TR class="[^"]*"> *$/ && length(rec) {
    print rec
    rec = x
    }

When the records match the pattern represented by the regular expression col[1-3,5] , build the record - append the values.
When the pattern <TR class="[^"]*"> *$ matches for a second time (rec is already build) - print the record and reset it: rec = x .
x is an uninitialized variable, so I'm using it as a shortcut for "".
The Following User Says Thank You to radoulov For This Useful Post:
Jotne (01-15-2013)
Sponsored Links
Closed Thread

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

More UNIX and Linux Forum Topics You Might Find Helpful
Thread Thread Starter Forum Replies Last Post
awk: convert column to row in a specific way lucasvs UNIX for Dummies Questions & Answers 1 04-04-2012 02:18 AM
By using AWK can I convert matrice shaped data to a row ? rpf Shell Programming and Scripting 14 10-05-2010 06:36 PM
Convert columns to single row vasuarjula Shell Programming and Scripting 5 06-03-2010 01:01 AM
How to convert 2 column data into multiple columns based on a keyword in a row?? ks_reddy Shell Programming and Scripting 1 02-16-2010 05:35 AM
convert matrix to row and columns tintin72 UNIX for Dummies Questions & Answers 3 06-12-2009 07:17 AM



All times are GMT -4. The time now is 07:34 PM.