Convert columns to row using awk


 
# 1  
Old 01-15-2013

Hi

I need to convert some columns from an HTML file to rows.
I managed to make it work without help (and I'm a bit proud of that).
For some reason the offline status is not in bold, so I need to remove the <b> tag from the other rows to make this work. Not all fields are needed, so I test for and select only the ones I need.

Is there a simpler way to do this, or can this code be cleaned up?

infile
Code:
				ohter blablabla
                        <TH TITLE="Different services during last 60s">CASC USERS</TH>
                        <TH colspan="3" class="centered">Action</TH>
				</TR>
                <TR class="online">
                        <TD class="usercol1"><SPAN TITLE="">master</SPAN></TD>
                        <TD class="usercol2"><b>online</b></TD>
                        <TD class="usercol3">148.31.202.211</TD>
                        <TD class="usercol4">40</TD>
                        <TD class="usercol5">4380</TD>
                        <TD class="usercol6">55</TD>
                        </TR>						
                <TR class="offline">
                        <TD class="usercol1"><SPAN TITLE="">madrid</SPAN></TD>
                        <TD class="usercol2">offline</TD>
                        <TD class="usercol3"></TD>
                        <TD class="usercol4">0</TD>						
                        <TD class="usercol5">120</TD>						
                        <TD class="usercol6">0</TD>
                        </TR>
                <TR class="connected">
                        <TD class="usercol1"><SPAN TITLE="">london</SPAN></TD>
                        <TD class="usercol2"><b>connected</b></TD>
                        <TD class="usercol3">10.10.10.41</TD>
                        <TD class="usercol4">34</TD>
                        <TD class="usercol5">632</TD>
                        <TD class="usercol6">430</TD>
                        </TR>
				</TABLE><BR>
				more .....

script
Code:
sed  's/<b>//g' infile | awk -F"[<>]" '{if ($0~"TR class=") {a=1}};
{if ($0~"/TR" && a==1) {a=0; print ""}};
{if ($0~"col1\"") printf "%s",$5};
{if ($0~"col2\"") printf "%s%s",",",$3};
{if ($0~"col3\"") printf "%s%s",",",$3};
{if ($0~"col5\"") printf "%s%s",",",$3}'

output
Code:
master,online,148.31.202.211,4380
madrid,offline,,120
london,connected,10.10.10.41,632
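
One possible way to tidy the pipeline above, offered only as a sketch: a single awk call can strip every tag with gsub, which makes the separate sed step for <b> unnecessary. The function name html_rows_to_csv and the variables inrow, n and rec are made up for this illustration, and the sample input is a shortened copy of the infile.

```shell
# Hypothetical helper (not from the thread): read the HTML on stdin and
# print one comma-separated line per <TR class=...> row.
html_rows_to_csv() {
  awk '
    /<TR class=/      { inrow = 1; n = 0; rec = ""; next }   # row starts
    inrow && /<\/TR>/ { print rec; inrow = 0; next }         # row ends
    inrow && /usercol[1235]"/ {
      gsub(/<[^>]*>/, "")            # drop every tag, <b> included
      gsub(/^[ \t]+|[ \t]+$/, "")    # trim leading and trailing blanks
      rec = (n++ ? rec "," : "") $0  # comma before all but the first
    }
  '
}

html_rows_to_csv <<'EOF'
        </TR>
<TR class="online">
        <TD class="usercol1"><SPAN TITLE="">master</SPAN></TD>
        <TD class="usercol2"><b>online</b></TD>
        <TD class="usercol3">148.31.202.211</TD>
        <TD class="usercol4">40</TD>
        <TD class="usercol5">4380</TD>
        <TD class="usercol6">55</TD>
        </TR>
<TR class="offline">
        <TD class="usercol1"><SPAN TITLE="">madrid</SPAN></TD>
        <TD class="usercol2">offline</TD>
        <TD class="usercol3"></TD>
        <TD class="usercol4">0</TD>
        <TD class="usercol5">120</TD>
        <TD class="usercol6">0</TD>
        </TR>
EOF
```

Counting the fields with n, rather than testing whether rec is empty, keeps the commas aligned even if the first wanted cell of a row were empty.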

# 2  
Old 01-15-2013
Note that awk and sed are not the right tools for parsing html.
If html2text or lynx are not available, I would use Perl, Python or Ruby.

Code:
% cat infile.html
                        <TH TITLE="Different services during last 60s">CASC USERS</TH>
                        <TH colspan="3" class="centered">Action</TH>
                                </TR>
                <TR class="online">
                        <TD class="usercol1"><SPAN TITLE="">master</SPAN></TD>
                        <TD class="usercol2"><b>online</b></TD>
                        <TD class="usercol3">148.31.202.211</TD>
                        <TD class="usercol4">40</TD>
                        <TD class="usercol5">4380</TD>
                        <TD class="usercol6">55</TD>
                        </TR>
                <TR class="offline">
                        <TD class="usercol1"><SPAN TITLE="">madrid</SPAN></TD>
                        <TD class="usercol2">offline</TD>
                        <TD class="usercol3"></TD>
                        <TD class="usercol4">0</TD>
                        <TD class="usercol5">120</TD>
                        <TD class="usercol6">0</TD>
                        </TR>
                <TR class="connected">
                        <TD class="usercol1"><SPAN TITLE="">london</SPAN></TD>
                        <TD class="usercol2"><b>connected</b></TD>
                        <TD class="usercol3">10.10.10.41</TD>
                        <TD class="usercol4">34</TD>
                        <TD class="usercol5">632</TD>
                        <TD class="usercol6">430</TD>
                        </TR>
                                </TABLE><BR>
% lynx -dump infile.html
    CASC USERS Action
   master online 148.31.202.211 40 4380 55
   madrid offline 0 120 0
   london connected 10.10.10.41 34 632 430

# 3  
Old 01-15-2013
Thank you.
I know it's not the best tool, but awk is what I know, it's included on nearly all systems, and it works.
# 4  
Old 01-15-2013
OK,
just as an exercise:

Code:
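awk 'END {
  # print the last record, which has no following <TR> line to flush it
  print rec
  }
  # lines for columns 1, 2, 3 and 5: append the cell value to rec
  /col[1-3,5]/ { buildrec() }
  # a new row starts and a record is pending: print it, then reset
  /<TR class="[^"]*"> *$/ && length(rec) {
    print rec
    rec = x
    }
function buildrec() {
  # the match is ">value</", so the value starts one character in
  # and is three characters shorter than the match
  if (match($0, />[^<]*<\//))
    rec = length(rec) ? rec OFS substr($0, RSTART + 1, RLENGTH - 3) : \
      substr($0, RSTART + 1, RLENGTH - 3)
  }' OFS=, infile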

These 2 Users Gave Thanks to radoulov For This Post:
# 5  
Old 01-15-2013
Works fine, thanks.
It will take me some time to understand how this works...
# 6  
Old 01-15-2013
I'll try to explain the script.

Code:
function buildrec() {
  if (match($0, />[^<]*<\//))
    rec = length(rec) ? rec OFS substr($0, RSTART + 1, RLENGTH - 3) : \
      substr($0, RSTART + 1, RLENGTH - 3)
  }

buildrec is a user-defined function that I used to avoid repeating the same code for every match.
The function doesn't require parameters because it directly modifies
the global variable rec.
The function performs the following actions:
- search for the pattern: a > followed by zero or more characters other than <, followed by the closing tag sequence </,
using the following regular expression: >[^<]*<\/
- when a match is found, append the value to the variable rec (short for record). RSTART and RLENGTH are set automatically by the match built-in function: RSTART is the position where the match starts and RLENGTH is its length, so substr($0, RSTART + 1, RLENGTH - 3) skips the leading > and drops the trailing </
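
To see the two steps in isolation, here is a minimal sketch that applies the same match and substr calls to a single line. The wrapper name cell_value is hypothetical and exists only for this illustration; the two sample lines are taken from the infile.

```shell
# Hypothetical helper: print the text between the first ">" that is
# followed only by non-"<" characters and the "</" that comes after it.
cell_value() {
  printf '%s\n' "$1" | awk '{
    if (match($0, />[^<]*<\//)) {
      # The match is ">" + value + "</": start one character after
      # RSTART and take RLENGTH minus the three delimiter characters.
      print substr($0, RSTART + 1, RLENGTH - 3)
    }
  }'
}

cell_value '<TD class="usercol3">148.31.202.211</TD>'   # prints 148.31.202.211
cell_value '<TD class="usercol2"><b>online</b></TD>'    # prints online
```

In the second call the first > is followed directly by <b, so the regular expression cannot match there, and the match starts at the > that closes the <b> tag instead.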

After that, the code is simple:

Code:
  /col[1-3,5]/ { buildrec() }
  /<TR class="[^"]*"> *$/ && length(rec) {
    print rec
    rec = x
    }

When a record matches the regular expression col[1-3,5], build the record, i.e. append the value. (Inside a bracket expression the comma is a literal character rather than a separator, so col[1-35] would be a stricter way to write it.)
When the pattern <TR class="[^"]*"> *$ matches and rec has already been built (that is, from the second matching row onwards), print the record and reset it: rec = x.
x is an uninitialized variable, so I'm using it as a shortcut for "".
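
The whole flow can be tried on a shortened copy of the infile with the sketch below. It is a variant under stated assumptions rather than the script as posted: it spells out function, writes the column test as col[1-35]" and resets with rec = "" instead of the uninitialized x, and the names sample and rows_to_csv are made up for the example.

```shell
# Sample rows modelled on the infile from the first post (shortened).
sample='<TR class="online">
<TD class="usercol1"><SPAN TITLE="">master</SPAN></TD>
<TD class="usercol2"><b>online</b></TD>
<TD class="usercol3">148.31.202.211</TD>
<TD class="usercol4">40</TD>
<TD class="usercol5">4380</TD>
</TR>
<TR class="offline">
<TD class="usercol1"><SPAN TITLE="">madrid</SPAN></TD>
<TD class="usercol2">offline</TD>
<TD class="usercol3"></TD>
<TD class="usercol4">0</TD>
<TD class="usercol5">120</TD>
</TR>'

# Hypothetical wrapper: read the HTML on stdin, print one CSV line per row.
rows_to_csv() {
  awk -v OFS=, '
    function buildrec() {
      if (match($0, />[^<]*<\//))
        rec = (length(rec) ? rec OFS : "") substr($0, RSTART + 1, RLENGTH - 3)
    }
    /col[1-35]"/ { buildrec() }                # wanted columns only
    /<TR class="[^"]*"> *$/ && length(rec) {   # next row starts: flush
      print rec
      rec = ""
    }
    END { if (length(rec)) print rec }         # flush the last row
  '
}

printf '%s\n' "$sample" | rows_to_csv
# master,online,148.31.202.211,4380
# madrid,offline,,120
```

The first <TR class=...> line prints nothing because rec is still empty at that point; each later one prints the row collected before it, and the END rule takes care of the final row.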
