

Cleaning output using awk


 
# 1  
03-20-2013

I have a small problem with my code.

data.html
Code:
                        <TD class="statuscol2">c</TD>
                        <TD class="statuscol3">18</TD>
                        <TD class="statuscol4"><SPAN TITLE="#04">test4</SPAN></TD>
                        <TD class="statuscol5">OFF</TD>

Code:
awk '/col2/ {
        for (i=1; i<=4; i++)
                {
                gsub(/^[ \t]+|<[^>]*>/, "");
                printf "%s,", $0;
                getline
        }
        print ""
}' data.html

Output:
Code:
c,18,test4,OFF,

This works fine, but sometimes a line contains more than one data field, like this:


data.html
Code:
                        <TD class="statuscol2">c</TD>
                        <TD class="statuscol3">18</TD>
                        <TD class="statuscol4"><SPAN TITLE="#04">test4</SPAN></TD>
                        <TD class="statuscol5">OFF</TD><br>id8<br>

This outputs c,18,test4,OFFid8,
How do I get only the first hit on each line, so that the output is c,18,test4,OFF, instead?
# 2  
03-20-2013
For extracting all tag data:
Code:
awk -F'[<>]' ' {
        for ( i = 3; i <= NF; i += 2 )
        {
                if ( $i != "" )
                        printf "%s,", $i
        }
} END {
        printf "\n"
} ' data.html
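
To see why the loop starts at field 3 and steps by 2, it may help to print the fields awk produces for a single line when it splits on "<" and ">". This is only a small sketch: show_fields is a made-up helper name, and the sample line is assumed to be indented like the lines in data.html.

```shell
# show_fields is a hypothetical helper: it prints every field awk sees
# when the input line is split on "<" or ">".
show_fields() {
    awk -F'[<>]' '{
        for (i = 1; i <= NF; i++)
            printf "%d=[%s]\n", i, $i
    }'
}

printf '%s\n' '  <TD class="statuscol5">OFF</TD><br>id8<br>' | show_fields
```

Field 1 holds the indentation, the even-numbered fields hold the tag contents, and the text between tags (OFF and id8 here) ends up in odd-numbered fields from 3 upward. This assumes every tag is complete on its line and that no attribute value contains "<" or ">".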

For extracting just the first tag data on each line:
Code:
awk -F'[<>]' ' {
        for ( i = 3; i <= NF; i += 2 )
        {
                if ( $i != "" && f == 0)
                {
                        f = 1
                        printf "%s,", $i
                }
        }
        f = 0
} END {
        printf "\n"
} ' data.html
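
The flag f could probably also be replaced by break, which leaves the loop as soon as the first non-empty text field has been printed. A sketch under the same assumptions as above; first_text is a made-up function name, and it reads standard input instead of data.html so the sample can be fed in directly:

```shell
# first_text is a hypothetical wrapper around the awk program.
first_text() {
    awk -F'[<>]' '{
        for (i = 3; i <= NF; i += 2)
            if ($i != "") {
                printf "%s,", $i
                break            # stop after the first text on this line
            }
    }
    END { printf "\n" }'
}

first_text <<'EOF'
                        <TD class="statuscol2">c</TD>
                        <TD class="statuscol3">18</TD>
                        <TD class="statuscol4"><SPAN TITLE="#04">test4</SPAN></TD>
                        <TD class="statuscol5">OFF</TD><br>id8<br>
EOF
```

For the four sample lines this prints c,18,test4,OFF, on one line.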

# 3  
03-20-2013
It worked for the example, but not for the whole data set.
This is a repetitive task that will produce many lines of output.
I search for col2 as the trigger to start, and then I need only a fixed number of lines, for example the next 10.

Here line 5 (the statuscol5 cell) gives extra data that I do not need.


Code:
                <TR class="c">
                        <TD class="statuscol1">no</TD>
                        <TD class="statuscol2">c</TD>
                        <TD class="statuscol3">17</TD>
                        <TD class="statuscol4"><SPAN TITLE="#104">status</SPAN></TD>
                        <TD class="statuscol5"><a href="#" class="tooltip">ON<span>host<br>made<br></span></a></TD>
                        <TD class="statuscol6">ON</TD>
                        <TD class="statuscol7">3342</TD>
                        <TD class="statuscol8">37397</TD>
                        <TD class="statuscol9"><SPAN TITLE="">intra</SPAN></TD>
                        <TD class="statuscol10">20.03.13  11:01:48</TD>
                        <TD class="statuscol11">07:08:13</TD>
                        <TD class="statuscol12">073D</TD>
                        <TD class="statuscol13">Status42</TD>
                        <TD class="statuscol14">by local</TD>
                        <TD class="statuscol15"><SPAN CLASS="idlesec_normal">00:00:05</SPAN></TD>
                        <TD class="statuscol16">OK</TD>
                </TR>

Example of the output I want:
Code:
c,17,status,ON,ON,3342,37397,intra,20.03.13  11:01:48,

I get:
Code:
c,17,status,ONhostmade,ON,3342,37397,intra,20.03.13  11:01:48,
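
The extra text appears because the gsub in the first script deletes every tag on the line, so whatever text sits between the tags is joined together. The sketch below reproduces that on the statuscol5 line and puts it next to one possible way of keeping only the first piece of text. Both function names are made up, and the two sub calls are an illustration, not code from this thread.

```shell
# all_text: the gsub from the first post; removes indentation and all tags.
all_text() {
    awk '{ gsub(/^[ \t]+|<[^>]*>/, ""); print }'
}

# first_piece (hypothetical): drop the indentation plus the leading tags,
# then cut the line at the next tag.
first_piece() {
    awk '{
        sub(/^[ \t]*(<[^>]*>)*/, "")
        sub(/<.*/, "")
        print
    }'
}

line='    <TD class="statuscol5"><a href="#" class="tooltip">ON<span>host<br>made<br></span></a></TD>'
printf '%s\n' "$line" | all_text      # prints ONhostmade
printf '%s\n' "$line" | first_piece   # prints ON
```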

# 4  
03-20-2013
My 2nd suggestion should work with some minor changes:
Code:
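awk -F'[<>]' ' /col2/ {
        cf = 1                  # trigger line found: start collecting
} cf == 1 && c <= 8 {
        ++c                     # count the lines taken from this row
        for ( i = 3; i <= NF; i += 2 )
        {
                if ( $i != "" && f == 0 )
                {
                        f = 1   # this line has now printed its first text
                        printf "%s,", $i
                }
        }
        f = 0
} c == 9 {
        printf "\n"             # nine cells printed: end the output row
        cf = 0
        c  = 0
} ' data.html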
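
The numbers 8 and 9 in the script above fix the row length at nine cells. A possible variation, sketched here with a hypothetical variable n passed in via -v, counts down from the trigger line instead, so the number of cells can be changed without editing the program. The name row_cells is made up, and the function reads standard input so the sample can be fed in directly.

```shell
# row_cells (hypothetical name)
# $1 = number of cells to print, starting at the line that matches col2
row_cells() {
    awk -F'[<>]' -v n="$1" '
    left == 0 && /col2/ { left = n }  # trigger: start a new output row
    left > 0 {
        for (i = 3; i <= NF; i += 2)
            if ($i != "") {
                printf "%s,", $i
                break                 # only the first text of each line
            }
        if (--left == 0)
            printf "\n"               # row complete
    }'
}

row_cells 4 <<'EOF'
                        <TD class="statuscol2">c</TD>
                        <TD class="statuscol3">18</TD>
                        <TD class="statuscol4"><SPAN TITLE="#04">test4</SPAN></TD>
                        <TD class="statuscol5">OFF</TD><br>id8<br>
EOF
```

With the real file this would be called as row_cells 9 < data.html to get the same nine cells per row as the script above.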

# 5  
03-20-2013
Works perfectly, thanks.
