How to parse a file in XML format using awk/nawk


Hi All,
I have an XML file in the following format:

The output I need is:

nawk 'BEGIN{FS="<|>"}
       {for(i=1;i<NF;i++) {
           if($i=="a") a=$(i+1)
           if($i=="b") b=$(i+1)
           if($i=="c") c=$(i+1)
           if($i=="e") e=$(i+1)
           if($i=="f") f=$(i+1)
       }}
       END {print a,b,c,e,f}' file
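Overwriting one set of variables and printing only in END can produce at most one line per file, whereas the replies below all print one line per `<d>` group, repeating the shared a, b, c values. A minimal awk sketch of that idea is shown here. It splits on `<` and `>` like the script above, but prints as soon as a group's `f` value has been read. It assumes one record per line, single-letter tags `a` to `f`, and values that never equal a tag name; the sample record and the helper name `per_group_csv` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prints one CSV line per <d> group, repeating a, b, c.
# Assumes one record per line and single-letter tag names a..f.
per_group_csv() {
    awk 'BEGIN { FS = "<|>"; OFS = "," }
    {
        # With FS = "<|>", "<e>4</e>" yields fields "", "e", "4", "/e", ...
        for (i = 1; i < NF; i++) {
            if      ($i == "a") a = $(i + 1)
            else if ($i == "b") b = $(i + 1)
            else if ($i == "c") c = $(i + 1)
            else if ($i == "e") e = $(i + 1)
            else if ($i == "f") {
                f = $(i + 1)
                print a, b, c, e, f    # a group is complete once f is read
            }
        }
    }'
}

# Usage with a made-up sample record:
printf '%s\n' '<a>1</a><b>2</b><c>3</c><d><e>4</e><f>5</f></d><d><e>6</e><f>7</f></d>' | per_group_csv
```

For that sample it prints `1,2,3,4,5` and `1,2,3,6,7`. nawk should accept the same program.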

The output I get is:

Does anyone have any idea?

Lots of threads are available on this; please use the search.
The trick is to use the right tool for the right job.

There are modules already available on CPAN for parsing and creating XML. Try them instead!
Unfortunately, if you look closely at the string, it is not valid XML as it is not well-formed. No XML parser is going to handle this string.

One way of doing it would be to use a mix of sed and awk to parse and process the line:
sed -e 's/<d>/|/g' -e 's/<\/.>/ /g' -e 's/<.>//g' -e 's/ \(.\)/,\1/g' -e 's/,|/|/g' file | \
awk -F'|' '{ printf "%s,%s\n", $1, $2; printf "%s,%s\n", $1, $3 }'
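To see what each stage of that sed/awk pipeline does, the sketch below applies the five substitutions one at a time to a hypothetical record (the thread's real sample is not shown) in which the `<d>` groups are left unclosed. That layout is an assumption. The patterns use plain `<` and `>`, which POSIX basic regular expressions treat as ordinary characters. In GNU sed, `\<` and `\>` are word-boundary anchors instead.

```shell
#!/bin/sh
# Hypothetical sample record; the real input from the thread is unknown.
rec='<a>1</a><b>2</b><c>3</c><d><e>4</e><f>5</f><d><e>6</e><f>7</f>'

s1=$(printf '%s\n' "$rec" | sed 's/<d>/|/g')       # each <d> opener becomes a | separator
s2=$(printf '%s\n' "$s1"  | sed 's/<\/.>/ /g')     # one-letter closing tags become spaces
s3=$(printf '%s\n' "$s2"  | sed 's/<.>//g')        # one-letter opening tags are dropped
s4=$(printf '%s\n' "$s3"  | sed 's/ \(.\)/,\1/g')  # a space before any character becomes a comma
s5=$(printf '%s\n' "$s4"  | sed 's/,|/|/g')        # a comma right before | is removed

# Show every intermediate stage, then the final awk split on |.
for s in "$s1" "$s2" "$s3" "$s4" "$s5"; do
    printf '%s\n' "$s"
done
printf '%s\n' "$s5" | awk -F'|' '{ printf "%s,%s\n", $1, $2; printf "%s,%s\n", $1, $3 }'
```

On this input the last stage is `1,2,3|4,5|6,7 `. The final output line therefore keeps a trailing space that came from the last `</f>`.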

This outputs

Not elegant, but it works!
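while (<>) {
    my @tmp = /[0-9]+/g;            # every run of digits on the line
    my @a1  = @tmp[0..4];           # first five numbers
    my @a2  = @tmp[0..2, 5, 6];     # first three numbers plus the sixth and seventh
    print join(",", @a1), "\n";
    print join(",", @a2), "\n";
}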
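For comparison, the Perl idea (collect every run of digits on the line, then print two slices of that list) can be sketched in awk with `match()`, `RSTART` and `RLENGTH`. Like the Perl version, this assumes exactly seven numbers per line with the shared values first, and that tag names contain no digits. The helper name `digit_slices` and the sample record are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper mirroring the Perl slices @tmp[0..4] and @tmp[0..2,5,6].
# This sketch fills n[] starting at index 1, so they become n[1..5] and n[1..3,6,7].
digit_slices() {
    awk '{
        k = 0
        line = $0
        # Collect every run of digits, left to right, like /[0-9]+/g in Perl.
        while (match(line, /[0-9]+/)) {
            n[++k] = substr(line, RSTART, RLENGTH)
            line = substr(line, RSTART + RLENGTH)
        }
        print n[1] "," n[2] "," n[3] "," n[4] "," n[5]
        print n[1] "," n[2] "," n[3] "," n[6] "," n[7]
    }'
}

# Usage with a made-up sample record:
printf '%s\n' '<a>1</a><b>2</b><c>3</c><d><e>4</e><f>5</f></d><d><e>6</e><f>7</f></d>' | digit_slices
```

This approach ignores tag names entirely, which keeps it short. It breaks as soon as a line carries a different count of numbers.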

Hi anchal_khare, matrixmadhan, fpmurphy, summer_cherry,

Thank you very much for your help!!

Originally Posted by summer_cherry

Hi summer_cherry,

Is this a Perl script?

Another way, with awk:
awk -F'<d>' '{print $1 "," $2; print $1 "," $3}' f1 | tr -d '<>a-z' | tr '/' ','
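Using `<d>` as the field separator makes `$1` the shared part of the line and `$2`, `$3`, ... one group each. A sketch that loops over however many groups a line has, and strips the tags inside awk instead of with `tr`, is shown below. It assumes single-lowercase-letter tag names and one record per line, and it works whether or not the `<d>` groups are closed. The helper name `split_on_d` and the sample record are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: one CSV line per <d> group, for any number of groups.
# Assumes tags are single lowercase letters, as in the examples in this thread.
split_on_d() {
    awk -F'<d>' '
    # Turn "<x>v</x><y>w</y>" into "v,w" by dropping tags and keeping values.
    function vals(s) {
        gsub(/<\/d>/, "", s)          # a closing </d>, if present, carries no value
        gsub(/<\/[a-z]>/, ",", s)     # each other closing tag ends a value
        gsub(/<[a-z]>/, "", s)        # opening tags are dropped
        sub(/,$/, "", s)              # no trailing comma
        return s
    }
    {
        head = vals($1)
        for (i = 2; i <= NF; i++)
            print head "," vals($i)
    }'
}

# Usage with a made-up sample record holding three groups:
printf '%s\n' '<a>1</a><b>2</b><c>3</c><d><e>4</e><f>5</f></d><d><e>6</e><f>7</f></d><d><e>8</e><f>9</f></d>' | split_on_d
```

Looping from 2 to NF, rather than hard-coding `$2` and `$3`, means extra `<d>` groups on a line are not silently dropped.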

