Using awk for converting xml to txt


 
# 1  
07-20-2016

Hi,

I have an XML file that I converted to a .txt file of comma-separated values using awk. However, I want each output value to be enclosed in double quotes.

My XML file (WorkOrders.xml) looks like this:

Code:
<?xml version="1.0" encoding="utf-8" ?>
<scbm-extract version="3.3">
<workOrderList>
<workOrder>
<workOrderCode>313194073</workOrderCode>
<branchCode>2021:1206</branchCode>
<demand></demand>
<demandLineItem></demandLineItem>
<priority></priority>
<completionDate>2016-07-10T08:00:00</completionDate>
</workOrder>


-----------------------------------------------------------------------------

I used the awk command below (the script was copied from earlier posts):

Code:
awk -f jobxml.awk WorkOrders.xml

jobxml.awk

Code:
BEGIN { FS="[<>]";      OFS=","         }
# Single close-tag
(NF==3) && /^[ \t]*<[/]/        {

        $0=""

        if(!TITLE)      # Print a title line
        {
                for(N=1; N<=L; N++)     $N=T[N]
                print
                TITLE=1
        }

        for(N=1; N<=L; N++)     {       $N=A[T[N]];     delete A[T[N]]  }
        print
}

$2 && $4 && ($2 == substr($4, 2)) {
        if(!T[$2]) { T[$2]=++L; T[L]=$2 }       # Save titles for later
        gsub(/^[ \t]*/, "", $3);                # Get rid of spaces in data
        gsub(/[ \t]*$/, "", $3);
        A[$2]=$3                                # Save for later
}

The output looks like this:

Code:
workOrderCode,branchCode,demand,demandLineItem,priority,completionDate
313194073,2021:1206,,,,2016-07-10T08:00:00

I want the output to look like this instead, i.e. with every value enclosed in double quotes:

Code:
workOrderCode,branchCode,demand,demandLineItem,priority,completionDate
"313194073","2021:1206","","","","2016-07-10T08:00:00"

Please help me with this.

Thanks,
Viswa

Last edited by Viswanatheee55; 07-20-2016 at 04:34 AM.. Reason: Missed double quotes in the end
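To see why jobxml.awk tests `NF==3` in one rule and `$2 == substr($4, 2)` in the other, it helps to look at how `FS="[<>]"` splits the two kinds of lines in the sample. A quick check, as a sketch (the shell variable names are only for illustration):

```shell
# Split one data line and one closing-tag line with the same FS as jobxml.awk.
data=$(printf '%s\n' '<workOrderCode>313194073</workOrderCode>' |
  awk 'BEGIN { FS = "[<>]" } { print NF, $2, $3, $4, ($2 == substr($4, 2)) }')
close_tag=$(printf '%s\n' '</workOrder>' |
  awk 'BEGIN { FS = "[<>]" } { print NF, $2 }')
printf '%s\n' "$data" "$close_tag"
# prints:
# 5 workOrderCode 313194073 /workOrderCode 1
# 3 /workOrder
```

A data line yields five fields (empty, tag, value, closing tag, empty), so `$2 == substr($4, 2)` confirms that the opening and closing tag names match. A bare closing tag such as `</workOrder>` yields three fields, which is what the first rule keys on to print a row.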
# 2  
07-20-2016
Hello Viswanatheee55,

Could you please try the following and let me know if it helps? I have only tested it with the Input_file you provided.
Code:
awk 'BEGIN { FS="[<>]"; OFS=","; s1="\"" }
# Single close-tag
(NF==3) && /^[ \t]*<[/]/ {
        $0=""
        if(!TITLE)      # Print a title line
        {
                for(N=1; N<=L; N++)     $N=T[N]
                print
                TITLE=1
        }
        for(N=1; N<=L; N++)     {       $N=s1 A[T[N]] s1;     delete A[T[N]]  }
        print
}
$2 && $4 && ($2 == substr($4, 2)) {
        if(!T[$2]) { T[$2]=++L; T[L]=$2 }       # Save titles for later
        gsub(/^[ \t]*/, "", $3);                # Get rid of spaces in data
        gsub(/[ \t]*$/, "", $3);
        A[$2]=$3                                # Save for later
}'   Input_file

Output will be as follows.
Code:
workOrderCode,branchCode,demand,demandLineItem,priority,completionDate
"313194073","2021:1206","","","","2016-07-10T08:00:00"

The above has been tested only with the Input_file you provided. If you have further requirements, please describe them in your post with complete details.

Thanks,
R. Singh
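One thing to watch: the sample in post #1 stops after the first `</workOrder>`. If the real file also closes `</workOrderList>` and `</scbm-extract>` (an assumption here), those lines match the `(NF==3) && /^[ \t]*<[/]/` rule too, and each would print an extra row of empty quoted fields. A possible refinement, sketched under the assumption that `workOrder` is the record element, keys the print rule on that closing tag only:

```shell
# Sample input; the closing </workOrderList> and </scbm-extract> tags are
# assumed here, since the posted sample stops after </workOrder>.
cat > WorkOrders.xml <<'EOF'
<?xml version="1.0" encoding="utf-8" ?>
<scbm-extract version="3.3">
<workOrderList>
<workOrder>
<workOrderCode>313194073</workOrderCode>
<branchCode>2021:1206</branchCode>
<demand></demand>
<demandLineItem></demandLineItem>
<priority></priority>
<completionDate>2016-07-10T08:00:00</completionDate>
</workOrder>
</workOrderList>
</scbm-extract>
EOF

result=$(awk 'BEGIN { FS = "[<>]"; OFS = ","; s1 = "\"" }
# Emit a row only at </workOrder> (assumed to be the record element),
# so outer closing tags do not produce a row of empty fields.
(NF == 3) && ($2 == "/workOrder") {
    $0 = ""
    if (!TITLE) {                            # header line, printed once
        for (N = 1; N <= L; N++) $N = T[N]
        print
        TITLE = 1
    }
    for (N = 1; N <= L; N++) { $N = s1 A[T[N]] s1; delete A[T[N]] }
    print
}
$2 && $4 && ($2 == substr($4, 2)) {
    if (!T[$2]) { T[$2] = ++L; T[L] = $2 }   # remember column order
    gsub(/^[ \t]*/, "", $3)                  # trim leading blanks
    gsub(/[ \t]*$/, "", $3)                  # trim trailing blanks
    A[$2] = $3                               # value for the current record
}' WorkOrders.xml)
printf '%s\n' "$result"
```

With this input it prints the same two lines shown above, without the trailing rows of `""` values.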
# 3  
07-20-2016
Try using < as the record separator and > as the field separator:
Code:
awk '
  $1=="/" idx {
    if(!Rec) Header=Header "," idx
    Values=Values "\",\"" val
  } 
  {
    idx=$1
    val=$2
  } 
  $1=="/workOrder" {
    if(!Rec++) {
      sub(/,/,x,Header)           # Remove excess field separator
      print Header
    }
    sub(/","/,x,Values)           # Remove excess field separator
    print "\"" Values "\""        # Print adding extra quotes around the Values string
    Values=""
  }
' RS=\< FS=\> file

Output:
Code:
workOrderCode,branchCode,demand,demandLineItem,priority,completionDate
"313194073","2021:1206","","","","2016-07-10T08:00:00"


Last edited by Scrutinizer; 07-20-2016 at 06:08 AM.. Reason: Modified code so it should work for multiple records
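The approach in post #3 works because `RS=\<` makes every `<` start a new record and `FS=\>` splits each record into the tag name (`$1`) and the text after it (`$2`). A small check on a single element, as a sketch (the `NF` guard only skips the empty record before the first `<`):

```shell
# Show the records and fields that RS="<" and FS=">" produce for one element.
pieces=$(printf '%s' '<workOrderCode>313194073</workOrderCode>' |
  awk 'BEGIN { RS = "<"; FS = ">" } NF { print "[" $1 "] [" $2 "]" }')
printf '%s\n' "$pieces"
# prints:
# [workOrderCode] [313194073]
# [/workOrderCode] []
```

So when a record's `$1` equals `"/" idx`, the previous record held the matching opening tag and its value, which is why the script saves `idx` and `val` from every record and appends them when the closing tag arrives.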
# 4  
07-20-2016
Hi,

When I use the above code, I get the following:

Code:
awk -f jobxml.awk WorkOrders.xml

Code:
awk: jobxml.awk:1: awk 'BEGIN { FS="[<>]";      OFS=",";s1="\""         }
awk: jobxml.awk:1:     ^ invalid char ''' in expression

Thanks,
Viswa

---------- Post updated at 03:49 AM ---------- Previous update was at 03:43 AM ----------

Sorry, I figured it out. It was a typo on my part.

Thanks,
Viswa
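The error above comes from saving the whole command line, including the leading `awk '` and the trailing `' Input_file`, into jobxml.awk. With `-f`, the file must contain only the awk program; the quotes belong to the shell. A minimal sketch that writes the post #2 program to a file and runs it (file names as in post #1; the closing-tag test is written as `<\/`, an equivalent way to match a literal slash):

```shell
# Sample input from post #1.
cat > WorkOrders.xml <<'EOF'
<?xml version="1.0" encoding="utf-8" ?>
<scbm-extract version="3.3">
<workOrderList>
<workOrder>
<workOrderCode>313194073</workOrderCode>
<branchCode>2021:1206</branchCode>
<demand></demand>
<demandLineItem></demandLineItem>
<priority></priority>
<completionDate>2016-07-10T08:00:00</completionDate>
</workOrder>
EOF

# Only the awk program goes into the file: no leading  awk '  and no
# trailing  ' Input_file  (those are shell syntax, not awk syntax).
cat > jobxml.awk <<'EOF'
BEGIN { FS = "[<>]"; OFS = ","; s1 = "\"" }
# Single close-tag: print the header once, then the quoted values
(NF == 3) && /^[ \t]*<\// {
    $0 = ""
    if (!TITLE) {
        for (N = 1; N <= L; N++) $N = T[N]
        print
        TITLE = 1
    }
    for (N = 1; N <= L; N++) { $N = s1 A[T[N]] s1; delete A[T[N]] }
    print
}
$2 && $4 && ($2 == substr($4, 2)) {
    if (!T[$2]) { T[$2] = ++L; T[L] = $2 }   # save titles for later
    gsub(/^[ \t]*/, "", $3)                  # trim leading blanks
    gsub(/[ \t]*$/, "", $3)                  # trim trailing blanks
    A[$2] = $3                               # save value for later
}
EOF

result=$(awk -f jobxml.awk WorkOrders.xml)
printf '%s\n' "$result"
```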
# 5  
07-20-2016
Hello Viswa,

The solution I posted in post #2 worked for me. Could you please also try the following, which passes the double-quote character in with -v, and let me know if it helps?
Code:
awk -vs1="\"" 'BEGIN { FS="[<>]"; OFS="," }
# Single close-tag
(NF==3) && /^[ \t]*<[/]/ {
        $0=""
        if(!TITLE)      # Print a title line
        {
                for(N=1; N<=L; N++)     $N=T[N]
                print
                TITLE=1
        }
        for(N=1; N<=L; N++)     {       $N=s1 A[T[N]] s1;     delete A[T[N]]  }
        print
}
$2 && $4 && ($2 == substr($4, 2)) {
        if(!T[$2]) { T[$2]=++L; T[L]=$2 }       # Save titles for later
        gsub(/^[ \t]*/, "", $3);                # Get rid of spaces in data
        gsub(/[ \t]*$/, "", $3);
        A[$2]=$3                                # Save for later
}'   Input_file

Output will be as follows.
Code:
workOrderCode,branchCode,demand,demandLineItem,priority,completionDate
"313194073","2021:1206","","","","2016-07-10T08:00:00"

Thanks,
R. Singh
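Post #5 sets `s1` from the command line with `-v` instead of inside BEGIN. Either way `s1` holds a single `"` character before any input is read. A tiny illustration, written with a space after `-v`, the form POSIX specifies:

```shell
# s1 is assigned before BEGIN runs, so it can be used anywhere in the program.
quoted=$(awk -v s1='"' 'BEGIN { print s1 "313194073" s1 "," s1 s1 }')
printf '%s\n' "$quoted"    # prints: "313194073",""
```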
# 6  
07-20-2016
I got it, Ravinder. Thank you so much for your help.

Thanks,
Viswa

---------- Post updated at 06:04 AM ---------- Previous update was at 03:49 AM ----------

Hi Ravinder,

One more requirement:

I don't want to list all of the XML tags, just a few of them.

I want to output only the workOrderCode, branchCode, demand and completionDate columns.

That is, the output should look like this:

Code:
workOrderCode,branchCode,demand,completionDate
"313194073","2021:1206","","2016-07-10T08:00:00"

Thanks,
Viswa
# 7  
07-20-2016
Hello Viswa,

Could you please try the following and let me know if it helps?
Code:
awk -vs1="\"" 'BEGIN { FS="[<>]"; OFS="," }
# Single close-tag
(NF==3) && /^[ \t]*<[/]/ {
        $0=""
        if(!TITLE)      # Print a title line
        {
                for(N=1; N<=L; N++)     $N=T[N]
                print
                TITLE=1
        }
        for(N=1; N<=L; N++)     {       $N=s1 A[T[N]] s1;     delete A[T[N]]  }
        print
}
$2 && $4 && ($2 == substr($4, 2)) && $2 !~ /demandLineItem/ && $2 !~ /priority/ {
        if(!T[$2]) { T[$2]=++L; T[L]=$2 }       # Save titles for later
        gsub(/^[ \t]*/, "", $3);                # Get rid of spaces in data
        gsub(/[ \t]*$/, "", $3);
        A[$2]=$3                                # Save for later
}'   Input_file

Output will be as follows.
Code:
workOrderCode,branchCode,demand,completionDate
"313194073","2021:1206","","2016-07-10T08:00:00"

Thanks,
R. Singh
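If more tags may appear in the XML later, listing the wanted columns can be easier to maintain than excluding the unwanted ones. An exact `in` lookup also avoids accidental substring matches (an unanchored pattern such as `/demand/` would also match `demandLineItem`). A sketch of that variant, not code from this thread: `cols` and `q` are hypothetical variable names, and it assumes each record ends with `</workOrder>`:

```shell
cat > WorkOrders.xml <<'EOF'
<?xml version="1.0" encoding="utf-8" ?>
<scbm-extract version="3.3">
<workOrderList>
<workOrder>
<workOrderCode>313194073</workOrderCode>
<branchCode>2021:1206</branchCode>
<demand></demand>
<demandLineItem></demandLineItem>
<priority></priority>
<completionDate>2016-07-10T08:00:00</completionDate>
</workOrder>
EOF

result=$(awk -v q='"' -v cols='workOrderCode,branchCode,demand,completionDate' '
BEGIN {
    FS = "[<>]"; OFS = ","
    L = split(cols, T, ",")               # T[1..L]: wanted tags, in output order
    for (N = 1; N <= L; N++) want[T[N]] = 1
    hdr = T[1]
    for (N = 2; N <= L; N++) hdr = hdr OFS T[N]
    print hdr                             # header line
}
($2 in want) && ($2 == substr($4, 2)) {   # <tag>value</tag> for a wanted tag
    v = $3
    gsub(/^[ \t]+|[ \t]+$/, "", v)        # trim surrounding blanks
    A[$2] = v
}
(NF == 3) && ($2 == "/workOrder") {       # end of one record
    $0 = ""
    for (N = 1; N <= L; N++) { $N = q A[T[N]] q; delete A[T[N]] }
    print
}' WorkOrders.xml)
printf '%s\n' "$result"
```

To change the columns, only the `cols` string needs editing, and the output follows the order given there rather than the order of the tags in the file.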