UNIX/PERL script to convert XML file to pipe delimited format

10-29-2015

Hello, I need to extract a few values from an XML file and write them to another file in pipe-delimited format. The header and footer of the pipe-delimited file are constant.

Below is my sample XML file. I need to pull the values between the <Operator_info> and </Operator_info> tags. The values are NAME, PROFILE, ENABLE STATUS (leave it blank if the status is ENABLED; write the first character, D, if it is DISABLED), and the LASTSIGNON date.

XML INPUT FILE:

Code:

<?xml version="1.0" encoding="UTF-8"?>
<OpList xmlns="urn:swift:saa:xsd:extractor">
<Operator_info xmlns="urn:swift:saa:xsd:extractor">
<!-- *** Extracted Data for Operators *** -->
 <ns5:OperatorDefn xmlns="urn:swift:saa:xsd:operator" xmlns:ns2="urn:swift:saa:xsd:authenticationservergroup" xmlns:ns3="urn:swift:saa:xsd:operatorprofile" xmlns:ns4="urn:swift:saa:xsd:unit" xmlns:ns5="urn:swift:saa:xsd:extractor" xmlns:ns6="urn:swift:saa:xsd:licenseddestination">
    <ns5:Operator>
        <Identifier>
            <Name>HELLO123</Name>
        </Identifier>
        <Description>Virat, Kholi</Description>
        <OperatorType>HUMAN</OperatorType>
        <AuthenticationType>LOCAL</AuthenticationType>
        <Profile>
            <ns3:Name>PROFILE1</ns3:Name>
        </Profile>
        <Unit>
            <ns4:Name>None</ns4:Name>
        </Unit>
        </ns5:Operator>
    <ns5:EnableStatus>ENABLED</ns5:EnableStatus>
    <ns5:ReEnableDate>n/a</ns5:ReEnableDate>
    <ns5:ApprovalStatus>APPROVED</ns5:ApprovalStatus>
    <ns5:LastChanged>25/07/14 20:15:35</ns5:LastChanged>
    <ns5:LastSignOn>18/10/15</ns5:LastSignOn>
    <ns5:LastEnabled>18/01/12 15:27:13</ns5:LastEnabled>
</ns5:OperatorDefn>
 <ns5:OperatorDefn xmlns="urn:swift:saa:xsd:operator" xmlns:ns2="urn:swift:saa:xsd:authenticationservergroup" xmlns:ns3="urn:swift:saa:xsd:operatorprofile" xmlns:ns4="urn:swift:saa:xsd:unit" xmlns:ns5="urn:swift:saa:xsd:extractor" xmlns:ns6="urn:swift:saa:xsd:licenseddestination">
    <ns5:Operator>
        <Identifier>
            <Name>HELLO12</Name>
        </Identifier>
        <Description>SACHIN,TEN</Description>
        <OperatorType>HUMAN</OperatorType>
        <AuthenticationType>LOCAL</AuthenticationType>
        <Profile>
            <ns3:Name>PROFILE2</ns3:Name>
        </Profile>
          <Profile>
            <ns3:Name>PROFILE3</ns3:Name>
        </Profile>
               <Unit>
            <ns4:Name>None</ns4:Name>
        </Unit>
       </ns5:Operator>
    <ns5:EnableStatus>DISABLED</ns5:EnableStatus>
    <ns5:ReEnableDate>n/a</ns5:ReEnableDate>
    <ns5:ApprovalStatus>APPROVED</ns5:ApprovalStatus>
    <ns5:LastChanged>14/02/12 17:34:35</ns5:LastChanged>
    <ns5:LastSignOn>n/a</ns5:LastSignOn>
    <ns5:LastEnabled>18/01/12 15:26:55</ns5:LastEnabled>
</ns5:OperatorDefn>
</Operator_info>

Expected Output:

Code:

20151027 GLOBAL USER GROUP  --> Header Record Constant
ACR|HELLO123|PROFILE1| |20151018|HELLO123
ACR|HELLO12|PROFILE2|D||HELLO12
ACR|HELLO12|PROFILE3|D||HELLO12
NUMBER OF DETAIL RECORDS:3  --> Footer Constant and should give the total record number

ACR is a constant value and should be the first field of every detail record.

Last edited by Corona688; 10-29-2015 at 12:17 PM..

karthi1305561

10-29-2015

Please use code tags as required by forum rules!

Any attempts from your side?

RudiC

10-29-2015

I'm new to development, so I have only just started on the code.

karthi1305561

10-29-2015

Try

Code:

awk '
BEGIN                   {print "20151027 GLOBAL USER GROUP"
                        }
/<.?Operator_info/      {ON = (substr ($1, 2, 1) == "O")
                        }

!ON                     {next
                        }

/<ns5:.*(EnableStat|SignOn)/ ||
/<(Identif|Profile)/    {IX = toupper (substr ($1, 6, 1))
                         if (IX ~ /[IT]/) getline
                         gsub (/^<[^>]*>|<[^<]*$/, "")
                         T[IX] = $0
                         if (IX == "L")         {printf "ACR|%s|%s|%s|%s|%s\n", T["T"], T["I"], (T["E"]~/^D/)?"D":"", T["L"], T["T"]
                                                 CNT++
                                        }
                        }
END                     {print "NUMBER OF DETAIL RECORDS: ", CNT
                        }
' file
20151027 GLOBAL USER GROUP
ACR|HELLO123|PROFILE1||18/10/15|HELLO123
ACR|HELLO12|PROFILE3|D|n/a|HELLO12
NUMBER OF DETAIL RECORDS:  2

It uses the last profile found for the same identifier; handling of several profiles per identifier is not implemented.

RudiC

10-30-2015

Thanks for your assistance, but I'm struggling to understand the code and its flow. If possible, could you please explain it to me?

karthi1305561

10-30-2015

Here is RudiC's script with some slight modifications. It:

adds comments,
adds tracing to show which input lines are being processed, and what data is being captured from those lines (to make it easier for you to follow what the code is doing),
captures data from multiple <Profile> tags,
looks for tags that do not appear at the start of a line (needed since you didn't originally use CODE tags when you posted your sample input), and
slightly reformats the trailer to match your expected output.

Note that neither of our scripts reformats the data found in the <ns5:LastSignOn> tags from DD/MM/YY to YYYYMMDD format, and neither changes n/a to an empty string. If that is important to you, try changing the code to do that on your own. If you can't get it to work, show us what you tried and the output it produced (in CODE tags) and we'll try to help fix it.

Code:

# Use awk to run the following script with the variable trace set to 0.
awk -v trace=0 '	
# Before reading lines from the input file, print the header.
BEGIN {	print "20151027 GLOBAL USER GROUP"
}
# Look for lines containing "<Operator_info" or "</Operator_info".
/<.?Operator_info/ {
	# If the 2nd character of the 1st field is "O", set ON to 1; otherwise
	# (i.e., if the 2nd character is "/") set ON to 0.
	ON = (substr ($1, 2, 1) == "O")
}

# If ON is 0 (or has not yet been set), skip to next input line and ignore the
# following sections of this script for the current line.
!ON {	next
}

# Look for lines containing:
#	"<ns5:" followed by "EnableStat" or by "SignOn"
#	"<Identif"
# or	"<Profile"
/<ns5:.*(EnableStat|SignOn)/ || /<(Identif|Profile)/ {
	# Set IX to the uppercase version of the 6th character in the 1st field:
	# i.e.,	E for <ns5:"E"nableStatus
	#	I for <Prof"i"le
	#	L for <ns5:"L"astSignOn
	# or	T for <Iden"t"ifier>
	IX = toupper (substr ($1, 6, 1))
	# If trace is set to a non-0, non-empty-string value, print the current
	# line number and contents.
	if(trace) printf("line %d:%s\n", NR, $0)
	# If IX is "I" or "T" replace the current input line with the next
	# input line and continue processing.
	if (IX ~ /[IT]/) {
		getline
		# And, if trace is set to a non-0, non-empty-string value, print
		# the current line number and contents.
		if(trace) printf("Line %d:%s\n", NR, $0)
	}
	# Throw away everything from the start of the current input line up to
	# and including the 1st ">" and everything from the next "<" to the end
	# of the line.
	gsub (/^[^>]*>|<[^<]*$/, "")
	# If we are processing a <Profile> tag increment the number of <Profile>
	# tags we have seen and save the remaining data from the current line
	# in the array T with the subscript being the number of <Profile> tags
	# we have seen, otherwise, save the remaining data from the current line
	# in the array T with the subscript being the current value saved in IX.
	if(IX == "I")
		T[++pcnt] = $0
	else	T[IX] = $0
	# If trace is set to a non-0, non-empty-string value, print the array
	# element we just initialized.
	if(trace) printf("T[%s]=%s\n", (IX=="I") ? pcnt : IX, $0)
	# If we are processing an <ns5:LastSignOn> tag, print the results we
	# have accumulated for this <Identifier> tag.
	if(IX == "L") {
		# Print one line for each <Profile> tag we have seen.
		for(i = 1; i <= pcnt; i++) {
			# Note that if the data saved for the <ns5:EnableStatus>
			# flag was "DISABLED" (or, actually, started with "D"),
			# print "D" for that field; otherwise, print a <space>
			# for that field.
			printf("ACR|%s|%s|%s|%s|%s\n",
			    T["T"], T[i], (T["E"]~/^D/)?"D":" ", T["L"], T["T"])
			# Increment the number of detail records we have
			# printed.
			CNT++
		}
		# Clear the number of <Profile> tags we have seen.
		pcnt = 0
	}
}

# When we hit end-of-file on the last input file, print the trailer line.
END {	print "NUMBER OF DETAIL RECORDS:" CNT
}
# End the script to be run by awk and list the input files to be processed.
' file

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
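
If the same wrapper has to run on both Solaris and other systems, the choice of awk can be made at run time. This is only a minimal sketch, assuming a POSIX shell; the variable name AWK is made up for the illustration and does not appear in the script above.

```shell
# Pick a POSIX-style awk: the XPG4 one on Solaris if present, then nawk,
# otherwise whatever "awk" is on the PATH.
# AWK is a hypothetical variable name chosen for this sketch.
if [ -x /usr/xpg4/bin/awk ]; then
    AWK=/usr/xpg4/bin/awk
elif command -v nawk >/dev/null 2>&1; then
    AWK=nawk
else
    AWK=awk
fi

# Quick check that the selected awk accepts -v assignments.
"$AWK" -v trace=0 'BEGIN { print "selected awk accepts -v, trace=" trace }'
```

The script above could then be started with "$AWK" -v trace=0 '...' file instead of a hard-coded awk.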

When run as shown above (with tracing turned off) on the sample data you supplied, stored in a file named file, it produces the output:

Code:

20151027 GLOBAL USER GROUP
ACR|HELLO123|PROFILE1| |18/10/15|HELLO123
ACR|HELLO12|PROFILE2|D|n/a|HELLO12
ACR|HELLO12|PROFILE3|D|n/a|HELLO12
NUMBER OF DETAIL RECORDS:3

If you change the line:

Code:

awk -v trace=0 '

to:

Code:

awk -v trace=1 '

to enable tracing, it produces the output:

Code:

20151027 GLOBAL USER GROUP
line 7:        <Identifier>
Line 8:            <Name>HELLO123</Name>
T[T]=HELLO123
line 13:        <Profile>
Line 14:            <ns3:Name>PROFILE1</ns3:Name>
T[1]=PROFILE1
line 20:    <ns5:EnableStatus>ENABLED</ns5:EnableStatus>
T[E]=ENABLED
line 24:    <ns5:LastSignOn>18/10/15</ns5:LastSignOn>
T[L]=18/10/15
ACR|HELLO123|PROFILE1| |18/10/15|HELLO123
line 29:        <Identifier>
Line 30:            <Name>HELLO12</Name>
T[T]=HELLO12
line 35:        <Profile>
Line 36:            <ns3:Name>PROFILE2</ns3:Name>
T[1]=PROFILE2
line 38:          <Profile>
Line 39:            <ns3:Name>PROFILE3</ns3:Name>
T[2]=PROFILE3
line 45:    <ns5:EnableStatus>DISABLED</ns5:EnableStatus>
T[E]=DISABLED
line 49:    <ns5:LastSignOn>n/a</ns5:LastSignOn>
T[L]=n/a
ACR|HELLO12|PROFILE2|D|n/a|HELLO12
ACR|HELLO12|PROFILE3|D|n/a|HELLO12
NUMBER OF DETAIL RECORDS:3
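
For anyone who wants to compare their own attempt at the <ns5:LastSignOn> conversion mentioned above, here is one possible sketch. It assumes that every date is DD/MM/YY and that a two-digit year always means 20YY; the names fmt_signon and signon are made up for this illustration and are not part of either script.

```shell
# Hypothetical wrapper so the awk function can be tried on its own.
fmt_signon() {
    awk -v d="$1" '
    # Turn DD/MM/YY into YYYYMMDD; anything else (such as "n/a") becomes "".
    # p is a local array used by split().
    function signon(s,    p) {
        if (s ~ /^[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]$/ && split(s, p, "/") == 3)
            return "20" p[3] p[2] p[1]   # assumption: century is always 20
        return ""
    }
    BEGIN { print signon(d) }
    '
}

fmt_signon 18/10/15   # prints 20151018
fmt_signon n/a        # prints an empty line
```

In the script above, a function like signon could be defined ahead of the BEGIN block and the T["L"] argument of the printf call replaced by signon(T["L"]). User-defined functions need a POSIX awk, so the Solaris note applies here as well.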

Don Cragun

11-01-2015

Code:

#!/usr/bin/perl

use strict;
use warnings;

# get the xml to work on from command line
my $fname = $ARGV[0] or die "Usage: $0 file.xml\n";

# open xml for reading
open my $in, '<', $fname or die $!;

# to keep detail record count
my $count = 0;

# translator for disabled and enabled status
my %status = ("DISABLED" => "D", "ENABLED" => " ",);

# custom block separator
$/ = "<\/ns5:OperatorDefn>";

# display header record constant
print "20151027 GLOBAL USER GROUP\n";

# start processing chunks from the xml file
while(<$in>) {
     my %record = (); # to catalog record details

     # collect only the wanted details
     while(/<((?:ns[35]:)?(?:Name|EnableStatus|LastSignOn))>(.*)<\/\1>/g){

         # add to catalog in the current loop
         push @{$record{$1}}, $2;
     }
     # display a piped-formatted record for each profile found
     for my $profile (@{$record{"ns3:Name"}}){
         # a LastSignOn value of n/a is displayed as an empty field
         $record{"ns5:LastSignOn"}->[0] = "" if $record{"ns5:LastSignOn"}->[0] eq "n/a";

         # produce the formatted record
         printf "ACR|%s|%s|%s|%s|%s\n", $record{"Name"}->[0],
                                        $profile,
                                        $status{$record{"ns5:EnableStatus"}->[0]},
                                        $record{"ns5:LastSignOn"}->[0],
                                        $record{"Name"}->[0];
        $count++;  # another record processed
     }
}
close $in; # dismiss the xml file handle

# display footer constant tally
print "NUMBER OF DETAIL RECORDS:$count\n";

Save as karthi.pl
Run as perl karthi.pl karthi.xml

Last edited by Aia; 11-01-2015 at 11:00 AM.. Reason: changes a comment

Aia
