UNIX/PERL script to convert XML file to pipe delimited format


# 1  

Hello, I need to extract a few values from an XML file and write them to another file in pipe-delimited format. The header and footer of the pipe-delimited file are constant.

Below is my sample XML file. I need to pull the values between the XML tags <Operator_info> and </Operator_info>. The values are NAME, PROFILE, ENABLE STATUS (if the status is ENABLED, leave the field blank; if it is DISABLED, write its first character, D) and the LASTSIGNON date.

XML INPUT FILE:

Code:
<?xml version="1.0" encoding="UTF-8"?>
<OpList xmlns="urn:swift:saa:xsd:extractor">
<Operator_info xmlns="urn:swift:saa:xsd:extractor">
<!-- *** Extracted Data for Operators *** -->
 <ns5:OperatorDefn xmlns="urn:swift:saa:xsd:operator" xmlns:ns2="urn:swift:saa:xsd:authenticationservergroup" xmlns:ns3="urn:swift:saa:xsd:operatorprofile" xmlns:ns4="urn:swift:saa:xsd:unit" xmlns:ns5="urn:swift:saa:xsd:extractor" xmlns:ns6="urn:swift:saa:xsd:licenseddestination">
    <ns5:Operator>
        <Identifier>
            <Name>HELLO123</Name>
        </Identifier>
        <Description>Virat, Kholi</Description>
        <OperatorType>HUMAN</OperatorType>
        <AuthenticationType>LOCAL</AuthenticationType>
        <Profile>
            <ns3:Name>PROFILE1</ns3:Name>
        </Profile>
        <Unit>
            <ns4:Name>None</ns4:Name>
        </Unit>
        </ns5:Operator>
    <ns5:EnableStatus>ENABLED</ns5:EnableStatus>
    <ns5:ReEnableDate>n/a</ns5:ReEnableDate>
    <ns5:ApprovalStatus>APPROVED</ns5:ApprovalStatus>
    <ns5:LastChanged>25/07/14 20:15:35</ns5:LastChanged>
    <ns5:LastSignOn>18/10/15</ns5:LastSignOn>
    <ns5:LastEnabled>18/01/12 15:27:13</ns5:LastEnabled>
</ns5:OperatorDefn>
 <ns5:OperatorDefn xmlns="urn:swift:saa:xsd:operator" xmlns:ns2="urn:swift:saa:xsd:authenticationservergroup" xmlns:ns3="urn:swift:saa:xsd:operatorprofile" xmlns:ns4="urn:swift:saa:xsd:unit" xmlns:ns5="urn:swift:saa:xsd:extractor" xmlns:ns6="urn:swift:saa:xsd:licenseddestination">
    <ns5:Operator>
        <Identifier>
            <Name>HELLO12</Name>
        </Identifier>
        <Description>SACHIN,TEN</Description>
        <OperatorType>HUMAN</OperatorType>
        <AuthenticationType>LOCAL</AuthenticationType>
        <Profile>
            <ns3:Name>PROFILE2</ns3:Name>
        </Profile>
          <Profile>
            <ns3:Name>PROFILE3</ns3:Name>
        </Profile>
               <Unit>
            <ns4:Name>None</ns4:Name>
        </Unit>
       </ns5:Operator>
    <ns5:EnableStatus>DISABLED</ns5:EnableStatus>
    <ns5:ReEnableDate>n/a</ns5:ReEnableDate>
    <ns5:ApprovalStatus>APPROVED</ns5:ApprovalStatus>
    <ns5:LastChanged>14/02/12 17:34:35</ns5:LastChanged>
    <ns5:LastSignOn>n/a</ns5:LastSignOn>
    <ns5:LastEnabled>18/01/12 15:26:55</ns5:LastEnabled>
</ns5:OperatorDefn>
</Operator_info>

Expected Output:
Code:
20151027 GLOBAL USER GROUP  --> Header Record Constant
ACR|HELLO123|PROFILE1| |20151018|HELLO123
ACR|HELLO12|PROFILE2|D||HELLO12
ACR|HELLO12|PROFILE3|D||HELLO12
NUMBER OF DETAIL RECORDS:3  --> Footer Constant and should give the total record number

ACR is a constant value and should appear as the first field of every detail record.

Last edited by Corona688; 10-29-2015 at 12:17 PM..
# 2  
Please use code tags as required by forum rules!

Any attempts from your side?
# 3  

I'm new to development, so I have only just started on the code.
# 4  
Try
Code:
awk '
BEGIN                   {print "20151027 GLOBAL USER GROUP"
                        }
/<.?Operator_info/      {ON = (substr ($1, 2, 1) == "O")
                        }

!ON                     {next
                        }

/<ns5:.*(EnableStat|SignOn)/ ||
/<(Identif|Profile)/    {IX = toupper (substr ($1, 6, 1))
                         if (IX ~ /[IT]/) getline
                         gsub (/^<[^>]*>|<[^<]*$/, "")
                         T[IX] = $0
                         if (IX == "L")         {printf "ACR|%s|%s|%s|%s|%s\n", T["T"], T["I"], (T["E"]~/^D/)?"D":"", T["L"], T["T"]
                                                 CNT++
                                        }
                        }
END                     {print "NUMBER OF DETAIL RECORDS: ", CNT
                        }
' file
20151027 GLOBAL USER GROUP
ACR|HELLO123|PROFILE1||18/10/15|HELLO123
ACR|HELLO12|PROFILE3|D|n/a|HELLO12
NUMBER OF DETAIL RECORDS:  2

It uses the last profile found for the same identifier; handling of several profiles per identifier is not implemented.
# 5  
Thanks for your assistance, but I'm struggling to understand the code and its flow. If possible, could you please explain it to me?
# 6  
Here is RudiC's script with some slight modifications:
  • added comments,
  • added tracing to show which input lines are being processed, and what data is being captured from those lines (to make it easier for you to follow what the code is doing),
  • captured data from multiple <Profile> tags,
  • looked for tags that do not appear at the start of a line (needed since you didn't originally use CODE tags when you posted your sample input), and
  • slightly reformatted the trailer to match your expected output.
Note that neither of our scripts reformats the data found in the <ns5:LastSignOn> tags from DD/MM/YY to YYYYMMDD format, and neither changes n/a to an empty string. If that is important to you, try changing the code to do that on your own. If you can't get it to work, show us what you tried and the output it produced (in CODE tags) and we'll try to help fix it.
Code:
# Use awk to run the following script with the variable trace set to 0.
awk -v trace=0 '	
# Before reading lines from the input file, print the header.
BEGIN {	print "20151027 GLOBAL USER GROUP"
}
# Look for lines containing "<Operator_info" or "</Operator_info".
/<.?Operator_info/ {
	# If the 2nd character of the 1st field is "O", set ON to 1; otherwise
	# (i.e., if the 2nd character is "/") set ON to 0.
	ON = (substr ($1, 2, 1) == "O")
}

# If ON is 0 (or has not yet been set), skip to the next input line and ignore
# the following sections of this script for the current line.
!ON {	next
}

# Look for lines containing:
#	"<ns5:" followed by "EnableStat" or by "SignOn"
#	"<Identif"
# or	"<Profile"
/<ns5:.*(EnableStat|SignOn)/ || /<(Identif|Profile)/ {
	# Set IX to the uppercase version of the 6th character in the 1st field:
	# i.e.,	E for <ns5:"E"nableStatus
	#	I for <Prof"i"le
	#	L for <ns5:"L"astSignOn
	# or	T for <Iden"t"ifier>
	IX = toupper (substr ($1, 6, 1))
	# If trace is set to a non-0, non-empty-string value, print the current
	# line number and contents.
	if(trace) printf("line %d:%s\n", NR, $0)
	# If IX is "I" or "T" replace the current input line with the next
	# input line and continue processing.
	if (IX ~ /[IT]/) {
		getline
		# And, if trace is set to a non-0, non-empty-string value, print
		# the current line number and contents.
		if(trace) printf("Line %d:%s\n", NR, $0)
	}
	# Throw away everything from the start of the current input line up to
	# and including the 1st ">" and everything from the last "<" to the end
	# of the line.
	gsub (/^[^>]*>|<[^<]*$/, "")
	# If we are processing a <Profile> tag increment the number of <Profile>
	# tags we have seen and save the remaining data from the current line
	# in the array T with the subscript being the number of <Profile> tags
	# we have seen, otherwise, save the remaining data from the current line
	# in the array T with the subscript being the current value saved in IX.
	if(IX == "I")
		T[++pcnt] = $0
	else	T[IX] = $0
	# If trace is set to a non-0, non-empty-string value, print the array
	# element we just initialized.
	if(trace) printf("T[%s]=%s\n", (IX=="I") ? pcnt : IX, $0)
	# If we are processing an <ns5:LastSignOn> tag, print the results we
	# have accumulated for this <Identifier> tag.
	if(IX == "L") {
		# Print one line for each <Profile> tag we have seen.
		for(i = 1; i <= pcnt; i++) {
			# Note that if the data saved for the <ns5:EnableStatus>
			# flag was "DISABLED" (or, actually, started with "D"),
			# print "D" for that field; otherwise, print a <space>
			# for that field.
			printf("ACR|%s|%s|%s|%s|%s\n",
			    T["T"], T[i], (T["E"]~/^D/)?"D":" ", T["L"], T["T"])
			# Increment the number of detail records we have
			# printed.
			CNT++
		}
		# Clear the number of <Profile> tags we have seen.
		pcnt = 0
	}
}

# When we hit end-of-file on the last input file, print the trailer line.
END {	print "NUMBER OF DETAIL RECORDS:" CNT
}
# End the script to be run by awk and list the input files to be processed.
' file

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
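
A minimal sketch of how a wrapper script might choose the awk binary automatically, assuming /usr/xpg4/bin/awk is present only on Solaris/SunOS. The function name pick_awk and the variable AWK are hypothetical and are not part of the script above.

```shell
# Hypothetical helper: print the path of an awk that understands the script.
pick_awk() {
    if [ -x /usr/xpg4/bin/awk ]; then
        echo /usr/xpg4/bin/awk      # POSIX awk on Solaris/SunOS
    elif command -v nawk >/dev/null 2>&1; then
        command -v nawk             # "new awk", where it is installed
    else
        command -v awk              # default awk on most other systems
    fi
}

# Assumed usage: remember the choice once and reuse it.
AWK=$(pick_awk)
```

The script would then be started with "$AWK" -v trace=0 '...' file in place of a bare awk.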

When run as shown above (with tracing turned off) and with your sample data in a file named file, it produces the output:
Code:
20151027 GLOBAL USER GROUP
ACR|HELLO123|PROFILE1| |18/10/15|HELLO123
ACR|HELLO12|PROFILE2|D|n/a|HELLO12
ACR|HELLO12|PROFILE3|D|n/a|HELLO12
NUMBER OF DETAIL RECORDS:3

If you change the line:
Code:
awk -v trace=0 '

to:
Code:
awk -v trace=1 '

to enable tracing, it produces the output:
Code:
20151027 GLOBAL USER GROUP
line 7:        <Identifier>
Line 8:            <Name>HELLO123</Name>
T[T]=HELLO123
line 13:        <Profile>
Line 14:            <ns3:Name>PROFILE1</ns3:Name>
T[1]=PROFILE1
line 20:    <ns5:EnableStatus>ENABLED</ns5:EnableStatus>
T[E]=ENABLED
line 24:    <ns5:LastSignOn>18/10/15</ns5:LastSignOn>
T[L]=18/10/15
ACR|HELLO123|PROFILE1| |18/10/15|HELLO123
line 29:        <Identifier>
Line 30:            <Name>HELLO12</Name>
T[T]=HELLO12
line 35:        <Profile>
Line 36:            <ns3:Name>PROFILE2</ns3:Name>
T[1]=PROFILE2
line 38:          <Profile>
Line 39:            <ns3:Name>PROFILE3</ns3:Name>
T[2]=PROFILE3
line 45:    <ns5:EnableStatus>DISABLED</ns5:EnableStatus>
T[E]=DISABLED
line 49:    <ns5:LastSignOn>n/a</ns5:LastSignOn>
T[L]=n/a
ACR|HELLO12|PROFILE2|D|n/a|HELLO12
ACR|HELLO12|PROFILE3|D|n/a|HELLO12
NUMBER OF DETAIL RECORDS:3
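
For readers who get stuck on the date conversion mentioned above, here is one possible sketch, written as a separate filter that post-processes the detail records rather than as a change to the script itself. It assumes that every two-digit year belongs to 20YY and that the date is the 5th pipe-separated field; neither assumption is confirmed in this thread. The function name fix_signon is hypothetical.

```shell
# Hypothetical filter: rewrite field 5 of each "ACR|..." detail record.
# Assumption: all two-digit years mean 20YY.
fix_signon() {
    awk -F'|' -v OFS='|' '
    $1 == "ACR" {
        if ($5 == "n/a")
            $5 = ""                                   # no sign-on recorded
        else if (split($5, d, "/") == 3)
            $5 = "20" substr(d[3], 1, 2) d[2] d[1]    # DD/MM/YY -> YYYYMMDD
    }
    { print }                     # header and trailer pass through unchanged
    ' "$@"
}
```

It could be used by piping the output of the script above into it, or by giving it a saved output file as an argument. The substr() call drops anything after the two-digit year, in case a time of day follows the date.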

# 7  
Code:
#!/usr/bin/perl

use strict;
use warnings;

# get the xml to work on from command line
my $fname = $ARGV[0] or die "Usage: $0 file.xml\n";

# open xml for reading
open my $in, '<', $fname or die "Cannot open $fname: $!\n";

# to keep detail record count
my $count = 0;

# translator for disabled and enabled status
my %status = ("DISABLED" => "D", "ENABLED" => " ",);

# custom block separator
$/ = "<\/ns5:OperatorDefn>";

# display header record constant
print "20151027 GLOBAL USER GROUP\n";

# start processing chunks from the xml file
while(<$in>) {
     my %record = (); # to catalog record details

     # collect only the wanted details
     while(/<((?:ns[35]:)?(?:Name|EnableStatus|LastSignOn))>(.*)<\/\1>/g){

         # add to catalog in the current loop
         push @{$record{$1}}, $2;
     }
     # display a piped-formatted record for each profile found
     for my $profile (@{$record{"ns3:Name"}}){
         # a LastSignOn of n/a gets translated to an empty field
         $record{"ns5:LastSignOn"}->[0] = "" if $record{"ns5:LastSignOn"}->[0] eq "n/a";

         # produce the formatted record
         printf "ACR|%s|%s|%s|%s|%s\n", $record{"Name"}->[0],
                                        $profile,
                                        $status{$record{"ns5:EnableStatus"}->[0]},
                                        $record{"ns5:LastSignOn"}->[0],
                                        $record{"Name"}->[0];
        $count++;  # another record processed
     }
}
close $in; # dismiss the xml file handle

# display footer constant tally
print "NUMBER OF DETAIL RECORDS: $count\n";

Save as karthi.pl
Run as perl karthi.pl karthi.xml

Last edited by Aia; 11-01-2015 at 11:00 AM.. Reason: changes a comment