UNIX/PERL script to convert XML file to pipe delimited format
Post 302959281 by Don Cragun on Friday 30th of October 2015 09:31:22 PM
Here is RudiC's script with some slight modifications to:
  • add comments,
  • add tracing to show which input lines are being processed and what data is being captured from those lines (to make it easier for you to follow what the code is doing),
  • capture data from multiple <Profile> tags,
  • look for tags that do not appear at the start of a line (needed since you didn't originally use CODE tags when you posted your sample input), and
  • slightly reformat the trailer to match your expected output.
Note that neither of our scripts reformats the data found in the <ns5:LastSignOn> tags from DD/MM/YY to YYYYMMDD format, and neither changes n/a to an empty string. If that is important to you, try changing the code to do that on your own. If you can't get it to work, show us what you tried and the output it produced (in CODE tags) and we'll try to help fix it.
Code:
# Use awk to run the following script with the variable trace set to 0.
awk -v trace=0 '	
# Before reading lines from the input file, print the header.
BEGIN {	print "20151027 GLOBAL USER GROUP"
}
# Look for lines containing "<Operator_info" or "</Operator_info".
/<.?Operator_info/ {
	# If the 2nd character of the 1st field is "O", set ON to 1; otherwise
	# (i.e., if the 2nd character is "/") set ON to 0.
	ON = (substr ($1, 2, 1) == "O")
}

# If ON is 0 (or has not yet been set), skip to next input line and ignore the
# following sections of this script for the current line.
!ON {	next
}

# Look for lines containing:
#	"<ns5:" followed by "EnableStat" or by "SignOn"
#	"<Identif"
# or	"<Profile"
/<ns5:.*(EnableStat|SignOn)/ || /<(Identif|Profile)/ {
	# Set IX to the uppercase version of the 6th character in the 1st field:
	# i.e.,	E for <ns5:"E"nableStatus
	#	I for <Prof"i"le
	#	L for <ns5:"L"astSignOn
	# or	T for <Iden"t"ifier>
	IX = toupper (substr ($1, 6, 1))
	# If trace is set to a non-0, non-empty-string value, print the current
	# line number and contents.
	if(trace) printf("line %d:%s\n", NR, $0)
	# If IX is "I" or "T" replace the current input line with the next
	# input line and continue processing.
	if (IX ~ /[IT]/) {
		getline
		# And, if trace is set to a non-0, non-empty-string value, print
		# the current line number and contents.
		if(trace) printf("Line %d:%s\n", NR, $0)
	}
	# Throw away everything from the start of the current input line up to
	# and including the 1st ">" and everything from the last "<" to the end
	# of the line.
	gsub (/^[^>]*>|<[^<]*$/, "")
	# If we are processing a <Profile> tag increment the number of <Profile>
	# tags we have seen and save the remaining data from the current line
	# in the array T with the subscript being the number of <Profile> tags
	# we have seen, otherwise, save the remaining data from the current line
	# in the array T with the subscript being the current value saved in IX.
	if(IX == "I")
		T[++pcnt] = $0
	else	T[IX] = $0
	# If trace is set to a non-0, non-empty-string value, print the array
	# element we just initialized.
	if(trace) printf("T[%s]=%s\n", (IX=="I") ? pcnt : IX, $0)
	# If we are processing an <ns5:LastSignOn> tag, print the results we
	# have accumulated for this <Identifier> tag.
	if(IX == "L") {
		# Print one line for each <Profile> tag we have seen.
		for(i = 1; i <= pcnt; i++) {
			# Note that if the data saved for the <ns5:EnableStatus>
			# flag was "DISABLED" (or, actually, started with "D"),
			# print "D" for that field; otherwise, print a <space>
			# for that field.
			printf("ACR|%s|%s|%s|%s|%s\n",
			    T["T"], T[i], (T["E"]~/^D/)?"D":" ", T["L"], T["T"])
			# Increment the number of detail records we have
			# printed.
			CNT++
		}
		# Clear the number of <Profile> tags we have seen.
		pcnt = 0
	}
}

# When we hit end-of-file on the last input file, print the trailer line.
END {	print "NUMBER OF DETAIL RECORDS:" CNT
}
# End the script to be run by awk and list the input files to be processed.
' file

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
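
If the same wrapper has to run on both Solaris and other systems, one way to choose the awk to use is sketched below. The /usr/xpg4/bin/awk path and the nawk name come from the note above; the variable name AWK is only an assumption made for this sketch.

```shell
# Prefer the XPG4 awk if it exists (Solaris), then nawk, then plain awk.
# AWK is a hypothetical variable name used only in this sketch.
if [ -x /usr/xpg4/bin/awk ]; then
	AWK=/usr/xpg4/bin/awk
elif command -v nawk >/dev/null 2>&1; then
	AWK=nawk
else
	AWK=awk
fi

# Quick check that the chosen awk accepts -v assignments.
"$AWK" -v trace=0 'BEGIN { print "trace is " trace }'
```

With that in place, the first line of the script would become "$AWK" -v trace=0 ' instead of awk -v trace=0 '.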

When run as shown above (with tracing turned off) and with the sample data you supplied stored in a file named file, it produces the output:
Code:
20151027 GLOBAL USER GROUP
ACR|HELLO123|PROFILE1| |18/10/15|HELLO123
ACR|HELLO12|PROFILE2|D|n/a|HELLO12
ACR|HELLO12|PROFILE3|D|n/a|HELLO12
NUMBER OF DETAIL RECORDS:3

If you change the line:
Code:
awk -v trace=0 '

to:
Code:
awk -v trace=1 '

to enable tracing, it produces the output:
Code:
20151027 GLOBAL USER GROUP
line 7:        <Identifier>
Line 8:            <Name>HELLO123</Name>
T[T]=HELLO123
line 13:        <Profile>
Line 14:            <ns3:Name>PROFILE1</ns3:Name>
T[1]=PROFILE1
line 20:    <ns5:EnableStatus>ENABLED</ns5:EnableStatus>
T[E]=ENABLED
line 24:    <ns5:LastSignOn>18/10/15</ns5:LastSignOn>
T[L]=18/10/15
ACR|HELLO123|PROFILE1| |18/10/15|HELLO123
line 29:        <Identifier>
Line 30:            <Name>HELLO12</Name>
T[T]=HELLO12
line 35:        <Profile>
Line 36:            <ns3:Name>PROFILE2</ns3:Name>
T[1]=PROFILE2
line 38:          <Profile>
Line 39:            <ns3:Name>PROFILE3</ns3:Name>
T[2]=PROFILE3
line 45:    <ns5:EnableStatus>DISABLED</ns5:EnableStatus>
T[E]=DISABLED
line 49:    <ns5:LastSignOn>n/a</ns5:LastSignOn>
T[L]=n/a
ACR|HELLO12|PROFILE2|D|n/a|HELLO12
ACR|HELLO12|PROFILE3|D|n/a|HELLO12
NUMBER OF DETAIL RECORDS:3
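
As a possible starting point for the <ns5:LastSignOn> exercise mentioned in the note before the script, here is a minimal sketch of an awk function that turns DD/MM/YY into YYYYMMDD and turns anything else (such as n/a) into an empty string. The names fmtdate and fmt_lastsignon are hypothetical and not part of RudiC's script, and the sketch assumes that every two-digit year belongs to the 2000s.

```shell
# fmt_lastsignon: hypothetical shell wrapper used only to demonstrate the
# awk function fmtdate() on one value per input line.
fmt_lastsignon() {
	awk '
	# Convert "DD/MM/YY" to "YYYYMMDD".  Any string that does not split
	# into exactly three "/"-separated parts (e.g., "n/a") becomes "".
	# ASSUMPTION: all two-digit years are 20YY.
	# p is a local array (declared as an extra parameter).
	function fmtdate(s,    p) {
		if (split(s, p, "/") != 3)
			return ""
		return sprintf("20%02d%02d%02d", p[3], p[2], p[1])
	}
	{	print fmtdate($0)
	}'
}

# Prints 20151018 followed by an empty line.
printf '18/10/15\nn/a\n' | fmt_lastsignon
```

To use it in the script, copy the fmtdate() definition into the awk program and pass fmtdate(T["L"]) to printf in place of T["L"].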

 
