How to extract XML attribute values using awk inline?


 
# 8  
Old 06-05-2016
On OS X El Capitan (version 10.11.5) with a 2.8 GHz Intel Core i7 (4-core) processor and a 1 TB SSD holding my data and code, the following bash script:
Code:
#!/bin/bash
printf 'perl results:\n'
time perl -nle '/<factories.*baseQueueName/ and @r=/(?:jndiN|baseQueueN|n)ame="([^"]+)/g and print join ",",@r[1,2,0]' resources.xml

printf '\nawk results:\n'
time awk '
BEGIN {	# Define ERE to match double-quoted string.
	dqsERE = "\"[^\"]*\""

	# Construct an array of extended regular expressions to match attributes...
	# First, the attribute name...
	EREs[++nEREs] = " jndiName="
	EREs[++nEREs] = " baseQueueName="
	EREs[++nEREs] = " name="

	# Save the offset of each value (the length of the attribute name plus 1
	# for the opening double-quote) and add an ERE to match the
	# double-quoted string following the attribute name.
	for(i = 1; i <= nEREs; i++) {
		offset[i] = length(EREs[i]) + 1
		EREs[i] = EREs[i] dqsERE
	}
}
/<factories.*baseQueueName/ {
	# We have an XML line to process.
	# Clear the output string.
	out = ""
	for(i = 1; i <= nEREs; i++) {
		# For each desired attribute, look for a match...
		if(match($0, EREs[i]))
			# A match was found for this attribute, add the data
			# from the double-quoted string to the output string.
			out = out substr($0, RSTART + offset[i],
				RLENGTH - offset[i] - 1)
		# Whether or not a match was found, add a field separator to
		# the output string.
		out = out ((i < nEREs) ? "," : "")
	}
	# Print the accumulated output string.
	print out
}' resources.xml

printf '\nOriginal script results:\n'
time {	grep '<factories.*baseQueueName' resources.xml | while read line; do
	QUEUE_JNDI_NAME=$( echo $line | grep -o 'jndiName="[^"]*"'      | cut -d'"' -f2 )
	BASE_QUEUE_NAME=$( echo $line | grep -o 'baseQueueName="[^"]*"' | cut -d'"' -f2 )
	QUEUE_NAME=$(      echo $line | grep -o 'name="[^"]*"'          | cut -d'"' -f2 )
	echo "$QUEUE_JNDI_NAME,$BASE_QUEUE_NAME,$QUEUE_NAME"
	done
}

produces output with the average times (from 10 runs):
Code:
perl results:
jms/Queue1,TEST.QUEUE1,Queue1
jms/Queue2,TEST.QUEUE2,Queue2

real	0m0.007s
user	0m0.002s
sys	0m0.003s

awk results:
jms/Queue1,TEST.QUEUE1,Queue1
jms/Queue2,TEST.QUEUE2,Queue2

real	0m0.002s
user	0m0.001s
sys	0m0.001s

Original script results:
jms/Queue1,TEST.QUEUE1,Queue1
jms/Queue2,TEST.QUEUE2,Queue2

real	0m0.017s
user	0m0.011s
sys	0m0.016s

Note that grep on OS X does not have a -P option, so I had to modify your script to use basic REs instead of perl REs.

Also, even with commented awk code, my awk script runs in about 1/3 the time needed for Aia's perl script (with the grep folded into the perl script).

Could we assume that you didn't time the grep | perl pipeline, but instead just timed the perl script that did not select only lines matching the pattern <factories.*baseQueueName; or is awk really that much slower on your system compared to perl?
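
For anyone who wants to see how the RSTART, RLENGTH, and offset arithmetic in the awk script isolates a value, here is a minimal sketch. The sample line is hypothetical (the real resources.xml is not shown in this thread) and show_match is a made-up helper name, so treat it as an illustration rather than the script that was timed above:

```shell
#!/bin/bash
# Hypothetical sample line; the real resources.xml is not shown in this thread.
sample='<factories name="Queue1" jndiName="jms/Queue1" baseQueueName="TEST.QUEUE1"/>'

# show_match (a made-up helper name) prints the value of one attribute.
show_match() {
	printf '%s\n' "$1" | awk -v attr="$2" '
	{
		prefix = " " attr "="		# leading space, name, equals sign
		offset = length(prefix) + 1	# + 1 skips the opening double-quote
		if(match($0, prefix "\"[^\"]*\""))
			# Drop the prefix and both quotes from the matched text.
			print substr($0, RSTART + offset, RLENGTH - offset - 1)
	}'
}

show_match "$sample" jndiName		# prints jms/Queue1
show_match "$sample" baseQueueName	# prints TEST.QUEUE1
show_match "$sample" name		# prints Queue1
```

The match covers the attribute name, both quotes, and the value, so subtracting the offset and one more character for the closing quote leaves just the length of the value.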
# 9  
Old 06-05-2016
I have a SUSE Linux 11 VM with 8 GB RAM (300 MB free) and 2 CPU cores.

Code:
# XML has 151 matching xml nodes(lines)
 perl -nle '/<factories.*baseQueueName/ and @r=/(?:jndiN|baseQueueN|n)ame="([^"]+)/g and print join ",",@r[1,2,0]' resources.xml | wc -l
151

# I did two tests with the same script you tried on your end.
perl results:
real    0m0.008s    real    0m0.008s
user    0m0.007s    user    0m0.007s
sys     0m0.002s    sys     0m0.002s

awk results:
real    0m0.066s    real    0m0.026s
user    0m0.026s    user    0m0.007s
sys     0m0.001s    sys     0m0.002s

Original script results:
real    0m0.997s    real    0m1.102s
user    0m0.621s    user    0m0.644s
sys     0m0.680s    sys     0m0.690s


Last edited by kchinnam; 06-05-2016 at 02:54 AM.. Reason: text formatting
# 10  
Old 06-05-2016
It is interesting to note that on your two runs, the timings for the two perl runs were similar and the timing for the two bash, grep, cut runs were similar, but the awk timings were radically different. It is also interesting to note that on the 2nd awk run, the user and sys times were identical to the perl user and sys times, but the elapsed time was grossly longer for awk. Were you running your timing tests on an otherwise idle system?

What timing results do you get running this stripped down awk code a few times:
Code:
#!/bin/bash
printf 'awk results:\n'
time awk 'BEGIN{d="\"[^\"]*";E[1]=" jndiName=";E[2]=" baseQueueName=";E[n=3]=" name=";for(i=1;i<=n;i++){O[i]=length(E[i])+1;E[i]=E[i]d}}/<factories.*baseQueueName/{o="";for(i=1;i<=n;i++){if(match($0,E[i]))o=o substr($0,RSTART+O[i],RLENGTH-O[i]);o=o ((i<n)?",":"")}print o}' resources.xml
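
If it helps to collect several timings in one go, a rough sketch like the following could be used. time_runs is a hypothetical helper name (not something from this thread), and it relies on bash's time keyword and TIMEFORMAT variable, so it assumes bash rather than plain sh:

```shell
#!/bin/bash
# time_runs is a hypothetical helper: time_runs COUNT COMMAND [ARG...]
# It prints the elapsed (real) seconds of each run, one per line.
time_runs() {
	local count=$1 i
	shift
	local TIMEFORMAT='%R'	# bash variable: report elapsed time only
	for ((i = 1; i <= count; i++)); do
		# Discard the command output; keep only what "time" reports.
		{ time "$@" >/dev/null 2>&1; } 2>&1
	done
}

# Example: five back-to-back runs of a trivial awk program.
time_runs 5 awk 'BEGIN { exit }'
```

Replacing the trivial awk program with the awk or perl command line being tested (including resources.xml as its last argument) gives one elapsed time per run, which makes run-to-run swings easier to spot than a single measurement.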

# 11  
Old 06-05-2016
Don,
With awk I am noticing wild swings between runs only a few minutes apart, while the perl solution's performance is pretty consistent, with little variation.

I am dropping my original solution from the contest. I ran the three solutions below multiple times, including after gaps of a few minutes. Here are the rough averages I am seeing.
It can't be I/O, since both awk and perl are printing to the console.

Code:
time awk 'BEGIN{d="\"[^\"]*";E[1]=" jndiName=";E[2]=" baseQueueName=";E[n=3]=" name=";for(i=1;i<=n;i++){O[i]=length(E[i])+1;E[i]=E[i]d}}/<factories.*baseQueueName/{o="";for(i=1;i<=n;i++){if(match($0,E[i]))o=o substr($0,RSTART+O[i],RLENGTH-O[i]);o=o ((i<n)?",":"")}print o}' resources.xml
real    0m0.007s
user    0m0.005s
sys     0m0.002s
_________
real    0m0.028s
user    0m0.025s
sys     0m0.002s


perl
time perl -nle '/<factories.*baseQueueName/ and @r=/(?:jndiN|baseQueueN|n)ame="([^"]+)/g and print join ",",@r[1,2,0]' resources.xml
real    0m0.008s
user    0m0.006s
sys     0m0.002s

time perl -nle '@r=/<factories.*name="([^"]+)"\sjndiName="([^"]+)".*baseQueueName="([^"]+)/ and print join ",",@r[1,2,0]' resources.xml
real    0m0.007s
user    0m0.005s
sys     0m0.002s

# 12  
Old 06-05-2016
One might guess that perl is used frequently on your system and awk is used infrequently. If that is the case, perl will always be in your cache (and will get consistent timings), while after a few minutes of inactivity awk will drop out of your cache. The first run after it has dropped out will need awk to be reloaded from disk (taking more time), and subsequent runs (while it is in the cache) will run on par with perl.
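
One way to probe that guess is to compare a first run against an immediate repeat. The sketch below is only an assumption-laden illustration: first_vs_repeat and the DROP_CACHES switch are hypothetical names, the cache-dropping step is Linux-only and needs root, and without it the "first" run is only cold if the program has not been used recently.

```shell
#!/bin/bash
# first_vs_repeat is a hypothetical helper: first_vs_repeat COMMAND [ARG...]
# It times one run, then an immediate second run, and prints both.
first_vs_repeat() {
	local TIMEFORMAT='%R' label
	# Optional and Linux-only: as root, ask the kernel to drop its caches
	# so the first run has to read the program from disk again.
	if [ "${DROP_CACHES:-0}" = 1 ] && [ -w /proc/sys/vm/drop_caches ]; then
		sync
		echo 3 > /proc/sys/vm/drop_caches
	fi
	for label in first repeat; do
		printf '%s: ' "$label"
		{ time "$@" >/dev/null 2>&1; } 2>&1
	done
}

first_vs_repeat awk 'BEGIN { exit }'
```

If the first time is consistently much larger than the repeat for awk but not for perl, that would be consistent with the caching explanation, though it would not rule out other causes such as load from other processes on the VM.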