How to extract XML attribute values using awk inline?


 
# 1  
Old 06-04-2016

I am trying to extract specific XML attribute values from the lines in resources.xml that match the pattern '<factories.*baseQueueName'.
My script works, but to extract 3 values it runs echo $line three times per line, and in general it could be 'n' times. How can I use awk to extract the matching values in-line, or at least more efficiently than I am doing now?

Code:
resources.xml
    <factories xmi:type="resources.jms.mqseries:MQQueue" xmi:id="MQQueue_11111" name="Queue1" jndiName="jms/Queue1" description="Queue1" category="TEST" persistence="APPLICATION_DEFINED" priority="APPLICATION_DEFINED" specifiedPriority="0" expiry="APPLICATION_DEFINED" specifiedExpiry="0" baseQueueName="TEST.QUEUE1" baseQueueManagerName="" useNativeEncoding="false" integerEncoding="Normal" decimalEncoding="Normal" floatingPointEncoding="IEEENormal" targetClient="JMS" queueManagerHost="" queueManagerPort="0" serverConnectionChannelName="" userName="" password="{xor}" readAhead="NO"/>
	
    <factories xmi:type="resources.jms.mqseries:MQQueue" xmi:id="MQQueue_22222" name="Queue2" jndiName="jms/Queue2" description="Queue2" category="TEST" persistence="APPLICATION_DEFINED" priority="APPLICATION_DEFINED" specifiedPriority="0" expiry="APPLICATION_DEFINED" specifiedExpiry="0" baseQueueName="TEST.QUEUE2" baseQueueManagerName="" useNativeEncoding="false" integerEncoding="Normal" decimalEncoding="Normal" floatingPointEncoding="IEEENormal" targetClient="JMS" queueManagerHost="" queueManagerPort="0" serverConnectionChannelName="" userName="" password="{xor}" readAhead="NO"/>

# .. around 20+ similar lines like above.

Code:
	
grep '<factories.*baseQueueName' resources.xml | while read line; do
	QUEUE_JNDI_NAME=$( echo $line | grep -Po 'jndiName=\D\S+\D'      | cut -d'"' -f2 )
	BASE_QUEUE_NAME=$( echo $line | grep -Po 'baseQueueName=\D\S+\D' | cut -d'"' -f2 )
	QUEUE_NAME=$(      echo $line | grep -Po 'name=\D\S+\D'          | cut -d'"' -f2 )
	echo "$QUEUE_JNDI_NAME,$BASE_QUEUE_NAME,$QUEUE_NAME"
	done

Output:
Code:
jms/Queue1,TEST.QUEUE1,Queue1
jms/Queue2,TEST.QUEUE2,Queue2
# .. around 20+ similar output lines like above.


Last edited by kchinnam; 06-04-2016 at 09:27 PM.. Reason: formatting
# 2  
Old 06-04-2016
Maybe you want something like:
Code:
awk '
BEGIN {	dqsERE = "\"[^\"]*\""
	EREs[++nEREs] = " jndiName=" dqsERE
	EREs[++nEREs] = " baseQueueName=" dqsERE
	EREs[++nEREs] = " name=" dqsERE
	for(i = 1; i <= nEREs; i++)
		offset[i] = index(EREs[i], "=") + 1
}
/<factories.*baseQueueName/ {
	out = ""
	for(i = 1; i <= nEREs; i++) {
		if(match($0, EREs[i]))
			out = out substr($0, RSTART + offset[i],
				RLENGTH - offset[i] - 1)
		out = out ((i < nEREs) ? "," : "")
	}
	print out
}' resources.xml

As always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
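
If the explicit ERE table feels heavy, the same match()/substr() idea can be folded into a small helper function. This is only a sketch: attr() and the extract_queues wrapper are names made up for illustration, and it assumes every attribute value is double-quoted and every attribute name is preceded by a space, as in the sample lines above.

```shell
# Hypothetical wrapper: prints jndiName,baseQueueName,name for each matching line.
# Reads the files given as arguments, or standard input when none are given.
extract_queues() {
    awk '
    # attr() is an illustrative helper, not an awk built-in.
    # It returns the double-quoted value of attribute "name" in "line",
    # or an empty string when the attribute is absent.
    function attr(line, name,    re) {
        re = " " name "=\"[^\"]*\""
        if (!match(line, re))
            return ""
        # Skip the leading space, the name, the "=" and the opening quote,
        # and leave out the closing quote.
        return substr(line, RSTART + length(name) + 3, RLENGTH - length(name) - 4)
    }
    /<factories.*baseQueueName/ {
        print attr($0, "jndiName") "," attr($0, "baseQueueName") "," attr($0, "name")
    }' "$@"
}
```

Run it as extract_queues resources.xml. Adding a fourth attribute is then one more attr() call in the print statement. On Solaris/SunOS the same caveat applies, since user-defined functions need /usr/xpg4/bin/awk or nawk.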
This User Gave Thanks to Don Cragun For This Post:
# 3  
Old 06-04-2016
Would this do it?
Code:
perl -nle '@r=/(?:jndiN|baseQueueN|n)ame="([^"]+)/g and print join ",",@r[1,2,0]' resources.xml

Output:
Code:
jms/Queue1,TEST.QUEUE1,Queue1
jms/Queue2,TEST.QUEUE2,Queue2

# 4  
Old 06-05-2016
Aia, thanks for trying. I knew a wicked one-liner like yours could do this.
Don, please check the performance stats below. I am a bash/awk lover, so we have work to do.

Code:
 
grep '<factories.*baseQueueName' resources.xml | perl -nle '@r=/(?:jndiN|baseQueueN|n)ame="([^"]+)/g and print join ",",@r[1,2,0]'

Code:
# 153 XML tag lines scanned from one resources.xml file; "time <command>" gives these stats:
real    0m0.007s, user    0m0.004s, sys     0m0.002s  -- Perl solution, OMG
real    0m0.028s, user    0m0.025s, sys     0m0.002s  -- Don's awk solution
real    0m0.928s, user    0m0.458s, sys     0m0.409s  -- My general public :-) solution. Why is sys taking so long here? Not fair.

Is there a way to combine the first regex '<factories.*baseQueueName' into the second one as well?

I wonder whether any of the basic shell commands like sed/awk/grep can match what you are able to do with perl. I would love to see a simplified awk solution that beats perl.

Last edited by kchinnam; 06-05-2016 at 01:10 AM.. Reason: more details
# 5  
Old 06-05-2016
I asked whether it would work because <factories.*baseQueueName seemed to me quite a long regex just to select a line. However, if you really need it, grep is not necessary:
Code:
perl -nle '/<factories.*baseQueueName/ and @r=/(?:jndiN|baseQueueN|n)ame="([^"]+)/g and print join ",",@r[1,2,0]' resources.xml

Or, if the order of the attributes is always the same:
Code:
perl -nle '@r=/<factories.*name="([^"]+)"\sjndiName="([^"]+)".*baseQueueName="([^"]+)/ and print join ",",@r[1,2,0]' resources.xml


Last edited by Aia; 06-05-2016 at 01:48 AM.. Reason: Add one regex
This User Gave Thanks to Aia For This Post:
# 6  
Old 06-05-2016
Performance is the same whether the first regex is in-lined in perl or handled by grep + perl.

Can you explain the meaning of the two expressions below in your command?
Code:
'?:' 
"([^"]+)

# 7  
Old 06-05-2016
Quote:
Originally Posted by kchinnam
[...]
Can you explain the meaning of the two expressions below in your command?
Code:
'?:' 
"([^"]+)

(?:...) groups without creating a captured group. Anything inside plain () is saved into a group, and sometimes we do not want that (as in this case).
"([^"]+) matches a " and then keeps matching, as a captured group, everything up to the next ". Neither the opening " nor that closing " is included in the group.