Extract a pattern from xml file


 
# 1  
Old 08-20-2012

Hi,

I have the following XML content, all on a single line:

Code:
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst><lst name="status"><lst name=""><str name="name"/><str name="instanceDir">/var/www/search/current/Search/solr/./</str><str name="dataDir">/data/www/search/shared/indexes/</str><date name="startTime">2012-08-09T13:48:35.584Z</date><long name="uptime">925837154</long><lst name="index"><int name="numDocs">205235</int><int name="maxDoc">205235</int><long name="version">1326998109779</long><bool name="optimized">true</bool><bool name="current">true</bool><bool name="hasDeletions">false</bool><str name="directory">org.apache.lucene.store.MMapDirectory:org.apache.lucene.store.MMapDirectory@/var/www/search/shared/indexes/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@a26ff14</str><date name="lastModified">2012-08-20T06:56:43Z</date></lst></lst></lst>

I want to extract whatever value is inside this element: <int name="numDocs">205235</int>

If you post an awk or sed solution, please explain it, as that would be very helpful for beginners trying to learn and understand.

Thanks
# 2  
Old 08-20-2012
Code:
sed -n '/.*<int name="numDocs">\([^<]*\)<.*/s//\1/p' file

With awk:
Code:
awk 'sub(/.*<int name="numDocs">/,""){print $0+0}' file

This User Gave Thanks to elixir_sinari For This Post:
# 3  
Old 08-20-2012
Thank you very much.

Could you please explain the parts of each command?

Best
Ashok
# 4  
Old 08-20-2012
sed

.* --> any character, any number of times
<int name="numDocs"> --> the required pattern, of course
\([^<]*\)< --> a tagged regular expression (TRE) that stores all the characters (anything except <) up to the first left chevron (<)
.* --> the remaining characters in the line.
In a line matching this pattern, substitute the whole line (the empty // reuses the previously matched pattern) with the TRE (\1) and print it (p). The -n option suppresses sed's default output, so only lines where the substitution happened are printed.
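
To see the parts above working together, here is a minimal sketch run against a shortened sample line. The variable names line and numdocs are made up for illustration, and the sample is cut down from the XML in the question.

```shell
# Hypothetical sample: a shortened version of the single-line XML from the question.
line='<lst name="index"><int name="numDocs">205235</int><int name="maxDoc">205235</int></lst>'

# -n          : do not print lines automatically
# /regex/     : select lines that match the whole pattern
# s//\1/      : the empty pattern reuses that regex; the line becomes the captured group
# p           : print the line only when the substitution succeeded
numdocs=$(printf '%s\n' "$line" | sed -n '/.*<int name="numDocs">\([^<]*\)<.*/s//\1/p')

echo "$numdocs"
```

Writing the regex directly inside s/.../\1/p should behave the same way; the address form simply avoids typing the pattern twice.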

---

awk

sub(/.*<int name="numDocs">/,"") --> in each line read, try to delete everything up to and including <int name="numDocs">.
If this substitution is successful, sub() returns 1 and the corresponding action is executed.
The action adds 0 to the remaining record/line. This forces a numeric conversion, which keeps only the leading number in the line, and prints it.
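
The effect of adding 0 can be checked on its own. This is only a sketch: the two input strings are hypothetical leftovers of the kind sub() would produce, and num and txt are illustrative names.

```shell
# What is left of the line after sub() when the value is numeric.
num=$(printf '%s\n' '205235</int><int name="maxDoc">205235</int>' | awk '{print $0+0}')

# What is left when the value is text: there is no leading number to keep.
txt=$(printf '%s\n' 'PASS</str>' | awk '{print $0+0}')

echo "$num"   # leading digits only: 205235
echo "$txt"   # no leading number, so the result is 0
```

So the +0 trick only suits values that start with a number.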
# 5  
Old 08-20-2012
Good explanation.

On the awk side, what if the value were characters like "PASS" instead of a number? For example:

Code:
<str name="status">PASS</str>

awk 'sub(/.*<str name="status">/,""){print $0}'

Result: PASS</str>

Could you please advise?
# 6  
Old 08-20-2012
I had assumed a number in the field. Nevertheless, try:
Code:
awk 'match($0,/<str name="status">[^<]+</){print substr($0,RSTART+19,RLENGTH-20)}' file

This uses the match() function to match the pattern in the input line. If no match is found, match() returns 0 and no further processing is done on the line. If multiple matches are possible, match() only matches the first (leftmost) one. On a successful match it sets the special variables RSTART and RLENGTH.
RSTART --> starting position in the line where the match was found.
RLENGTH --> length of the match made.
Using the values of these 2 variables, we print the required substring. The opening tag <str name="status"> is 19 characters long, so the value starts at RSTART+19. Its length is RLENGTH minus those 19 characters and the trailing <, i.e. RLENGTH-20.
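
The hard-coded 19 and 20 only fit this one tag. As a sketch of the same match() approach with the offsets computed from the tag length, assuming the tag text contains no regex metacharacters (the names line, tag, n and status are illustrative, not from the posts above):

```shell
# Hypothetical sample line containing a text value.
line='<lst name="x"><str name="status">PASS</str></lst>'

status=$(printf '%s\n' "$line" | awk -v tag='<str name="status">' '
  # Build the regex from the tag: the tag, then one or more non-< characters, then <.
  match($0, tag "[^<]+<") {
    n = length(tag)                              # 19 for this tag
    print substr($0, RSTART + n, RLENGTH - n - 1) # skip the tag, drop the trailing <
  }')

echo "$status"
```

Changing the value passed with -v tag=... should be enough to pull out a different element, as long as the tag is plain text.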