Grep content in xml file


 
# 1  
Old 11-04-2013

I have an XML file whose header looks like this:

Code:
<Provider xmlns="http://www.xyzx.gov/xyz" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.xyzx.gov/xyz xyz.xsd" SCHEMA_VERSION="2.5" PROVIDER="5">

I want to extract the schema version (2.5 here) and store it in a variable for later use in a shell script. The data file is around 5 GB. How can I grep for just the 2.5?

Thanks much.

Last edited by Don Cragun; 11-04-2013 at 02:10 PM.. Reason: Change HTML tags to CODE tags.
# 2  
Old 11-04-2013
Use awk instead:
Code:
awk '/SCHEMA_VERSION/{gsub(/.*SCHEMA_VERSION="|".*/,X);print}' file.xml

# 3  
Old 11-04-2013
Code:
sed 's#.*SCHEMA_VERSION=\([^ ][^ ]*\).*#\1#' myFile

# 4  
Old 11-04-2013
If your 5 GB XML file contains more than one line, I think vgersh99's script will produce more output than you want. If you are using a system where awk and sed have limited line lengths, both Yoda's script and vgersh99's script could fail if your XML file contains long lines.

The following awk script should get around these problems:
Code:
version=$(awk -F '=*"' '$1 == "SCHEMA_VERSION" { print $2; exit 0 }' RS=' ' file.xml)
printf "schema version is: %s\n" "$version"

If you want to run this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.

With your sample input, the above script produces the output:
Code:
schema version is: 2.5
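
A minimal sketch for trying this out on a small multi-line file before running it on the real data. The sample file contents, the `AWK` variable, and the use of `mktemp` are assumptions made for illustration; only the awk one-liner itself comes from the script above. The same pattern is reused for PROVIDER, which sits in the same header line.

```shell
#!/bin/sh
# Sketch only: the sample file and the AWK variable are illustrative assumptions.

# Prefer a POSIX awk on Solaris/SunOS if one is installed; otherwise use plain awk.
AWK=awk
for candidate in /usr/xpg4/bin/awk /usr/xpg6/bin/awk; do
    if [ -x "$candidate" ]; then
        AWK=$candidate
        break
    fi
done

# Build a small multi-line stand-in for the real multi-gigabyte file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<Provider xmlns="http://www.xyzx.gov/xyz" SCHEMA_VERSION="2.5" PROVIDER="5">
<Record UNINUM="1001"/>
<Record UNINUM="1002"/>
</Provider>
EOF

# RS=' ' splits the input at every space, so records stay short as long as
# the file contains spaces; exit stops reading at the first match.
version=$("$AWK" -F '=*"' '$1 == "SCHEMA_VERSION" { print $2; exit 0 }' RS=' ' "$sample")
provider=$("$AWK" -F '=*"' '$1 == "PROVIDER" { print $2; exit 0 }' RS=' ' "$sample")

printf 'schema version is: %s\n' "$version"
printf 'provider is: %s\n' "$provider"

rm -f "$sample"
```

Because of the `exit`, each awk call stops at the first match, so for attributes that appear in the header only the start of the file is read.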

# 5  
Old 11-04-2013
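You could try grep. This needs a grep that supports -P, such as GNU grep. Matching [^"]* instead of .* stops at the closing quote of the value, and -m1 stops reading after the first matching line:

Code:
$ grep -m1 -Po '(?<=SCHEMA_VERSION=")[^"]*' file.xml
2.5

Code:
my_variable=$(grep -m1 -Po '(?<=SCHEMA_VERSION=")[^"]*' file.xml)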

# 6  
Old 11-05-2013
Quote:
Originally Posted by Don Cragun
If your 5 GB XML file contains more than one line, I think vgersh99's script will produce more output than you want. If you are using a system where awk and sed have limited line lengths, both Yoda's script and vgersh99's script could fail if your XML file contains long lines.

The following awk script should get around these problems:
Code:
version=$(awk -F '=*"' '$1 == "SCHEMA_VERSION" { print $2; exit 0 }' RS=' ' file.xml)
printf "schema version is: %s\n" "$version"

If you want to run this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.

With your sample input, the above script produces the output:
Code:
schema version is: 2.5

Hello Don,
Thanks for your input. Is there any way to improve the performance? I am searching for three values: SCHEMA_VERSION, PROVIDER, and UNINUM. SCHEMA_VERSION and PROVIDER come back quickly with your line of code because they appear only in the first lines of the XML file. UNINUM, however, occurs in many places throughout the 4.5 GB XML file, and retrieving all of them takes a long time. I modified your code as shown below. If you have any ideas for retrieving the UNINUM values faster, please let me know. Thank you.

Code:
awk -F '=*"' '$1 == "UNINUM" { print $2 }' RS=' ' file.xml

I also tried the suggestions from Yoda, Akshay, and vgersh99, but they did not help either.
# 7  
Old 11-05-2013
Hi,

Maybe this helps, assuming every matching line in the file follows the same pattern as your sample header.

Code:
$ a=`awk -F"\"" '/SCHEMA_VERSION/{print$8}' filename`
$ echo $a
2.5
$


Thanks,
R. Singh
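
For the UNINUM question in post #6, one option that may be worth timing is to let grep print only the matching text instead of having awk split the whole file into space-separated records. This is a sketch under assumptions, not a measured result: it assumes UNINUM is written as an attribute of the form UNINUM="value", that the available grep supports -o (GNU and BSD grep do; POSIX does not require it), and the sample file below is made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes UNINUM appears as an attribute written UNINUM="value",
# and that grep supports -o (GNU and BSD grep do; it is not in POSIX).

# Small made-up stand-in for the real 4.5 GB file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<Provider SCHEMA_VERSION="2.5" PROVIDER="5">
<Record UNINUM="1001" NAME="a"/>
<Record UNINUM="1002" NAME="b"/>
</Provider>
EOF

# LC_ALL=C makes grep match bytes instead of multibyte characters.
# grep -o prints only the matched text, one match per output line;
# cut keeps the part between the double quotes.
uninums=$(LC_ALL=C grep -o 'UNINUM="[^"]*"' "$sample" | cut -d '"' -f 2)

printf '%s\n' "$uninums"

rm -f "$sample"
```

On the real file it is probably better to redirect the pipeline to an output file than to capture millions of values in a shell variable.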
 