Problems with grep and XML


 
# 1  
10-12-2006

I'm trying to use grep on XML files. The same grep expressions work on plain text files but not on XML files (which, of course, are also plain text files). Actually, these expressions do work on XML files saved in DreamWeaver, but not when the same files are saved in XML Spy.

I want grep to treat these files as plain text, which is what I assumed it would do.

Any pointers?

Thanks,

Paul
# 2  
10-12-2006
I don't know about Spy's output. First thing: can you open those files in vi and read them?
If you are not getting normal text in vi, then Spy may be doing something interesting.

Or try:
diff spy.xml dm.xml

These two files (spy.xml and dm.xml) should be identical. They probably are not.
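If diff only reports that the files differ, comparing the first bytes of each shows how they differ. This is a minimal sketch using the standard cmp and od utilities plus head -c (supported by the GNU and BSD versions of head); compare_heads is a hypothetical helper name, and spy.xml and dm.xml are the example names from the post above. NUL bytes (printed by od as \0) between ordinary characters are a strong hint of UTF-16.

```shell
#!/bin/sh
# Sketch: show where two supposedly identical files first diverge,
# then dump their first bytes so the difference is visible.
# compare_heads is a hypothetical helper name.
compare_heads() {
    # cmp prints the offset of the first differing byte, or nothing if equal
    cmp "$1" "$2"
    for f in "$1" "$2"; do
        echo "== $f =="
        # od -c prints each byte as a character; NUL bytes appear as \0
        head -c 32 "$f" | od -c
    done
}
# Usage: compare_heads spy.xml dm.xml
```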
# 3  
10-12-2006
My wild guess would be that XML Spy is saving them in something crazy like 16-bit Unicode.
# 4  
10-13-2006
Thanks for your input.

Corona688, you said "My wild guess would be that XML Spy is saving them in something crazy like 16-bit Unicode" and I think that's it. (I don't know how to test it, though.)
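One way to test it, assuming XML Spy writes a byte-order mark (BOM) at the start of UTF-16 files (common, but not guaranteed), is to look at the first two bytes: FF FE indicates little-endian UTF-16 and FE FF big-endian. The helper names bom_of and utf16_kind are hypothetical. On many systems the file command (for example, file spy.xml) will also report the encoding, though its exact wording varies.

```shell
#!/bin/sh
# Sketch: detect a UTF-16 byte-order mark (BOM) in the first two bytes.
# Assumption: the saving editor writes a BOM; BOM-less UTF-16 is not detected.
# bom_of and utf16_kind are hypothetical helper names.

bom_of() {
    # First two bytes as lowercase hex with spaces removed, e.g. "fffe"
    head -c 2 "$1" | od -A n -t x1 | tr -d ' \n'
}

utf16_kind() {
    case "$(bom_of "$1")" in
        fffe) echo "UTF-16LE" ;;       # bytes FF FE
        feff) echo "UTF-16BE" ;;       # bytes FE FF
        *)    echo "no UTF-16 BOM" ;;
    esac
}
# Usage: utf16_kind spy.xml
```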

After further experimentation, I've discovered that this only happens when I save it encoded as UTF-16 (i.e. the declaration is <?xml version="1.0" encoding="UTF-16"?>).

When Encoding = UTF-16
Saved in DreamWeaver - file is accessible to grep
Saved in XmlSpy - not accessible to grep

When Encoding = UTF-8
Saved in DreamWeaver - accessible to grep
Saved in XmlSpy - accessible to grep

So if XmlSpy is saving them in 16-bit Unicode, can I use grep with them, or do I need to turn to something else?

Cheers
# 5  
10-13-2006
If you're saving it as UTF-16, you're saving it as 16-bit Unicode, and my guess was right on the money. You can't grep UTF-16 directly. It uses 16-bit (two-byte) characters instead of the normal 8 bits, so grep ends up comparing half of a UTF-16 character against one full 8-bit character of your pattern. Naturally it won't match.
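The "half a character" point can be seen directly. This sketch assumes an iconv that accepts the encoding name UTF-16LE (glibc and GNU libiconv do). It encodes the ASCII string abc as UTF-16LE and dumps the bytes: every letter is followed by a NUL byte, so the adjacent bytes "abc" that grep searches for never occur.

```shell
#!/bin/sh
# Sketch: show why a plain grep misses text stored as UTF-16.
# Assumption: iconv accepts the encoding name UTF-16LE.

# Dump the bytes of "abc" in UTF-16LE: each letter is followed by a NUL (\0).
printf 'abc' | iconv -f UTF-8 -t UTF-16LE | od -c

# Count lines containing "abc": the three letters are never adjacent,
# so this prints 0. -a makes grep treat the NUL-laden bytes as text;
# "|| true" ignores grep's "no match" exit status.
printf 'abc\n' | iconv -f UTF-8 -t UTF-16LE | grep -a -c 'abc' || true
```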

Can you save it as UTF-8 instead? It can represent all Unicode characters without breaking grep.
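If re-saving isn't an option, a common workaround is to convert to UTF-8 with iconv and grep the result. This is a sketch, assuming the file starts with a BOM (with -f UTF-16, glibc and GNU libiconv use the BOM to pick the byte order) and that the declaration is written exactly as in the post above. grep_utf16 and to_utf8_xml are hypothetical helper names. If you keep a converted copy, the XML declaration must also be changed to say UTF-8, or XML parsers will misread the file; the second helper does that with sed.

```shell
#!/bin/sh
# Sketch: work with a UTF-16 XML file by converting it to UTF-8.
# Assumption: the file starts with a BOM so iconv can tell the byte order.
# grep_utf16 and to_utf8_xml are hypothetical helper names.

grep_utf16() {
    # $1 = pattern, $2 = UTF-16 file; convert on the fly, then grep
    iconv -f UTF-16 -t UTF-8 "$2" | grep -- "$1"
}

to_utf8_xml() {
    # $1 = UTF-16 input, $2 = UTF-8 output.
    # Convert, then rewrite the declaration on line 1 to match the new encoding
    # (assumes it reads exactly encoding="UTF-16").
    iconv -f UTF-16 -t UTF-8 "$1" |
        sed '1s/encoding="UTF-16"/encoding="UTF-8"/' > "$2"
}
# Usage: grep_utf16 '<title>' spy.xml
#        to_utf8_xml spy.xml spy-utf8.xml
```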
 