An invalid XML character (Unicode: 0x1a)


 
# 1  
Old 05-13-2011

While uploading an exl file to my application on Solaris 10, the upload failed with this error:
HTML Code:
Error! Parsing Error: /SPLM/TC83/tcdata83/model/model_dbextract.xml  Line:65576 Column:73 An invalid XML character (Unicode: 0x1a) was found  in the value of attribute "unitOfMeasureSymbol" and element is  "TcUnitOfMeasure".
Please check the errors.
Aborting...
Exception Encountered!!!
java.lang.NullPointerException
What I found is that when I open the XML file in Windows, the failing line looks something like this:
HTML Code:
<TcUnitOfMeasure unitOfMeasureName="Microampere" unitOfMeasureSymbol="µA"/>
        <TcUnitOfMeasure unitOfMeasureName="Microfarad" unitOfMeasureSymbol="µF"/>
The same line, after transferring the file to Unix using the ASCII option in FTP, looks like this:
HTML Code:
 <TcUnitOfMeasure unitOfMeasureName="Microampere" unitOfMeasureSymbol="\265A"/>
                <TcUnitOfMeasure unitOfMeasureName="Microfarad" unitOfMeasureSymbol="\265F"/>
If I use the binary FTP transfer option instead, it looks like this:
HTML Code:
<TcUnitOfMeasure unitOfMeasureName="Microampere" unitOfMeasureSymbol="^ZA"/>
                <TcUnitOfMeasure unitOfMeasureName="Microfarad" unitOfMeasureSymbol="^ZF"/>
So the mu symbol (as in microfarad) is not being parsed on Unix. Can the experts help me solve this issue?

Thank you
Raghu
# 2  
Old 05-14-2011
Your file seems to have been encoded in ISO-8859-1 by Windows, while UTF-8 is likely expected.

Is an encoding specified in its header? Something like:
Code:
<?xml version="1.0" encoding="utf-8" ?>

In any case, this should work:
Code:
 unitOfMeasureSymbol="&#x03BC;A"
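To illustrate the diagnosis: the \265 shown after the ASCII-mode transfer is an octal escape for byte 0xB5, which is µ in ISO-8859-1 but is not valid on its own in UTF-8. The following is only a minimal sketch in Python (chosen for illustration, it is not part of the original setup), using an attribute fragment copied from the first post:

```python
# The byte shown as \265 (octal) in the ASCII-mode transfer is 0xB5.
raw = b'unitOfMeasureSymbol="\xb5A"'

# In ISO-8859-1, 0xB5 is the micro sign (U+00B5).
as_latin1 = raw.decode("iso-8859-1")
print(as_latin1 == 'unitOfMeasureSymbol="\u00b5A"')

# As UTF-8 the same bytes are malformed: 0xB5 is a continuation byte
# with no lead byte in front of it.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print("valid UTF-8:", valid_utf8)

# Re-encoding the decoded text gives the two-byte UTF-8 form, which is
# what a file declaring encoding="UTF-8" would be expected to contain.
as_utf8 = as_latin1.encode("utf-8")
print(as_utf8)
```

A file that declares encoding="UTF-8" but carries the single byte 0xB5 is therefore inconsistent with its own header, independent of any locale setting on the server.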

# 3  
Old 05-14-2011
The header is:
HTML Code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
I checked the locale settings on my server:
HTML Code:
-> locale
LANG=
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=
and locale -a lists this output:
HTML Code:
C
POSIX
hi_IN.UTF-8
iso_8859_1
ja
ja_JP.PCK
ja_JP.UTF-8
ja_JP.eucJP
ko
ko.UTF-8
ko_KR.EUC
ko_KR.EUC@dict
ko_KR.UTF-8
ko_KR.UTF-8@dict
th
th_TH
th_TH.ISO8859-11
th_TH.TIS620
th_TH.UTF-8
zh
zh.GBK
zh.UTF-8
zh_CN.EUC
zh_CN.EUC@pinyin
zh_CN.EUC@radical
zh_CN.EUC@stroke
zh_CN.GB18030
zh_CN.GB18030@pinyin
zh_CN.GB18030@radical
zh_CN.GB18030@stroke
zh_CN.GBK
zh_CN.GBK@pinyin
zh_CN.GBK@radical
zh_CN.GBK@stroke
zh_CN.UTF-8
zh_CN.UTF-8@pinyin
zh_CN.UTF-8@radical
zh_CN.UTF-8@stroke
zh_HK.BIG5HK
zh_HK.BIG5HK@radical
zh_HK.BIG5HK@stroke
zh_HK.UTF-8
zh_TW
zh_TW.BIG5
zh_TW.BIG5@pinyin
zh_TW.BIG5@radical
zh_TW.BIG5@stroke
zh_TW.BIG5@zhuyin
zh_TW.EUC
zh_TW.EUC@pinyin
zh_TW.EUC@radical
zh_TW.EUC@stroke
zh_TW.EUC@zhuyin
zh_TW.UTF-8

So does that mean I don't have the correct UTF-8 locale?

I used "tcunitOfMeasureSymbol="&#x03BC;A" but still no success.
# 4  
Old 05-14-2011
You may have something else going on. What parser are you using? Can your parser handle the following short XML document?
Code:
<?xml version="1.0" encoding="utf-8" ?>
<Собирание версия="2.5-7">
 <Объект id="14">
  <НомерОбъекта>45-3454-123</НомерОбъекта>
  <ВНаличии>1512</ВНаличии>
  <Описание xml:lang="ja">第二発電機</Описание>
 </Объект>
 <Объект id="64">
  <НомерОбъекта>45-7894-456</НомерОбъекта>
  <ВНаличии>1435</ВНаличии>
  <Описание xml:lang="ja">手動ウォーター・ポンプ</Описание>
 </Объект>
</Собирание>

# 5  
Old 05-14-2011
Quote:
Originally Posted by karghum
Header is
HTML Code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
Okay. Then that is the problem. Your µ is not in UTF-8 in this file.
Quote:
So does that mean I don't have the correct UTF-8 locale?
It seems you are using the 7-bit ASCII locale. What does this print?
Code:
set|grep LC
Quote:
I used "tcunitOfMeasureSymbol="&#x03BC;A" but still no success.
This is odd. What error message do you get?
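Since the parser complains about 0x1a, a control character that XML 1.0 does not allow, it may help to list every place where such a byte sits in the file before retrying the upload. A rough sketch in Python, assuming the file fits in memory; the function name and the sample bytes (modelled on the binary-transfer lines in the first post) are made up for illustration:

```python
# Bytes below 0x20 that XML 1.0 permits: tab, line feed, carriage return.
ALLOWED_CONTROLS = {0x09, 0x0A, 0x0D}

def find_invalid_control_bytes(data: bytes):
    """Return (line, column, byte) for each control byte XML 1.0 forbids.

    Lines and columns are 1-based and counted in bytes, so a column can
    differ from the parser's if multi-byte characters precede it.
    """
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col_no, byte in enumerate(line, start=1):
            if byte < 0x20 and byte not in ALLOWED_CONTROLS:
                hits.append((line_no, col_no, byte))
    return hits

# Hypothetical sample: the two lines from the first post with 0x1A (^Z)
# in place of the mu sign, as seen after the binary transfer.
sample = (
    b'<TcUnitOfMeasure unitOfMeasureName="Microampere" unitOfMeasureSymbol="\x1aA"/>\n'
    b'<TcUnitOfMeasure unitOfMeasureName="Microfarad" unitOfMeasureSymbol="\x1aF"/>\n'
)
for line_no, col_no, byte in find_invalid_control_bytes(sample):
    print(f"line {line_no}, column {col_no}: 0x{byte:02x}")
```

For the real file, the bytes could be read with open(path, "rb").read() and passed to the same function; every line it reports would need fixing, not only the first one the parser stops at.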
# 6  
Old 05-14-2011
Here is what it returns:

-> set|grep LC
MAILCHECK=600

As for the error, it is the same error at the same line, where "microfarad" starts:

HTML Code:
Localization Extraction Completed.
Please refer [/SPLM/TC83/server_root/logs/business_model_extractor_2011_05_14_08-09-49.log] for log information
An invalid XML character (Unicode: 0x1a) was found in the value of attribute "unitOfMeasureSymbol" and element is "TcUnitOfMeasure".
org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x1a) was found in the value of attribute "unitOfMeasureSymbol" and element is "TcUnitOfMeasure".
at org.apache.xerces.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
at org.apache.xerces.util.ErrorHandlerWrapper.fatalError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLErrorReporter.reportError(Unknown Source)
at org.apache.xerces.impl.XMLScanner.reportFatalError(Unknown Source)
at org.apache.xerces.impl.XMLScanner.scanAttributeValue(Unknown Source)
at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanAttribute(Unknown Source)
at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at com.teamcenter.bmide.base.core.loader.XMLContentParser.parseWithValiation(Unknown Source)
at com.teamcenter.bmide.base.core.loader.XMLContentParser.parse(Unknown Source)
at com.teamcenter.bmide.base.core.loader.XMLContentParser.parse(Unknown Source)
at com.teamcenter.bmide.foundation.core.loader.BusinessDataContentParser.parse(Unknown Source)
at com.teamcenter.bmide.foundation.core.util.ServerCoreUtil.buildModels(Unknown Source)
at com.teamcenter.bmide.foundation.core.util.ServerCoreUtil.buildModels(Unknown Source)
at com.teamcenter.bmide.tcplmxml.xsdgen.impl.TcPlmXmlXsdInstallToTC.install(Unknown Source)
at com.teamcenter.bmide.tcplmxml.xsdgen.impl.TcPlmXmlXsdInstallToTCMain.main(Unknown Source)
Aborting...
Hello fpmurphy,
My server couldn't handle the test XML file you gave. I transferred it over FTP in both ASCII and binary mode and checked:

HTML Code:
-> cat test.xml
<?xml version="1.0" encoding="utf-8" ?>
<????????? ??????="2.5-7">
 <?????? id="14">
  <????????????>45-3454-123</????????????>
  <????????>1512</????????>
  <???????? xml:lang="ja">?????</????????>
 </??????>
 <?????? id="64">
  <????????????>45-7894-456</????????????>
  <????????>1435</????????>
  <???????? xml:lang="ja">???????·???</????????>
 </??????>
</?????????>infodba-ie10ux013:/home/infodba
# 7  
Old 05-14-2011
Quote:
Originally Posted by karghum
Here is what it return

-> set|grep LC
MAILCHECK=600
You have no locale set. What does this print?
Code:
cat /etc/default/init | grep -v "^#"
Quote:
As for the error, it is the same error at the same line, where "microfarad" starts
It looks like you didn't replace all occurrences of "µ" with "&#x03BC;".
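Replacing every occurrence by hand is easy to get wrong, so here is one possible way to do it in a single pass. This is only a sketch in Python, assuming the source bytes are ISO-8859-1 as suggested earlier in the thread; the function name is made up. Note that it emits decimal references for whatever character is actually in the file, so the micro sign U+00B5 becomes &#181; rather than the Greek mu &#x03BC; proposed above; both are legal XML character references.

```python
def escape_non_ascii(data: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Decode with the assumed source encoding, then write pure ASCII,
    turning every non-ASCII character into a numeric character reference."""
    text = data.decode(source_encoding)
    return text.encode("ascii", errors="xmlcharrefreplace")

# The two lines from the first post, with the mu sign as byte 0xB5.
sample = (
    b'<TcUnitOfMeasure unitOfMeasureName="Microampere" unitOfMeasureSymbol="\xb5A"/>\n'
    b'<TcUnitOfMeasure unitOfMeasureName="Microfarad" unitOfMeasureSymbol="\xb5F"/>\n'
)
print(escape_non_ascii(sample).decode("ascii"))
```

The result is plain ASCII, so it survives either FTP mode unchanged and stays valid whatever encoding the header declares. It would not help with a file that already contains 0x1A bytes, since the original character is lost at that point; the conversion has to start from a copy that still holds the real µ.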