XML file shows Junk Characters in UNIX


 
# 1  
Old 11-19-2010
XML file shows Junk Characters in UNIX

Hello sir,

I generated an XML file from VS 2005. It works well in Windows, but it shows some junk characters in UNIX.
Can anyone help me with this problem?

Thank you in advance.
Hema
# 2  
Old 11-19-2010
Could you please tell us more or show a sample?

Code:
dos2unix yourfile yourfile

# 3  
Old 11-19-2010
issue in detail

Here is a snippet from my XML file. This is the content shown when I open the file in Windows.

<DESCRIPTION-OF-GOODS>3VL9300-3HQ00</DESCRIPTION-OF-GOODS>
<DESCRIPTION-OF-GOODS>1SE0106-4YA80</DESCRIPTION-OF-GOODS>
<DESCRIPTION-OF-GOODS>84/.3 BC 6.0 X 4C CABLE</DESCRIPTION-OF-GOODS>

Whereas if I open the same file in UNIX, I see some junk characters like:

<DESCRIPTION-OF-GOODS><![3VL9300-3HQ00 ></DESCRIPTION-OF-GOODS>
<DESCRIPTION-OF-GOODS><!1SE0131-2YA80]></DESCRIPTION-OF-GOODS>
<DESCRIPTION-OF-GOODS><84/.3 BC 6.0 X 4C CABLE></DESCRIPTION-OF-GOODS>

There are lots of <DESCRIPTION-OF-GOODS> tags in the file. Only a few lines show this kind of junk characters, and those lines are not parsed properly. I also have lots of similar .xml files, and only 2 of them have this issue. I generated all the XML files using the same VS 2005 application.

Please help us to identify the issue.

Thanks
# 4  
Old 11-19-2010
Code:
sed 's/>[<!\[]*/>/;s/ *]*></</' input >output

Code:
$ cat input
<DESCRIPTION-OF-GOODS><![3VL9300-3HQ00 ></DESCRIPTION-OF-GOODS>
<DESCRIPTION-OF-GOODS><!1SE0131-2YA80]></DESCRIPTION-OF-GOODS>
<DESCRIPTION-OF-GOODS><84/.3 BC 6.0 X 4C CABLE></DESCRIPTION-OF-GOODS>
$ sed 's/>[<!\[]*/>/;s/ *]*></</' input
<DESCRIPTION-OF-GOODS>3VL9300-3HQ00</DESCRIPTION-OF-GOODS>
<DESCRIPTION-OF-GOODS>1SE0131-2YA80</DESCRIPTION-OF-GOODS>
<DESCRIPTION-OF-GOODS>84/.3 BC 6.0 X 4C CABLE</DESCRIPTION-OF-GOODS>
$

# 5  
Old 11-19-2010
Thanks for your response again.

Though I did not understand your reply completely, I think you have sent us commands that I can use to eliminate those special characters from the XML file.

That means I need to identify the affected files, run these commands on them, and then use the cleaned files for further processing. Please correct me if I'm wrong.

But I'm really interested in knowing the root cause of this problem. Could you please explain why this issue is occurring? Is there anything I need to take care of while generating these XML files? I'm confused because the issue is not common across all the files. Only 2 out of the hundreds of files I have created in the past have this issue.

Thanks
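
Identifying the affected files, as described in the post above, could be done roughly as follows. This is only a sketch: the function names `find_affected` and `show_bytes` are hypothetical, and it assumes the junk always shows up as a stray `<` right after the opening `<DESCRIPTION-OF-GOODS>` tag, as in the three sample lines. A file that legitimately uses `<![CDATA[` in that position would be reported as well.

```shell
# Hypothetical helper: list the .xml files in a directory (default: current
# directory) in which a DESCRIPTION-OF-GOODS value starts with a stray '<'.
find_affected() {
    dir=${1:-.}
    # -l prints only the names of files with at least one matching line.
    # '<[^/]' skips empty elements, where the closing tag follows directly.
    grep -l '<DESCRIPTION-OF-GOODS><[^/]' "$dir"/*.xml 2>/dev/null
}

# Hypothetical helper: dump the first few suspicious lines of one file
# byte by byte, so the junk can be inspected independently of the viewer.
show_bytes() {
    grep '<DESCRIPTION-OF-GOODS><[^/]' "$1" | head -n 3 | od -c
}
```

Running `show_bytes` on one of the two bad files shows exactly which bytes are stored in the file, which may help to narrow down whether the extra characters were written by the generating application or introduced later.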
# 6  
Old 11-19-2010
I have no idea how your files are generated,
so I don't know the root cause of these erroneous characters being added.

Yes, the commands I posted are meant to correct the files so that they look the way you need them to.
BUT beware when using them: they assume that the rest of the file is structured the same way as the 3 lines you gave in your sample (that is, it contains only entries of that kind).
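
One way to reduce the risk described above is to restrict the substitutions to lines that contain the `<DESCRIPTION-OF-GOODS>` tag. The following is a minimal sketch, not a tested fix for the real files: the function name `clean_goods` is hypothetical, and it still assumes one `<DESCRIPTION-OF-GOODS>` element per line with nothing else in front of it.

```shell
# Hypothetical wrapper around the sed command from post #4. The address
# /<DESCRIPTION-OF-GOODS>/ applies both substitutions only to lines that
# contain that opening tag; every other line is printed unchanged.
clean_goods() {
    sed '/<DESCRIPTION-OF-GOODS>/{
s/>[<!\[]*/>/
s/ *]*></</
}' "$1"
}
```

With this form, a line such as `<GOODS><ITEM>` is left alone, whereas the unrestricted command would strip the `<` after the first `>` on it. The output can be compared with the original using `diff` before the file is replaced.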
# 7  
Old 11-19-2010
Quote:
Originally Posted by hemavenkatesh
I have generated all the xml files using the same VS 2005 application.
I have yet to encounter a single Microsoft product capable of generating terse, well-formatted HTML or XML...
 
