Problem With UTF-8 Byte Order Mark


# 1  
09-01-2013

Hi

I'm migrating a few websites from my old web server (CentOS 5) to a new server (CentOS 6). One of these websites is multilingual and has a lot of UTF-8 files (HTML, PHP) in different languages (e.g. Arabic, Persian, Russian, etc.).

On the old server, when I run:
Code:
file mailer.php

I get:
Code:
mailer.php: UTF-8 Unicode C++ program text

But when I transfer these files to the new server and run the file command again, I get this:
Code:
mailer.php: UTF-8 Unicode (with BOM) C++ program text

And the pages display question marks ("????????") when I browse the website!

What should I do to stop the OS from adding a BOM to these files?

Moderator's Comments:
Please use CODE tags (not QUOTE tags) when displaying sample input and output as well as when displaying code segments.

Last edited by Don Cragun; 09-01-2013 at 07:05 PM.. Reason: Change QUOTE tags to CODE tags.
# 2  
09-01-2013
Quote:
Originally Posted by mohs3n
What should I do to stop the OS from adding a BOM to these files?

In all likelihood, the OS didn't change the files; it is just that the file utility was made smarter at distinguishing between different types of C++ program text.

Look at /etc/magic on both systems and see if there is anything containing the string "with BOM". It may help you understand why CentOS 6 adds that phrase to the output of the file utility. However, since this is a programming-language check, the rules the file utility uses might not be in /etc/magic.
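A minimal sketch of that check, assuming bash and GNU grep. The candidate paths in the last command are guesses (they differ between file versions and distributions), so adjust them to whatever exists on each box. A compiled .mgc database is binary, so grep will only report that a binary file matches.

```shell
#!/bin/bash
# search_magic PATTERN FILE...: search magic(5) rule files for a phrase
# such as "with BOM". Missing, unreadable or non-regular paths are skipped.
search_magic() {
  local pattern=$1 f
  shift
  for f in "$@"; do
    if [ -f "$f" ] && [ -r "$f" ]; then
      # -H prints the file name, -n the line number of each match
      grep -Hn -- "$pattern" "$f"
    fi
  done
  return 0
}

# Candidate locations (assumptions; they vary between releases)
search_magic 'with BOM' /etc/magic /usr/share/file/magic \
  /usr/share/misc/magic /usr/local/share/file/magic
```

Running it on both servers and diffing the output shows whether only the newer rule set knows about the BOM.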
# 3  
09-02-2013
Quote:
Originally Posted by Don Cragun
In all likelihood, the OS didn't change the files; it is just that the file utility was made smarter at distinguishing between different types of C++ program text.

I've checked both systems: only CentOS 6 has an /etc/magic file, and there is nothing in it.

Any other suggestions would be appreciated.
# 4  
09-02-2013
On each of the two machines, calculate each file's md5 digest. Are they identical?

If not, then you need to specify exactly how you transferred them.

If they are identical, the problem lies elsewhere.

Simply mishandling a BOM is unlikely to generate a lot of "???????" sequences. I suspect that the browser is using an incorrect encoding (perhaps because either the web server or PHP is not configured correctly). Have you compared the headers sent from each server? Or have you checked that the browser is using the same encoding in both cases?

Regards,
Alister

Last edited by alister; 09-02-2013 at 12:00 PM..
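One way to do that comparison for a whole site at once, sketched in bash and assuming GNU coreutils and findutils: build a manifest on the old server, copy it across, and check it on the new one. The function names and the paths in the comments are placeholders, not anything from the original posts.

```shell
#!/bin/bash
# make_manifest DIR: print "md5  ./relative/path" for every regular file under DIR.
make_manifest() {
  # -print0 / sort -z / xargs -0 keep odd file names intact;
  # xargs -r avoids running md5sum on stdin when DIR has no files.
  (cd -- "$1" && find . -type f -print0 | sort -z | xargs -0 -r md5sum)
}

# check_manifest DIR MANIFEST: verify MANIFEST against the files under DIR.
# Prints only mismatched or missing files; non-zero exit status if any differ.
check_manifest() {
  # The redirect is set up before the cd runs, so MANIFEST may be a relative path.
  (cd -- "$1" && md5sum -c --quiet -) < "$2"
}

# Example (paths are hypothetical):
#   old server: make_manifest /var/www/site > /tmp/site.md5
#   new server: check_manifest /var/www/site /tmp/site.md5
```

Using relative "./" paths in the manifest means the two servers don't need the same document root.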
# 5  
09-02-2013
Quote:
Originally Posted by alister
On each of the two machines, calculate each file's md5 digest. Are they identical?

The checksums are the same, and the PHP/Apache configurations are also the same on both servers.
# 6  
09-02-2013
Are you using the exact same browser (same user account, same version, same computer) for both sites? Did you check the HTTP headers that the browser is receiving? Did you check the encoding that the browser is using in each case? Did you try forcing the encoding in the browser to see if the page's multilingual text renders correctly?

Regards,
Alister
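A minimal way to compare what each server actually sends, assuming curl is installed: dump the response headers and pull out the Content-Type line, which is where the server declares a charset if it declares one. The helper name and the host names in the comments are hypothetical.

```shell
#!/bin/bash
# content_type: read raw HTTP response headers on stdin and print the
# Content-Type line(s). Header lines end in CRLF, so strip the CRs first.
content_type() {
  tr -d '\r' | grep -i '^content-type:'
}

# Example (hypothetical hosts; -s silences progress, -I requests headers only):
#   curl -sI http://old.example.com/mailer.php | content_type
#   curl -sI http://new.example.com/mailer.php | content_type
```

If both servers report the same charset, the difference is more likely in the page bodies than in the server configuration.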
# 7  
09-02-2013
Problem solved.

Here is how:
I just found out that the files have a BOM on both servers. The old server (CentOS 5) doesn't show it because its older "file" RPM package has no BOM rule in its magic file ("/usr/local/share/file/magic"), whereas the newer "file" package on the new server (CentOS 6) includes one, so it detects the BOM.
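To see the BOM directly, whichever version of file is installed, look at the first three bytes of each file: a UTF-8 BOM is EF BB BF. A sketch in bash, assuming GNU head, od and find; the function names and the *.php/*.html filter are just examples.

```shell
#!/bin/bash
# has_bom FILE: succeed if FILE starts with the UTF-8 BOM (bytes EF BB BF).
has_bom() {
  # od -An -tx1 prints the bytes as hex without offsets; tr joins them up.
  [ "$(head -c 3 -- "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# list_bom_files DIR: print every .php/.html file under DIR that starts with a BOM.
list_bom_files() {
  find "$1" -type f \( -name '*.php' -o -name '*.html' \) -print0 |
    while IFS= read -r -d '' f; do
      if has_bom "$f"; then
        printf '%s\n' "$f"
      fi
    done
}

# Example: list_bom_files /var/www/site   (path is hypothetical)
```

Because it only reads bytes, this gives the same answer on the old and the new server.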
The reason I got question marks in the output was a PHP option called "zend.multibyte", which is supposed to handle files with a BOM but doesn't! So I recompiled PHP without this option, and everything is back to normal.
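The fix above was to rebuild PHP. A different approach, not used in this thread, is to strip the BOM from the files themselves so the pages no longer depend on how PHP treats one. A minimal bash sketch, assuming GNU coreutils; it writes back through a temporary copy so the file keeps its owner and permissions. Take a backup before running anything like this over a whole site.

```shell
#!/bin/bash
# strip_bom FILE: remove a leading UTF-8 BOM (EF BB BF) from FILE in place.
# Files that do not start with a BOM are left untouched.
strip_bom() {
  local f=$1 tmp status
  [ "$(head -c 3 -- "$f" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ] || return 0
  tmp=$(mktemp) || return 1
  # tail -c +4 starts output at the 4th byte, i.e. skips the 3-byte BOM.
  # Writing back with cat (not mv) keeps the original inode, owner and mode.
  tail -c +4 -- "$f" > "$tmp" && cat -- "$tmp" > "$f"
  status=$?
  rm -f -- "$tmp"
  return "$status"
}

# Example (path is hypothetical; back up first):
#   find /var/www/site -type f -name '*.php' -print0 |
#     while IFS= read -r -d '' f; do strip_bom "$f"; done
```

Running it twice is harmless, since a file without a BOM is skipped.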