Issue with Keyboard or Char Encoding During Migration


 
# 22  
Old 04-28-2020
Here ya go.....

[Attached screenshot: screen-shot-2020-04-28-65612-pm.jpg]


This is an encoding issue with an old post in the PHP vB3 app.

The encoding of the title DB field from years ago is not specified because the designers of vB 15+ years ago did not think people would post such non-standard chars in titles of posts.

It is correct in Discourse in this case and wrong in vB3, which is very interesting.

find . -name “*.*” | xargs grep "help" - UNIX for Dummies Questions & Answers - UNIX.COM Community

Isn't that how you see it?

It looks flawless in the new post in my browser.
# 23  
Old 04-28-2020
Yes, I see it correctly on the new site, but I see a replacement character symbol instead of the double quote marks on the old site.

What title do you see on the old site?
# 24  
Old 04-28-2020
OBTW, if I add this to the showthread PHP script:

Code:
$thread['title'] = utf8_encode($thread['title']);

(which I just did for fun to the old forum).

The issue is the same.
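
For reference, PHP's utf8_encode() assumes its input is ISO-8859-1 and converts it to UTF-8, so it cannot repair text that is actually in some other encoding. Below is a minimal sketch of that behaviour, in Python rather than the forum's PHP. It assumes, purely for illustration, that the curly quotes in the title are stored either as Windows-1252 bytes or as UTF-8 bytes; the real bytes in the vB3 database are not shown in this thread. The helper name utf8_encode_like is hypothetical.

```python
def utf8_encode_like(data: bytes) -> bytes:
    """Mimic PHP's utf8_encode(): read bytes as ISO-8859-1, write UTF-8."""
    return data.decode("latin-1").encode("utf-8")


# Title fragment with curly quotes (U+201C, U+201D), as in the old thread.
title = "\u201c*.*\u201d"

# Assumption 1: the database holds Windows-1252 bytes (0x93 and 0x94).
cp1252_bytes = title.encode("cp1252")
# Assumption 2: the database holds UTF-8 bytes (E2 80 9C and E2 80 9D).
utf8_bytes = title.encode("utf-8")

# In ISO-8859-1, 0x93 and 0x94 are control characters, not quotes,
# so the converted text still does not contain curly quotes.
from_cp1252 = utf8_encode_like(cp1252_bytes).decode("utf-8")

# Bytes that are already UTF-8 get encoded a second time (double encoding).
from_utf8 = utf8_encode_like(utf8_bytes).decode("utf-8")

print(repr(from_cp1252))
print(repr(from_utf8))
```

Under either assumption the result differs from the intended title, which would be consistent with utf8_encode() making no visible improvement.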

I think it is an HTML encoding issue in the vB app. If you look at the source code of the page in the old forum:

[Attached screenshot: screen-shot-2020-04-28-71156-pm.jpg]


So, there is some underlying issue with vBulletin3, which is one reason we are migrating: to get away from this way-past-EOL forum software.

If I add this to the showthread script in the old forum:

Code:
header('Content-Type: text/html; charset=utf-8');

This does not help on the old forum either:

Code:
header('Content-Type: text/html; charset=utf-16');

The issue is the same because of some encoding mismatch, with the same "non-standard" mojibake.
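
A rough illustration of why changing only the declared charset may not help: the header changes how the bytes are interpreted, not the bytes themselves. The sketch below is in Python rather than PHP. It assumes, hypothetically, that the stored title uses Windows-1252 bytes for the curly quotes; that is a guess and not something confirmed in this thread.

```python
# Title fragment with curly quotes, as the author typed it.
title = "\u201chelp\u201d"

# Assumption: the legacy database stores this as Windows-1252 bytes.
stored = title.encode("cp1252")  # 0x93 ... 0x94 around "help"

# Interpreting those bytes as UTF-8: 0x93 and 0x94 are invalid on their own,
# so a tolerant decoder substitutes U+FFFD, the replacement character.
seen_as_utf8 = stored.decode("utf-8", errors="replace")

# Interpreting the same bytes as UTF-16 (little endian here) pairs them up
# into unrelated code points, so even the ASCII letters are lost.
seen_as_utf16 = stored.decode("utf-16-le")

# Only the matching charset recovers the original text.
seen_as_cp1252 = stored.decode("cp1252")

print(repr(seen_as_utf8))
print(repr(seen_as_utf16))
print(repr(seen_as_cp1252))
```

If the stored bytes were something else entirely, the details would differ, but the same principle applies: the declared charset has to match what is actually stored.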

Mojibake - Wikipedia

Dennis, are you getting bored with this mojibake stuff yet?

Actually, I'm feeling very good that the only issues you are finding are outlier encoding issues which are already broken in the old, legacy, obsolete, long-EOL forum at this point.
# 25  
Old 04-28-2020
Quote:
Originally Posted by hicksd8
Yes, I see it correctly on the new site but I see replacement character symbol instead of the double quote marks on the old site.

What title do you see on the old site?

The old site title is broken due to an encoding issue:

[Attached screenshot: screen-shot-2020-04-28-72601-pm.jpg]


New site, title is looking good:

[Attached screenshot: screen-shot-2020-04-28-72722-pm.jpg]


If we edit the title on the old site and replace the mojibake with regular double quotes, all will be OK.
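
If there were many such titles, the same manual edit could in principle be scripted. The following is only a sketch, in Python, of the replacement step on an already correctly decoded string; the function name normalize_quotes and the choice of which characters to map are assumptions, and nothing here touches the vB3 database.

```python
# Map typographic quotes to their plain ASCII keyboard equivalents.
QUOTE_MAP = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
})


def normalize_quotes(title: str) -> str:
    """Return the title with curly quotes replaced by straight ASCII quotes."""
    return title.translate(QUOTE_MAP)


old_title = 'find . -name \u201c*.*\u201d | xargs grep "help"'
fixed = normalize_quotes(old_title)
print(fixed)
```

The result contains only ASCII characters, which are encoded identically in the common single-byte encodings and in UTF-8.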

But honestly, I would not worry about it, though of course we can fix it if we want. It's been like this for nearly a decade and no one said anything about it before.
# 26  
Old 04-28-2020
Yes, I'm getting bored with this.

What you are saying is that when I see

Quote:
"ABCDEFG"
on the old forum, there is nevertheless something wrong with, say, the second quote mark, which after migration becomes

Quote:
"ABCDEFG<placeholder>
If so, then what is wrong with the second quote mark (Unicode value) in the old site???

I just don't understand how it can be that wrong and still display perfectly on my screen (unless it's an 8-bit value vs a 7-bit value or some such).

Otherwise, if it displays perfectly then it should migrate perfectly. Yes?
# 27  
Old 04-28-2020
Code:
If so, then what is wrong with the second quote mark (Unicode value) in the old site???

Those quote marks are in an encoding that the PHP / HTML layer of this legacy vBulletin LAMP application does not process as proper "UTF-8", and so they get replaced with the "WTF?" mojibake symbol.

When you look on the old site, you are seeing encoding processed by PHP based on the legacy PHP encoding to HTML.

The new site does this totally differently; that is why it displays properly over there in communityville.

If you edit the old title and replace those oddly-encoded chars with the same quotes as on your keyboard, the encoding will change, all will be great again and the world will be as one.
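
A small Python sketch of why the keyboard quote is the safe choice: the straight double quote is plain ASCII and has the same single byte in every encoding listed here, while the curly quote has a different byte sequence in each encoding, or none at all. The list of encodings is just an illustrative selection, and the helper name encode_or_none is hypothetical.

```python
STRAIGHT = '"'      # U+0022, the quote on the keyboard
CURLY = "\u201c"    # U+201C, left double quotation mark

ENCODINGS = ["ascii", "latin-1", "cp1252", "utf-8"]


def encode_or_none(ch: str, encoding: str):
    """Return the bytes for ch in the given encoding, or None if unrepresentable."""
    try:
        return ch.encode(encoding)
    except UnicodeEncodeError:
        return None


straight_forms = {enc: encode_or_none(STRAIGHT, enc) for enc in ENCODINGS}
curly_forms = {enc: encode_or_none(CURLY, enc) for enc in ENCODINGS}

for enc in ENCODINGS:
    print(enc, straight_forms[enc], curly_forms[enc])
```

Because the straight quote is byte-for-byte identical across these encodings, a mismatch between stored and declared charset cannot garble it.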
# 28  
Old 04-28-2020
I just edited the old title... using the double quotes on my keyboard.

[Attached screenshot: screen-shot-2020-04-28-75157-pm.jpg]


This is more than likely not about 7 / 8 bit ASCII; it is more than likely about UTF-8 and UTF-16.
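
For comparison, here is a short Python sketch of how the two kinds of quote look in UTF-8 and in UTF-16 (big endian, without a byte order mark). It only shows how the encodings differ; it does not establish which encoding the old database actually uses.

```python
chars = {
    "straight": '"',     # U+0022
    "curly": "\u201c",   # U+201C
}

# Record (UTF-8 byte length, UTF-16 byte length) for each character.
sizes = {}
for name, ch in chars.items():
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")
    sizes[name] = (len(utf8), len(utf16))
    print(name, utf8.hex(), utf16.hex())
```

The straight quote takes one byte in UTF-8 and two in UTF-16, while the curly quote takes three bytes in UTF-8 and two in UTF-16, so the same character is stored as quite different byte sequences depending on the encoding.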


See also: What is the difference between UTF-8 and UTF-16? - Quora

See also: Comparison of Unicode encodings - Wikipedia