Issue with Keyboard or Char Encoding During Migration


 
# 15  
04-28-2020
Hi All,

As Neo says I have been spending a bit of time on this migration integrity issue.

The irritating "Thingy" (white diamond with question mark in the middle) is officially the Unicode symbol called "Replacement character". A decoder inserts it as a placeholder for a character that it doesn't understand. IMHO, the issue here is simply that the migration script (or whatever process) SHOULD understand all the characters on our old site. Yes, we already have "Replacement characters" on the old site, which probably emanated from a long-ago upgrade from ASCII to Unicode, or from Unicode version x to Unicode version y. As Neo says, replacement character symbols in our old site must be ignored because there's nothing we can do about them now apart from manually editing them out as time goes on.
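To illustrate the mechanism (a sketch, not the actual migration code): in Python, decoding bytes that are not valid UTF-8 with `errors="replace"` substitutes U+FFFD for the undecodable byte instead of raising an error. The input is a hypothetical Latin-1 encoded word chosen only to show this.

```python
# Hypothetical input: "café" stored as Latin-1, where é is the single byte 0xE9.
latin1_bytes = "café".encode("latin-1")   # b'caf\xe9'

# 0xE9 is not a valid UTF-8 sequence here; errors="replace" inserts U+FFFD instead.
decoded = latin1_bytes.decode("utf-8", errors="replace")

print(ascii(decoded))          # 'caf\ufffd'
print(hex(ord(decoded[-1])))   # 0xfffd
```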

However, I believe that the currently used (Discourse provided??) process is stuffed because it doesn't understand some of the perfectly correct text on our old site. It even screws up a thread title on the old site containing the replacement character symbol - look at this......

Post migration
How to grep i?1/2 symbol? - Shell Programming and Scripting - UNIX.COM Community

Pre migration
How to grep � symbol?

So the process doesn't even understand its own Unicode character set!!!!
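One possible explanation for the i?1/2 in the migrated title (an assumption, not confirmed in this thread): it looks like an ASCII rendering of ï¿½, which is exactly what appears when the three UTF-8 bytes of U+FFFD (EF BF BD) are read back as Latin-1. If so, the title was double-encoded rather than misunderstood. A minimal Python sketch:

```python
# The replacement character stored correctly as UTF-8 takes three bytes.
replacement = "\ufffd"
utf8_bytes = replacement.encode("utf-8")
print(utf8_bytes)                                   # b'\xef\xbf\xbd'

# Assumption: somewhere the title bytes were decoded as Latin-1 instead of UTF-8,
# turning each byte into its own character: ï ¿ ½
mojibake = utf8_bytes.decode("latin-1")
print([f"U+{ord(c):04X}" for c in mojibake])        # ['U+00EF', 'U+00BF', 'U+00BD']
```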

So FWIW, I've come to the conclusion that trying to modify our old DB is futile as the process will probably find something else to screw up.

Indeed, if you follow the first link I posted on this thread further back, others are having the same issue.

That's my update thus far. I'll report back again as my investigation continues.

EDIT: Replacement character symbol is U+FFFD
# 16  
04-28-2020
Quote:
Originally Posted by hicksd8

So the process doesn't even understand it's own Unicode character set!!!!
What are you talking about?

The migrated version is the same as the original version.

That is exactly how it should be.

This has nothing at all to do with migration, discourse or translating encoding.

Migrating a post in which the original DB has already replaced the character with "a question replacement char" will keep the same "question replacement char".

Indeed, are you working too hard?

These example posts you just posted seem "perfect" to me. The migration is just like the original. The original has already replaced the unknown encoding (you do not know whether the original encoding was Unicode or not; that information is gone from the original post and has been replaced with the "WTF" encoding symbol).
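The point that the original information is gone can be shown directly: once undecodable bytes have been replaced, different inputs collapse to the same text, so the substitution cannot be reversed. A small Python sketch with arbitrary example bytes:

```python
# Three different byte strings, none of them valid UTF-8 (arbitrary examples).
inputs = [b"\x93", b"\x94", b"\xff"]

# Each decodes to the same single replacement character U+FFFD.
decoded = [b.decode("utf-8", errors="replace") for b in inputs]

# All three collapse to one value, so the original bytes cannot be recovered.
print(len(set(decoded)))         # 1
print(decoded[0] == "\ufffd")    # True
```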

# 17  
04-28-2020
Whoa, hang on a minute........

I'm talking about the thread title. Is your migrated thread title reading correctly? If so, that's news.

My thread title shown is screwed.
# 18  
04-28-2020
Yes, the thread title in the migrated post is different because there is no migration script running against titles.

That's not a big deal, it's a total outlier.

We are not processing "thread titles" in the migration, because the migration scripts are not designed to process any titles at all.

All processing is for the posts only.

Let's not go chasing down outlier rabbit holes that we are not even concerned about.

It takes less time to edit an outlier like that than to discuss it, LOL.

Obviously, if a thread title has a char that is not encoded properly, it will have issues when migrated. That is like some 0.00001 kinda outlier.

It's good you found it so we can edit it later; but it's nothing to be concerned about.

So, to be clear... the migration script does not do any processing on titles. Titles are expected to be written in "basic normal charsets" and when they are not, it is quite the outlier case.
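If titles ever do get a cleanup pass, a common heuristic for text that was UTF-8 but got decoded as Latin-1 is to reverse that round trip and keep the result only if it decodes cleanly. The function name `fix_title` is hypothetical and this is a sketch, not part of the actual migration scripts. It assumes Latin-1 (not Windows-1252) was the wrong decoding, and it cannot recover characters that were already U+FFFD in the original DB.

```python
def fix_title(title: str) -> str:
    """Hypothetical helper: undo one level of UTF-8-read-as-Latin-1 mojibake.

    Returns the title unchanged if it does not look double-encoded.
    """
    try:
        # Re-encode as Latin-1 to recover the original bytes, then decode as UTF-8.
        return title.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the title has characters outside Latin-1, or the recovered bytes
        # are not valid UTF-8: assume it was not double-encoded and leave it alone.
        return title


print(ascii(fix_title("How to grep \u00ef\u00bf\u00bd symbol?")))  # 'How to grep \ufffd symbol?'
print(ascii(fix_title("caf\u00e9")))                               # 'caf\xe9' (unchanged)
```

Plain ASCII titles pass through unchanged, because the Latin-1 to UTF-8 round trip is the identity for ASCII.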
# 19  
04-28-2020
And look at this.............

Old site
find . -name �*.*� | xargs grep "help"

New site
find . -name “*.*” | xargs grep "help" - UNIX for Dummies Questions & Answers - UNIX.COM Community

The migration has taken out the replacement character placeholders and put in double quotes which, I wouldn't mind betting, is what was originally there.
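The guess that double quotes were originally there fits a common cause, though this thread does not confirm it: in Windows-1252 the curly quotes “ and ” are the single bytes 0x93 and 0x94. Decoded as Windows-1252 they come out as quotes; decoded as UTF-8 they are invalid and become U+FFFD. That would match the migration decoding the stored bytes as Windows-1252 while the old site renders them as UTF-8. A Python sketch, assuming the old post stored those bytes:

```python
# Assumption: the old post stored the command with Windows-1252 smart quotes.
raw = b"find . -name \x93*.*\x94"

as_cp1252 = raw.decode("cp1252")                  # curly quotes survive
as_utf8 = raw.decode("utf-8", errors="replace")   # each quote byte becomes U+FFFD

print(ascii(as_cp1252))   # 'find . -name \u201c*.*\u201d'
print(ascii(as_utf8))     # 'find . -name \ufffd*.*\ufffd'
```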

What? Anybody got a good working crystal ball?
# 20  
04-28-2020
Let me check the original DB and post back.

Hold on.
# 21  
04-28-2020
Yes, sure, I need time to think about this too. A lot of head scratching coming on.