Issue with Keyboard or Char Encoding During Migration

# 8  
04-28-2020
Originally Posted by Neo
This Ruby code did not match, but running the search-and-replace MySQL query directly against the DB, step by step, seems to work.

[Screenshot]

Testing now on the staging server. If it looks OK, I will use this to restore the community for further testing.
# 9  
04-28-2020
This is all good, but I want to work bottom up, focusing only on the specific characters causing problems in our DB.

That is what I am doing now: finding each exact offending character and then finding the correct transform to cleanse it.

Please hold off on posting links unless they contain solutions for the exact characters we are having issues with (let's stick to a bottom-up approach, not top down, for now).

All the characters we have found so far are covered, so hold off on these odd characters until I get the various staging DBs synced.


Right now I have all the transforms I need based on what we have found so far. We can search for more in the next round. In other words, I know what the problem is. What we need is to find the instances and then fix them, bottom up, because I am not going to run any code that "transforms" problems we have not identified and tested. I do not want the unintended consequences of running code and other transforms unless they solve a specific, clearly identified issue.
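As a rough illustration of that bottom-up approach, here is a minimal Ruby sketch, assuming the fixes are kept as an explicit allow-list. The names `KNOWN_FIXES`, `find_known_issues`, and `cleanse` are hypothetical, and the two mappings are illustrative examples of UTF-8 punctuation misread as Windows-1252, not the actual transforms used on this forum. Only listed sequences are touched, so anything not yet identified and tested is left alone. (`tally` needs Ruby 2.7 or later.)

```ruby
# Hypothetical allow-list of transforms. An entry is added only after the
# sequence has been found in the DB and the replacement has been tested.
# These two entries are illustrative (UTF-8 punctuation misread as
# Windows-1252), not the forum's actual list.
KNOWN_FIXES = {
  "\u00E2\u20AC\u2122" => "\u2019", # "â€™" -> "’"
  "\u00E2\u20AC\u0153" => "\u201C"  # "â€œ" -> "“"
}.freeze

# One regex matching any known bad sequence; longer keys are tried first.
FIX_PATTERN = Regexp.union(KNOWN_FIXES.keys.sort_by { |k| -k.length })

# The "find" step: count how often each known bad sequence occurs.
def find_known_issues(text)
  text.scan(FIX_PATTERN).tally
end

# The "fix" step: replace only the known sequences, nothing else.
def cleanse(text)
  text.gsub(FIX_PATTERN, KNOWN_FIXES)
end

sample = "It\u00E2\u20AC\u2122s caf\u00E9"
p find_known_issues(sample)
puts cleanse(sample) # prints: It’s café
```

Each new problem character reported by testers would become one more hash entry, added only once its replacement has been checked against the original post.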

Will update soon.
# 10  
04-28-2020
Need to remap ASCII characters to Unicode?
# 11  
04-28-2020
Originally Posted by hicksd8
Need to remap ASCII characters to Unicode?
No. It's not that simple. If it were that simple, there would be no issue now. (The migration script has done encoding mapping from day 1. The DB is already Unicode, and the Ruby script already does encoding mapping. It is not as simple as a "general remap", or it would be done already.)

Let's stick with the plan. Find links with problems. I know how to fix these if everyone will follow my original plan and provide specific links with specific issues (original v. migrated posts).

What I need are EXACT examples of the real problem in OUR DB (not theory). Thanks. The links I posted were directly related to the EXACT character problem I am working on today (right now). I used that code as a basis to directly address the problems you guys found.

We want to work this bottom up. Bottom up means finding the exact issues (not theory) and writing the exact fix for each encoding issue.

Please. I'm busy and need to get this done the way I know will work. The only way to get this done correctly and reliably is bottom up, not top-down theory and speculation.


What I need from testers in this thread are examples of ORIGINAL versus MIGRATED posts. I can take care of the rest (finding the encoding in the DB, finding the correct transform, writing the code, running it, testing it in the DB, etc.). Please stay on track looking for issues. That is the best way to help get this done.

Everything we have identified so far already has a solution, which I have tested and which works.

What I need are more examples of any error, anomaly, or other data-migration integrity issue, as two links (the original post and the migrated post).
# 12  
04-28-2020
Here is the simple version.

If we have a post full of ORIGINAL v. MIGRATED links, it is easy for me to compare, come up with code, test, and retest.

Without the links, or with links scattered all over the place (email, WhatsApp messages, carrier pigeon), it is hard for me to go back and test, and it takes me too much time because there is a great deal of work to do.

This is why I am asking for testing exactly as I did in my first call for testing:

Here is an image of what we need, from my first post on this caper:

Please Help Integrity Test New Discourse Forums V2


[Screenshot from the first call for testing]
# 13  
04-28-2020
While I am doing another test run, let me try to explain this better.

Our DB is nearly 15 years old.

People have copied and pasted text in all kinds of encodings into the database. That text may or may not have been transformed to the encoding of the DB. In addition, the encoding of the DB has changed over the years. It was not Unicode in the beginning.

The same is true for keyboards. People have typed on all kinds of keyboards over the years. Sometimes this adds to the encoding problem, but from what I have seen, it generally comes from copy-and-paste. Many people like to write their posts in a desktop editor and copy and paste them into the forums.

So running a generic encoding translation will not work for all encodings. If it did, this problem would already have been solved. Sometimes Unicode does not work because there are encoded characters that are not part of Unicode.
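To illustrate why a generic translation is risky, here is a small Ruby sketch of one common mixed-encoding failure, assumed here for illustration: UTF-8 text read back as Windows-1252 and re-encoded ("double encoding"). The helper names `double_encode` and `try_undo` are hypothetical. Reversing the mix-up repairs the garbled string, but the same reversal cannot be applied blindly to text that was stored correctly, or to characters with no Windows-1252 equivalent.

```ruby
# Simulate a common legacy failure: UTF-8 bytes read back as Windows-1252
# and re-encoded to UTF-8 ("double encoding").
def double_encode(text)
  text.dup.force_encoding("Windows-1252").encode("UTF-8")
end

# Naive generic repair: reverse the double encoding. Returns nil when the
# reversal cannot apply (a character has no Windows-1252 byte, or the
# resulting bytes are not valid UTF-8).
def try_undo(text)
  fixed = text.encode("Windows-1252").force_encoding("UTF-8")
  fixed.valid_encoding? ? fixed : nil
rescue Encoding::UndefinedConversionError
  nil
end

original = "It\u2019s"             # "It’s" with a typographic apostrophe
garbled  = double_encode(original) # "Itâ€™s"

puts garbled           # prints: Itâ€™s
puts try_undo(garbled) # prints: It’s
p try_undo(original)   # prints: nil (correct text cannot be "repaired")
p try_undo("\u53D7")   # prints: nil (no Windows-1252 equivalent)
```

Note that a valid result does not prove the input was actually garbled (plain ASCII passes through unchanged), so each case still has to be checked against the original post.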

It's not a theory. It is a fact of years of running a busy forum with people all over the world copying and pasting their locally encoded text into our DB. Sometimes we get lucky and the encoding works.

All we can do is identify it and squash it, or ignore it.

It's not critical either, because I can fix it directly in the DB after migration, as I have been doing today. The best place to fix it is in the legacy MySQL DB when possible, but it can also be done in PostgreSQL, provided no information was lost in the migration from MySQL to PostgreSQL.

This is why I am kinda begging everyone to help test. I can write the code to fix an issue if I can clearly see it. There are one million posts, so the more people take a look, the more it helps.

Sorry to be begging... LOL. I have been working on this for months. My wife is starting to feel like she has no husband, and I can understand why.

But I wanted everyone to understand why I have asked for this help.

This is exactly what I need (image from the first post on this test):

[Screenshot from the first call for testing]


Honestly, so far people have provided me a total of only 3 or 4 links where this encoding issue comes up, and most of those are in non-public spam archives.

I don't want to spend my time chasing outliers in two decades of encoding. Either there are issues or not. I am not going to spend my entire life chasing unimportant encoding issues to take a migration that is 99.99% perfect to 99.9999% perfect. It's not a good use of our time.

So, please provide detailed accounts of any remaining encoding issues, with links to the original and the migrated version.

# 14  
04-28-2020
Here is one, but the issue is in the original DB.

Retry Logic But In Cron - UNIX for Beginners Questions & Answers - UNIX.COM Community

[Screenshot]

In the MySQL DB:

[Screenshot]

So there is no reason to waste time on encoding issues that are not migration issues.

Retry Logic But In Cron

This illustrates the problem: chasing errors that were already in the original DB and were migrated exactly as they were posted.

This is why I need the ORIGINALS and the MIGRATED versions if anyone sees any issue.
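Once the original and migrated text of a post are in hand, that comparison can also be done mechanically. This is a minimal Ruby sketch with hypothetical names (`first_difference`, `describe`); fetching the text from each DB is assumed and not shown. It reports the first character where the two versions differ, as code points. If the texts are identical, the oddity was not introduced by the migration, as in the Retry Logic example.

```ruby
# Hypothetical helper for comparing an original post with its migrated copy.
Difference = Struct.new(:index, :original_char, :migrated_char)

# Return the first character position where the two texts differ,
# or nil if they are identical.
def first_difference(original, migrated)
  a = original.chars
  b = migrated.chars
  [a.length, b.length].max.times do |i|
    return Difference.new(i, a[i], b[i]) unless a[i] == b[i]
  end
  nil
end

# Show a character as its code point so invisible or look-alike
# characters can be identified exactly.
def describe(char)
  char.nil? ? "(end of text)" : format("U+%04X", char.ord)
end

original = "It\u2019s fine"
migrated = "It\u00E2\u20AC\u2122s fine"

diff = first_difference(original, migrated)
if diff.nil?
  puts "No difference: any oddity was already in the original DB"
else
  # prints: Position 2: U+2019 -> U+00E2
  puts "Position #{diff.index}: #{describe(diff.original_char)} -> " \
       "#{describe(diff.migrated_char)}"
end
```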

However, if anyone knows the correct replacement for that strange stuff, I will add it to the translation.

7 More Discussions You Might Find Interesting

1. Solaris

View file encoding then change encoding.

Hi all!! Im using command file -i myfile.xml to validate XML file encoding, but it is just saying regular file . Im expecting / looking an output as UTF8 or ANSI / ASCII Is there command to display the files encoding? Thank you! (2 Replies)
Discussion started by: mrreds
2 Replies

2. Shell Programming and Scripting

AIX to RHEL migration - awk treating 0e[0-9]+ as 0 instead of string issue

Greetings Experts, We are migrating from AIX to RHEL Linux. I have created a script to verify and report the NULLs and SPACEs in the key columns and duplicates on key combination of "|" delimited set of big files. Following is the code that was successfully running in AIX. awk -F "|" 'BEGIN {... (5 Replies)
Discussion started by: chill3chee
5 Replies

3. AIX

AIX Migration issue with EMC ODM sets

Hi Experts , I want to start migrating our AIX 6.1 to AIX 7.1 . I am planning to use alt_disk_migration . Chris gibson has awesome documentation in the internet. However I am running into an issue with EMC odm filesets . So my current OS is AIX 6.1. and I have this : lslpp -l | grep EMC ... (7 Replies)
Discussion started by: JME2015
7 Replies

4. UNIX for Dummies Questions & Answers

Strange Keyboard and Mouse Issue

Hello All, PC: CuBox-i (*i.MX6) Mini-PC OS: openSUSE 13.1 (Bottle) (armv7hl) Kernel: 3.14.14-cubox-i # uname -a Linux CuBox-HQ 3.14.14-cubox-i #1 SMP Sat Sep 13 03:48:24 UTC 2014 armv7l armv7l armv7l GNU/LinuxSo I've been having this random issue happen on this PC where a few strange... (12 Replies)
Discussion started by: mrm5102
12 Replies

5. Solaris

Solaris 10 p2v migration issue

Hi All, We need to move Physical Solaris 10 system to Virtual Solaris 10(p2v). Both the servers having Solaris 10(Generic_147440-25) means physical server which we are going to move is having Solaris 10 and this physical server will be converted as a virtualserver on another physical server... (9 Replies)
Discussion started by: sb200
9 Replies

6. Shell Programming and Scripting

Encoding of a text issue

I created one file on windows system and is visible as : TestTable,INSERT,večilnin1ईगल受害者是第,2010-02-02 10:10:10.612447,137277,ईगल受害者是第večilnin!@#$%^&*()_+=-{}] But when send this file to unix system, the file is visible as : TestTable,INSERT,žvečilnin1ई-ल -害...是第,2010-02-02 ... (4 Replies)
Discussion started by: Shaishav Shah
4 Replies

7. UNIX for Dummies Questions & Answers

how2 get single char from keyboard w/o enter

I am writing a bash shell menu and would like to get a char immediately after a key is pressed. This script does not work but should give you an idea of what I am trying to do.... Thanks for the help #! /bin/bash ANSWER="" echo -en "Choose item...\n" until do $ANSWER = $STDIN ... (2 Replies)
Discussion started by: jwzumwalt
2 Replies