Issue with Keyboard or Char Encoding During Migration


 
# 1  
Old 04-28-2020
Yes, sure, I need time to think about this too. A lot of head scratching coming on.
# 2  
Old 04-28-2020
After this test, I may give this ruby gem a try as well:

File: README - Documentation for mojibake (1.1.2)

Quote:
Mojibake occurs in English most frequently due to misinterpreting and bad-transcoding between Windows-1252, ISO-8859-1, and UTF-8. This module provides a mojibake sequence to original character mapping table, and utility to recover mojibake'd text. Testing has been with English but other Latin based languages, where Windows-1252 is in the wild, should also benefit.
See also: GitHub - dekellum/mojibake: Recover mojibake text using a reverse-mapping table

See also: Clean Up Weird Characters in Database | Digging Into WordPress
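For reference, the kind of damage the quoted README describes can be reproduced in a few lines. This is a minimal sketch, assuming the garbling comes from UTF-8 bytes being read as Windows-1252 (one of the cases the README names). The helper names `garble` and `repair` are hypothetical and are not part of the mojibake gem.

```python
def garble(text: str) -> str:
    """Simulate the damage: encode as UTF-8, then misread the bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(text: str) -> str:
    """Undo it: recover the original bytes via Windows-1252, then decode them as UTF-8."""
    return text.encode("cp1252").decode("utf-8")


# Typographic characters that commonly show up in forum posts.
for ch in ["“", "’", "‘", "•", "…"]:
    bad = garble(ch)
    print(repr(ch), "->", repr(bad), "->", repr(repair(bad)))
```

A strict round trip like this does not cover every character. The right double quote ” encodes to a byte sequence containing 0x9D, which Python's `cp1252` codec leaves undefined, so `garble("”")` raises `UnicodeDecodeError`. That is one reason a reverse-mapping table can be more practical than a blind re-decode.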
# 3  
Old 04-28-2020
FYI, from the existing old MySQL DB:

Code:
mysql> SELECT count(postid)  from post where pagetext like '%“%';
+---------------+
| count(postid) |
+---------------+
|            66 |
+---------------+
1 row in set (1.64 sec)

mysql> SELECT count(postid)  from post where pagetext like  '%†%';
+---------------+
| count(postid) |
+---------------+
|            14 |
+---------------+
1 row in set (1.68 sec)


mysql> SELECT count(postid)  from post where pagetext like '%â€%';
+---------------+
| count(postid) |
+---------------+
|            45 |
+---------------+
1 row in set (1.66 sec)

mysql> SELECT count(postid)  from post where pagetext like '%’%';
+---------------+
| count(postid) |
+---------------+
|           165 |
+---------------+
1 row in set (1.63 sec)

mysql> SELECT count(postid)  from post where pagetext like '%‘%';
+---------------+
| count(postid) |
+---------------+
|            38 |
+---------------+
1 row in set (1.69 sec)


mysql> SELECT count(postid)  from post where pagetext like '%•%';
+---------------+
| count(postid) |
+---------------+
|             4 |
+---------------+
1 row in set (1.70 sec)

mysql> SELECT count(postid)  from post where pagetext like '%…%';
+---------------+
| count(postid) |
+---------------+
|            23 |
+---------------+
1 row in set (1.68 sec)

Now that SELECT shows some goodies, maybe UPDATE on main DB?
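The step from counting to updating can be rehearsed safely before touching a real database. The sketch below is an assumption-laden stand-in: it uses an in-memory SQLite database instead of MySQL, the sample rows are invented, and `count_matches` and `fix_sequence` are hypothetical helper names. Only the `post` table and the `postid` and `pagetext` columns come from the queries above.

```python
import sqlite3

# SQLite stands in for MySQL here; the rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (postid INTEGER PRIMARY KEY, pagetext TEXT)")
conn.executemany(
    "INSERT INTO post (pagetext) VALUES (?)",
    [("Itâ€™s fine",), ("Donâ€™t panic",), ("clean text",)],
)


def count_matches(conn, needle):
    """Same shape as the SELECT count(postid) ... LIKE queries above."""
    row = conn.execute(
        "SELECT count(postid) FROM post WHERE pagetext LIKE ?",
        (f"%{needle}%",),
    ).fetchone()
    return row[0]


def fix_sequence(conn, bad, good):
    """Replace one identified sequence, touching only rows that contain it."""
    cur = conn.execute(
        "UPDATE post SET pagetext = REPLACE(pagetext, ?, ?) WHERE pagetext LIKE ?",
        (bad, good, f"%{bad}%"),
    )
    return cur.rowcount


BAD, GOOD = "â€™", "’"  # one garbled sequence and its intended character
before = count_matches(conn, BAD)
changed = fix_sequence(conn, BAD, GOOD)
after = count_matches(conn, BAD)
print(before, changed, after)
```

Counting before and after each UPDATE gives a quick check that the statement changed exactly the rows the SELECT predicted. MySQL also has a `REPLACE()` string function, so the statement shape should carry over, but the placeholder syntax depends on the driver and the whole thing is best tried on a staging copy first.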
# 4  
Old 04-28-2020
Like this:

[Screenshot: screen-shot-2020-04-28-124301-pm.jpg]
# 5  
Old 04-28-2020
This spam page (on new site) was full of those "odd diamond things"...

[Screenshot: screen-shot-2020-04-28-15521-pm.jpg]


Code:
https://community.unix.com/t/infraction-for-solosx-spammed-advertisements/271573

... but I got rid of them on old site:

[Screenshot: screen-shot-2020-04-28-15705-pm.jpg]


Code:
https://www.unix.com/user-infractions/142431-infraction-solosx-spammed-advertisements.html

Using this mysql code in old forums against mysql DB (for some reason, had to repeat, but that is OK):

[Screenshot: screen-shot-2020-04-28-15933-pm.jpg]


Will run against staging mysql soon....
This User Gave Thanks to Neo For This Post:
# 6  
Old 04-28-2020
# 7  
Old 04-28-2020
This is all good, but I want to work bottom-up and focus only on the specific chars causing a problem in our DB.

That is what I am doing now: finding the exact offending char, then finding the correct transform to cleanse it.

Please hold off on posting links unless the link contains a solution for the exact char we are having issues with (let's stick to a bottom-up approach, not top-down, for now).

I have all the chars we have found so far covered, so hold off (on these funny chars) until I get the various staging DBs synced.

Thanks.

Right now I have all the transforms I need based on what we have found so far. We can search for more in the next round. In other words, I know what the problem is. What we need is to find the offending chars and then fix them bottom-up, because I am not going to run any code which "transforms" problems we have not identified and tested. I do not want the unintended consequences of running code and other transforms unless they solve a specific, clearly identified issue.

Will update soon.
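The bottom-up idea above can be sketched as an explicit allow-list: only sequences that have been identified get a transform, and anything else that looks suspicious is counted but left untouched. This is a minimal sketch, not the code actually run on the forum DB. The `KNOWN_FIXES` table is an assumption based on the usual UTF-8-read-as-Windows-1252 pairs, and `cleanse` is a hypothetical name.

```python
# Allow-list of identified sequences (assumed pairs; extend only after testing each one).
KNOWN_FIXES = {
    "â€œ": "“",  # left double quote
    "â€™": "’",  # right single quote / apostrophe
    "â€˜": "‘",  # left single quote
    "â€¢": "•",  # bullet
    "â€¦": "…",  # ellipsis
}


def cleanse(text):
    """Apply only the known fixes; report what was changed and what remains."""
    counts = {}
    # Longest sequences first, so a short key can never split a longer one.
    for bad in sorted(KNOWN_FIXES, key=len, reverse=True):
        n = text.count(bad)
        if n:
            counts[bad] = n
            text = text.replace(bad, KNOWN_FIXES[bad])
    # Leftover "â€" prefixes are unidentified: count them, do not rewrite them.
    unresolved = text.count("â€")
    return text, counts, unresolved


cleaned, counts, unresolved = cleanse("Itâ€™s â€œokâ€ now")
print(cleaned)
print(counts, "unresolved:", unresolved)
```

The unresolved count is the to-do list for the next round: each leftover gets inspected, a transform gets tested for it, and only then does it join the table.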