Issue with Keyboard or Char Encoding During Migration


 
# 1  
Old 04-27-2020

There is a minor lingering issue for which we currently have no working solution.

For example, see this pagetext in the original (old) forum MySQL DB (continued thanks to hicksd8 for finding these and for looking into this interesting topic):

[Screenshot: screen-shot-2020-04-28-92934-am.jpg]


From original DB:

Code:
| Hi have two directory with below name in “/opt“

1-Source
2-Destination 

In “Source” directory there is a lot’s of files, with extensions (doc, docx , ppt, xls,...).
In “Destination” directory only pdf version of (doc, docx) files that exist in source stored.

Now I want to create script that use “diff” command check “source” and get list of only (doc, docx) files after that look for related pdf file in “Destination” if pdf version of (doc, docx) not exist in “destination” store list of them on a file.

E.g.

1-Source
[CODE]File1.doc
File2.docx
File3.doc
File4.ppt
File5.xls
File6.doc[/CODE]

2-Destination 
[CODE]File1.pdf
File3.pdf
[/CODE]

Expected result after run script is:
[CODE]File2.docx
File6.doc[/CODE]

Here is my script
[CODE]diff -r “/opt/source” “/opt/destination“[/CODE]


Any recommendation?
Thanks



UPDATE 
Follow below post and work like charm:

[CODE]comm -23 <(find dir1 -type f -exec bash -c 'basename "${0%.*}"' {} \; | sort) <(find dir2 -type f -exec bash -c 'basename "${0%.*}"' {} \; | sort)
test1[/CODE]

[url=https://unix.stackexchange.com/questions/178321/diff-two-directories-but-ignore-the-extensions]filenames - diff two directories, but ignore the extensions - Unix & Linux Stack Exchange[/url]

Here is the original post:

Code:
https://www.unix.com/unix-for-beginners-questions-and-answers/284088-compare-two-directory-find-differents.html

Here is the migrated post:

Code:
https://community.unix.com/t/compare-two-directory-and-find-differents/377962

Note that some of the strange chars in the original DB become the Unicode replacement character U+FFFD (\uFFFD) in the new DB:

[Screenshot: screen-shot-2020-04-28-93831-am.jpg]


This encoding issue is very noticeable in spam in our database (mostly from non-English speakers and countries).

We see this occasionally in the DB from other non-English speakers and do not have a perfect solution for this issue, so far.
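
For anyone following along, here is a minimal Python sketch of how sequences like these can arise. It assumes the text was typed as UTF-8 and later read as Windows-1252, which this thread has not confirmed for our DB. The function name `garble` is made up for illustration.

```python
# Assumption: the original text was UTF-8 and was later decoded as Windows-1252.
def garble(text: str) -> str:
    """Encode as UTF-8, then wrongly decode the bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")

# Curly quotes, bullet and ellipsis: the kind of chars a word processor inserts.
for ch in ["\u201c", "\u2019", "\u2018", "\u2022", "\u2026"]:
    print(f"U+{ord(ch):04X} -> {garble(ch)!r}")

# The reverse mistake: a single Windows-1252 byte is not valid UTF-8, so a
# decoder that replaces errors emits U+FFFD in its place.
lone_cp1252_quote = b"\x93"  # left double quote in Windows-1252
print(repr(lone_cp1252_quote.decode("utf-8", errors="replace")))
```

Each curly quote turns into a three-character sequence starting with "â€", which matches what shows up in the old pagetext. Whether the \uFFFD chars in the new DB come from exactly this kind of invalid byte is a guess on my part.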
# 2  
Old 04-28-2020
Here are some clues about this:

mysql - Trouble with UTF-8 characters; what I see is not what I stored - Stack Overflow

https://stackoverflow.com/questions/...-utf-8characte

Some suggest code like this:


Code:
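// Strip the stray "Â" that a mis-decoded non-breaking space leaves behind.
$final = str_replace("Â", "", $final);
// Map each mojibake sequence back to a plain ASCII equivalent.
$final = str_replace("’", "'", $final);
$final = str_replace("“", '"', $final);
// Assumption: this search string was 'â€“' (en dash) before its last char was lost;
// as posted it was identical to the next line's, so that line could never match.
$final = str_replace('â€“', '-', $final);
$final = str_replace('â€', '"', $final);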

I will try this on staging server today.
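
As an alternative to a chain of string replacements, the garbling can sometimes be undone in one step by reversing it. This is a rough Python sketch, assuming the damage is UTF-8 read as Windows-1252. The name `repair` is hypothetical and this is not what the Stack Overflow answer proposes.

```python
def repair(text: str) -> str:
    """Try to undo UTF-8-read-as-Windows-1252 mojibake.

    The text is returned unchanged when it does not round-trip cleanly,
    so strings that were never garbled are left alone.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text

print(repair("\u00e2\u20ac\u0153/opt\u00e2\u20ac\u0153"))  # garbled quotes around /opt
print(repair("caf\u00e9"))  # legitimate accented text, returned as-is
```

The catch is that it is all-or-nothing per string: a post that mixes garbled and clean non-ASCII text fails the round trip and comes back untouched. Python's cp1252 codec also has no mapping for byte 0x9D, so a garbled right double quote cannot be reversed this way.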

[Screenshot: screen-shot-2020-04-28-114448-am.jpg]


Anyone have any good or better ideas?

Note: Actually, this is kinda fun in a perverted kinda way. LOL
These 2 Users Gave Thanks to Neo For This Post:
# 3  
Old 04-28-2020
After this test, I may give this ruby gem a try as well:

File: README - Documentation for mojibake (1.1.2)

Quote:
Mojibake occurs in English most frequently due to misinterpreting and bad-transcoding between Windows-1252, ISO-8859-1, and UTF-8. This module provides a mojibake sequence to original character mapping table, and utility to recover mojibake'd text. Testing has been with English but other Latin based languages, where Windows-1252 is in the wild, should also benefit.
See also: GitHub - dekellum/mojibake: Recover mojibake text using a reverse-mapping table

See also: Clean Up Weird Characters in Database | Digging Into WordPress
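
To get a feel for the reverse-mapping idea, here is a small Python sketch that builds such a table for the Windows-1252 high range and applies it. It is only an illustration of the technique, not the gem's actual table or API. `build_table` and `recover` are names I made up.

```python
import re

def build_table() -> dict:
    """Map each mojibake sequence to the character it presumably came from."""
    table = {}
    for byte in range(0x80, 0x100):
        try:
            original = bytes([byte]).decode("cp1252")
            garbled = original.encode("utf-8").decode("cp1252")
        except UnicodeDecodeError:
            # Byte is undefined in cp1252, or the garbled form needs such a byte.
            continue
        table[garbled] = original
    return table

TABLE = build_table()
# Longest sequences first, so three-char sequences win over shorter ones.
PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(TABLE, key=len, reverse=True))
)

def recover(text: str) -> str:
    """Replace every known mojibake sequence with its original character."""
    return PATTERN.sub(lambda m: TABLE[m.group(0)], text)

print(recover("lot\u00e2\u20ac\u2122s of files"))
```

Unlike a whole-string round trip, this repairs only the sequences it recognizes, so it copes with posts that mix garbled and clean text. The price is that a post which legitimately contains one of these sequences would be changed too.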
# 4  
Old 04-28-2020
FYI, here are counts from the existing old MySQL DB:

Code:
mysql> SELECT count(postid)  from post where pagetext like '%“%';
+---------------+
| count(postid) |
+---------------+
|            66 |
+---------------+
1 row in set (1.64 sec)

mysql> SELECT count(postid)  from post where pagetext like  '%†%';
+---------------+
| count(postid) |
+---------------+
|            14 |
+---------------+
1 row in set (1.68 sec)


mysql> SELECT count(postid)  from post where pagetext like '%â€%';                                                                                                                    
+---------------+
| count(postid) |
+---------------+
|            45 |
+---------------+
1 row in set (1.66 sec)

mysql> SELECT count(postid)  from post where pagetext like '%’%';                                                                                                                    
+---------------+
| count(postid) |
+---------------+
|           165 |
+---------------+
1 row in set (1.63 sec)

mysql> SELECT count(postid)  from post where pagetext like '%‘%';                                                                                                                    
+---------------+
| count(postid) |
+---------------+
|            38 |
+---------------+
1 row in set (1.69 sec)


mysql> SELECT count(postid)  from post where pagetext like '%•%';
+---------------+
| count(postid) |
+---------------+
|             4 |
+---------------+
1 row in set (1.70 sec)

mysql> SELECT count(postid)  from post where pagetext like '%…%';
+---------------+
| count(postid) |
+---------------+
|            23 |
+---------------+
1 row in set (1.68 sec)

Now that SELECT shows some goodies, maybe run an UPDATE on the main DB? :)
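
One possible shape for that UPDATE, sketched in Python so the statements can be reviewed before anything touches the DB. The table and column names come from the SELECTs above. The replacement characters are my assumption about what the posters originally typed. The `%s` placeholders assume a driver with that parameter style (PyMySQL and mysqlclient use it), and `update_statements` is a made-up name.

```python
# Mojibake sequence -> presumed original character (an assumption, see above).
REPLACEMENTS = {
    "\u00e2\u20ac\u0153": "\u201c",  # left double quote
    "\u00e2\u20ac\u2122": "\u2019",  # right single quote / apostrophe
    "\u00e2\u20ac\u02dc": "\u2018",  # left single quote
    "\u00e2\u20ac\u00a2": "\u2022",  # bullet
    "\u00e2\u20ac\u00a6": "\u2026",  # ellipsis
}

def update_statements(table="post", column="pagetext"):
    """Return (sql, params) pairs; nothing is executed here."""
    sql = (
        f"UPDATE {table} SET {column} = REPLACE({column}, %s, %s) "
        f"WHERE {column} LIKE %s"
    )
    return [(sql, (bad, good, f"%{bad}%")) for bad, good in REPLACEMENTS.items()]

for sql, params in update_statements():
    print(sql, params)
```

Each pair could then be passed to a cursor's `execute(sql, params)` on a staging copy first. Whether the sequences match at all depends on the connection character set, so a dry run with the SELECT counts is worth doing before any UPDATE.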
# 5  
Old 04-28-2020
Like this:

[Screenshot: screen-shot-2020-04-28-124301-pm.jpg]
# 6  
Old 04-28-2020
This spam page (on new site) was full of those "odd diamond things"...

[Screenshot: screen-shot-2020-04-28-15521-pm.jpg]


Code:
https://community.unix.com/t/infraction-for-solosx-spammed-advertisements/271573

... but I got rid of them on old site:

[Screenshot: screen-shot-2020-04-28-15705-pm.jpg]


Code:
https://www.unix.com/user-infractions/142431-infraction-solosx-spammed-advertisements.html

I used this MySQL code against the old forum's MySQL DB (for some reason I had to run it more than once, but that is OK):

[Screenshot: screen-shot-2020-04-28-15933-pm.jpg]


Will run against staging mysql soon....
This User Gave Thanks to Neo For This Post: