Issue with Keyboard or Char Encoding During Migration Post: 303046223

Post 303046223 by Neo on Tuesday 28th of April 2020 04:42:58 AM

04-28-2020

Administrator

While I am doing another test run, let me try to explain this better.

Our DB is nearly 15 years old.

People have copied and pasted all kinds of encodings into the database. That text may or may not have been transformed to the encoding of the DB. In addition, over the years, the encoding of the DB has changed. It was not UNICODE in the beginning.

The same is true for keyboards. People have typed on all kinds of keyboards over the years. Sometimes this adds to the encoding problem, but generally it comes from copy-and-paste, from what I have seen. Many people like to write their posts in a desktop editor and then copy and paste them into the forums.

So, running any generic encoding translation will not work for all encodings. If it did, this problem would have already been solved. Sometimes UNICODE does not work because there are encoded chars which are not part of UNICODE.
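To make that concrete, here is a minimal Python sketch of one targeted repair, assuming the damage is UTF-8 text that was read as cp1252 at some point. The function name `repair_mojibake` and the choice of cp1252 are my assumptions for illustration, not the code used in the migration. It fixes that one mix-up and leaves everything else alone, which is exactly why a single generic pass cannot cover every case.

```python
def repair_mojibake(text, wrong="cp1252", right="utf-8"):
    """Undo one round of 'bytes in `right` decoded as `wrong`'.

    Hypothetical helper: returns the text unchanged when the
    round trip is not possible, so clean text is left alone.
    """
    try:
        return text.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# "café" stored as UTF-8 but read as cp1252 shows up as "cafÃ©".
garbled = "caf\u00c3\u00a9"
print(repair_mojibake(garbled))       # repaired to "café"
print(repair_mojibake("caf\u00e9"))   # already correct, left unchanged
```

Text damaged by a different pair of encodings would need its own `wrong` and `right` values, and text where bytes were already dropped cannot be recovered this way at all.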

It's not a theory. It is a fact of years of having a busy forum with people all over the world copy-and-pasting their locally encoded text into our DB. Sometimes we get lucky and the encoding works.

All we can do is identify it and squash it, or ignore it.
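For the "identify" part, a rough sketch in Python could look like the following. The pattern list and the name `find_suspect_posts` are assumptions for illustration only; they catch a few common symptoms (the U+FFFD replacement character, and UTF-8 sequences that were read as Latin-1 or cp1252), not every possible breakage.

```python
import re

# Hypothetical heuristics: a few well-known signs of encoding damage.
SUSPECT = re.compile(
    "\ufffd"                     # replacement character
    "|\u00c3[\u0080-\u00bf]"     # two-byte UTF-8 read as Latin-1, e.g. "Ã©"
    "|\u00e2\u20ac"              # UTF-8 curly quotes read as cp1252, e.g. "â€"
)


def find_suspect_posts(posts):
    """Return the ids of posts whose text matches a suspect pattern.

    `posts` is any iterable of (post_id, text) pairs.
    """
    return [post_id for post_id, text in posts if SUSPECT.search(text)]


sample = [
    (1, "plain ascii"),
    (2, "caf\u00c3\u00a9"),      # damaged
    (3, "bad \ufffd char"),      # damaged
    (4, "caf\u00e9"),            # fine
]
print(find_suspect_posts(sample))
```

A scan like this only produces candidates. Someone still has to look at each hit and decide whether to squash it or ignore it.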

It's not critical either, because I can fix it after migration directly in the DB, as I have been doing today. The best place to fix it is in the legacy MySQL DB when possible, but as long as no information was lost in the migration from MySQL to Postgres, it can also be fixed in Postgres.

This is why I am kinda begging everyone to help test. I can write the code to fix the issue if I clearly see the issues. There are one million posts. The more people take a look, the more it helps.

Sorry to be begging... LOL. I have been working on this for months. My wife is starting to feel like she has no husband, which I can understand.

But I wanted everyone to understand why I have asked for this help.

This is exactly what I need..... (image from first post on this test)

[Attached image: screen-shot-2020-04-28-31655-pmjpg]

-------------------------

Honestly, so far people have provided me with only about 3 or 4 links where this encoding issue comes up, and most of those are in non-public spam archives.

I don't want to be spending my time chasing outliers in two decades of encoding. Either there are issues or not. I am not going to spend my entire life chasing unimportant encoding issues to try to take a migration which is 99.99% perfect to 99.9999% perfect. It's not a good use of our time.

So, please provide detailed accounts of any remaining encoding issues, with links to the original and the migrated version.

Thanks.

Neo

encoding

encoding(n)						       Tcl Built-In Commands						       encoding(n)

__________________________________________________________________________________________________________________________________________________

NAME

       encoding - Manipulate encodings

SYNOPSIS

       encoding option ?arg arg ...?
_________________________________________________________________

INTRODUCTION

       Strings	in Tcl are encoded using 16-bit Unicode characters.  Different operating system interfaces or applications may generate strings in
       other encodings such as Shift-JIS.  The encoding command helps to bridge the gap between Unicode and these other formats.

DESCRIPTION

       Performs one of several encoding related operations, depending on option.  The legal options are:

       encoding convertfrom ?encoding? data
	      Convert data to Unicode from the specified encoding.  The characters in data are treated as binary data where the  lower	8-bits	of
	      each  character  is  taken  as a single byte.  The resulting sequence of bytes is treated as a string in the specified encoding.	If
	      encoding is not specified, the current system encoding is used.

       encoding convertto ?encoding? string
	      Convert string from Unicode to the specified encoding.  The result is a sequence of bytes  that  represents  the	converted  string.
	      Each byte is stored in the lower 8-bits of a Unicode character.  If encoding is not specified, the current system encoding is used.

       encoding dirs ?directoryList?
	      Tcl can load encoding data files from the file system that describe additional encodings for it to work with. This command sets the
	      search path for *.enc encoding data files to the list of directories directoryList. If directoryList is omitted then the command
	      returns the current list of directories that make up the search path. It is an error for directoryList to not be a valid list. If,
	      when a search for an encoding data file is happening, an element in directoryList does not refer to a readable, searchable
	      directory, that element is ignored.

       encoding names
	      Returns a list containing the names of all of the encodings that are currently available.

       encoding system ?encoding?
	      Set the system encoding to encoding. If encoding is omitted then the command returns the current system encoding.  The system
	      encoding is used whenever Tcl passes strings to system calls.

EXAMPLE

       It is common practice to write script files using a text editor that produces output in the euc-jp encoding,  which  represents	the  ASCII
       characters  as  single bytes and Japanese characters as two bytes.  This makes it easy to embed literal strings that correspond to non-ASCII
       characters by simply typing the strings in place in the script.	However, because the source command always reads files using  the  current
       system encoding, Tcl will only source such files correctly when the encoding used to write the file is the same.  This tends not to be true
       in an internationalized setting.  For example, if such a file was sourced in North America (where the ISO8859-1	is  normally  used),  each
       byte  in the file would be treated as a separate character that maps to the 00 page in Unicode.	The resulting Tcl strings will not contain
       the expected Japanese characters.  Instead, they will contain a sequence of Latin-1 characters that correspond to the bytes of the original
       string.	The encoding command can be used to convert this string to the expected Japanese Unicode characters.  For example,
	      set s [encoding convertfrom euc-jp "\xA4\xCF"]
       would return the Unicode string "\u306F", which is the Hiragana letter HA.
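
The same round trip can be sketched outside Tcl. The Python snippet below is an illustrative analogue, not part of the Tcl documentation: it takes the two EUC-JP bytes from the example, misreads them as Latin-1 the way the source command would under an ISO8859-1 system encoding, and then reinterprets them as EUC-JP. The variable names are chosen here for illustration.

```python
# The two bytes of the Hiragana letter HA in EUC-JP.
raw = b"\xa4\xcf"

# Misread as Latin-1: each byte becomes one character in U+0000..U+00FF.
misread = raw.decode("latin-1")

# Recover the original bytes, then decode them as EUC-JP
# (roughly what "encoding convertfrom euc-jp" does in Tcl).
fixed = misread.encode("latin-1").decode("euc_jp")
print(fixed == "\u306f")

# The reverse direction, comparable to "encoding convertto euc-jp".
back = fixed.encode("euc_jp")
print(back == raw)
```

The recovery works here because Latin-1 maps every byte to a character, so no information is lost in the misreading step.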

SEE ALSO

       Tcl_GetEncoding(3)

KEYWORDS

       encoding

Tcl									8.1							       encoding(n)