Hello gurus, I would like to dig deeper into charset and encoding issues. I also tried Googling it, but with no luck. Please see below.
My configuration
I have file1, containing text. I can see this text correctly only on Windows. If I just open the file with less, cat or vi I get this:
Under Linux I have to use iconv to see it correctly
I understand that this happens because the file was encoded in one format (WINDOWS-1250) and is being decoded as another (UTF-8). But can you clarify the following?
1.) When I check the decimal ASCII value of each character I get the following lines. What do the negative values mean, and what is the code 341 (instead of á)? AFAIK ASCII covers 0-127.
2.) My assumption is that UTF-8 and WINDOWS-1250 use different "numbers" (code representations) for the same characters. So when a character is encoded using encoding1 (WINDOWS-1250), it gets the appropriate "code1" from the encoding1 table. If this encoded character (or rather its numeric representation, "code1") is then decoded using another encoding (UTF-8), the only thing that happens is that "code1" is looked up in the encoding2 (UTF-8) table and the appropriate character from that table is assigned. Am I right? I think an example will make it clear:
Please look at the following sites; they show what happens if you encode with one encoding and decode with another. It seems that below the 127 (decimal) boundary it does not matter if you decode with the wrong encoding (this is why some characters in the above example were displayed correctly even when the wrong encoding was used).
3.) If I understand it right, there is no way to tell how a file was encoded (unless there is some header that specifies this, or you do some statistical language analysis, etc.). So why/how does the "file" command recognize UTF-8 encoding but not WINDOWS-1250?
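To make the numbers in questions 1 and 2 concrete, here is a minimal Python sketch. The sample character is an assumption (the contents of file1 are not shown), and `byte_views` is a hypothetical helper name. It suggests that á is the single byte 225 in WINDOWS-1250, which reads as -31 when treated as a signed 8-bit value and as 341 in octal (which would match the question if that 341 is an octal dump). It also shows that UTF-8 is not a one-byte lookup table: characters above 127 become multi-byte sequences.

```python
def byte_views(data: bytes):
    """Return (unsigned decimal, signed 8-bit decimal, octal) for each byte."""
    return [(b, b - 256 if b > 127 else b, format(b, "o")) for b in data]


sample = "á"  # assumed sample character, not taken from file1

cp1250_bytes = sample.encode("cp1250")  # one byte in WINDOWS-1250
utf8_bytes = sample.encode("utf-8")     # two bytes in UTF-8

print(byte_views(cp1250_bytes))
print(byte_views(utf8_bytes))

# Bytes 0-127 mean the same thing in ASCII, WINDOWS-1250 and UTF-8,
# which is why plain ASCII characters survive a wrong decoding.
assert "abc".encode("cp1250") == "abc".encode("utf-8") == b"abc"

# The WINDOWS-1250 byte for this character is not a valid UTF-8 sequence ...
try:
    cp1250_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# ... while the UTF-8 bytes decode as WINDOWS-1250 without any error,
# just into the wrong characters.
mojibake = utf8_bytes.decode("cp1250")
print(valid_utf8, repr(mojibake))
```

This asymmetry is one plausible reason a tool can positively recognize UTF-8 (the byte sequences must follow strict rules) while a single-byte encoding such as WINDOWS-1250 can only be guessed at.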
Thank you very much
Hi, I'm using PuTTY and when I try to type ü it writes | (or when I try to type é, it writes i).
I tried changing the Translation settings of PuTTY, but with no success.
I have KSH
# locale
LANG=
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"... (3 Replies)
Hi All,
I want to do URL encoding using a shell script in my project. I decided that sed is the correct tool for this, but I am unable to achieve what I want using sed. Kindly help me solve this.
My requirement is that there will be one URL with all kinds of special characters, spaces, etc...
... (8 Replies)
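As an alternative to sed, a minimal sketch of percent-encoding using Python's standard library is shown here. The function name `url_encode` and the sample input are assumptions for illustration; `safe=""` is chosen so that even `/` is escaped, which may or may not match the requirement in this thread.

```python
from urllib.parse import quote


def url_encode(text: str) -> str:
    """Percent-encode everything except letters, digits and '_.-~'."""
    return quote(text, safe="")


# Hypothetical input containing a space and several reserved characters.
print(url_encode("a b&c/d?e=f"))
```

It could be called from a shell script by passing the string as an argument to a small Python script.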
hi folks ,
I have a shell script which contains an SQL query that dumps some data in Arabic from the DB. This data is written to a file on a Unix machine, but the problem is that the Arabic data appears like ??????????|111|???????? even when I move it to my Windows XP machine.
Does anyone have an idea... (2 Replies)
Hello All
I have a set of files, each one containing some lines that match this regex:
regex='disabled\,.*\,\".*\"'
and here is what file says about each file:
file <random file>
<random file> ASCII text, with CRLF line terminators
So, as an example, here is what a file ("Daffy Duck - The... (3 Replies)
Hi,
I am a beginner with Unix.
My requirement is to validate the encoding used in the incoming file (csv, txt). If it is encoded in UTF-8, the file should remain as it is; otherwise I need to change the encoding to UTF-8.
Please advise me how to proceed on this. (7 Replies)
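One possible approach, sketched here in Python under stated assumptions: try a strict UTF-8 decode first, and only convert when that fails. The function name `to_utf8` and the fallback encoding `cp1250` are assumptions; the real source encoding of the incoming files has to be known or agreed on, since it cannot be detected reliably from the bytes alone.

```python
def to_utf8(data: bytes, fallback: str = "cp1250") -> bytes:
    """Return data unchanged if it is valid UTF-8, else re-encode it.

    `fallback` is the assumed original encoding of non-UTF-8 input.
    """
    try:
        data.decode("utf-8")  # strict decode: raises on invalid sequences
        return data
    except UnicodeDecodeError:
        return data.decode(fallback).encode("utf-8")


# Usage sketch with in-memory bytes instead of a real file.
already_utf8 = "á".encode("utf-8")
legacy = "á".encode("cp1250")
print(to_utf8(already_utf8) == already_utf8)
print(to_utf8(legacy) == already_utf8)
```

For real files, the bytes would be read and written in binary mode so that no implicit decoding takes place.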
Hi All,
Hope you can help me with the below :).
I'm working on a script on Sun Solaris and I'm facing a problem with the number encoding, as shown below:
1 is encoded to 31 (this is ASCII so it's ok)
11 is encoded as B118 !!! don't know why
111 is encoded as B1580C !!! don't know why ... (4 Replies)
I have scheduled a couple of shell scripts to run using the 'at' command.
The output of at -l is:
$ at -l
1320904800.a Thu Nov 10 01:00:00 2011
1320894000.a Wed Nov 9 22:00:00 2011
1320876000.a Wed Nov 9 17:00:00 2011
$ uname -a
SunOS dc2prcrptetl2 5.9 Generic_122300-54 sun4u sparc... (2 Replies)
Hi all!!
I'm using the command file -i myfile.xml to check the encoding of an XML file, but it just says regular file. I'm expecting output such as UTF8 or ANSI / ASCII.
Is there a command to display the file's encoding?
Thank you! (2 Replies)
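If no suitable command is available, a rough guess can be made by inspecting the bytes directly. The sketch below is only loosely modelled on the kind of labels such tools report; `guess_encoding` and its return values are assumptions, and anything that is neither ASCII nor valid UTF-8 is simply reported as unknown rather than guessed.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Classify bytes as 'us-ascii', 'utf-8' or 'unknown-8bit'."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"  # a UTF-8 byte order mark is a strong hint
    if all(b < 128 for b in data):
        return "us-ascii"  # only 7-bit bytes present
    try:
        data.decode("utf-8")  # strict validation of multi-byte sequences
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown-8bit"  # some single-byte or other encoding


print(guess_encoding(b"<a/>"))
print(guess_encoding("é".encode("utf-8")))
print(guess_encoding("é".encode("cp1250")))
```

For XML specifically, the encoding declared in the `<?xml ... encoding="..."?>` prolog, when present, is usually a better source than byte inspection.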