12-12-2019
That dash definitely triggers the bug and I cannot print the duckduckgo.com homepage to PDF unless I delete the dash (replacing it with a hyphen works fine).
So I guess my problem is worse than I thought: Not just Unicode characters but even extended ASCII characters in filenames can trigger this bug.
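A quick way to check which characters in a filename fall outside 7-bit ASCII is sketched below. The post does not say which dash is involved, so the sample filename and the en dash (U+2013) in it are assumptions made purely for illustration, and `non_ascii_chars` is a hypothetical helper name.

```python
def non_ascii_chars(name):
    """Return (index, character, code point label) for each non-ASCII character."""
    return [
        (i, ch, "U+%04X" % ord(ch))
        for i, ch in enumerate(name)
        if ord(ch) > 0x7F  # anything above 7F is outside 7-bit ASCII
    ]

# Hypothetical filename; the en dash is written as an escape so the source stays ASCII.
sample_name = "DuckDuckGo \u2013 Privacy.pdf"
for index, ch, label in non_ascii_chars(sample_name):
    print(index, label)
```

Replacing the dash with a plain hyphen (U+002D) leaves the list empty, which matches the workaround described above.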
UTF(6) Games Manual UTF(6)
NAME
UTF, Unicode, ASCII, rune - character set and format
DESCRIPTION
The Plan 9 character set and representation are based on the Unicode Standard and on the ISO multibyte UTF-8 encoding (Universal Character
Set Transformation Format, 8 bits wide). The Unicode Standard represents its characters in 16 bits; UTF-8 represents such values in an
8-bit byte stream. Throughout this manual, UTF-8 is shortened to UTF.
In Plan 9, a rune is a 16-bit quantity representing a Unicode character. Internally, programs may store characters as runes. However, any
external manifestation of textual information, in files or at the interface between programs, uses a machine-independent, byte-stream
encoding called UTF.
UTF is designed so that the characters of the 7-bit ASCII set (values hexadecimal 00 to 7F) appear only as themselves in the encoding. Runes with values above
7F appear as sequences of two or more bytes with values only from 80 to FF.
The UTF encoding of the Unicode Standard is backward compatible with ASCII: programs presented only with ASCII work on Plan 9 even if not
written to deal with UTF, as do programs that deal with uninterpreted byte streams. However, programs that perform semantic processing on
ASCII graphic characters must convert from UTF to runes in order to work properly with non-ASCII input. See rune(2).
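The difference between byte-stream handling and semantic processing can be illustrated with a short sketch. It uses Python's built-in UTF-8 codec as a stand-in for Plan 9's UTF (an assumption; the two agree for the 16-bit values this page covers, apart from any special treatment of surrogates), and the sample string is made up for the example.

```python
# Sample text: ASCII letters, one two-byte character, a slash, two three-byte characters.
sample = "caf\u00e9/\u8bbe\u5907"
utf = sample.encode("utf-8")

# Byte-level work on an ASCII delimiter is safe: no byte of a multibyte
# sequence is below 0x80, so b"/" can only match a real slash.
parts = utf.split(b"/")
decoded_parts = [p.decode("utf-8") for p in parts]

# Semantic processing needs characters, not bytes: the two counts differ.
byte_count = len(utf)      # counts bytes of the encoded stream
char_count = len(sample)   # counts characters (runes)
print(decoded_parts, byte_count, char_count)
```

A program that only splits on ASCII delimiters or copies bytes through needs no changes, while one that counts, truncates, or classifies characters has to decode first.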
Letting numbers be binary, a rune x is converted to a multibyte UTF sequence as follows:
01. x in [00000000.0bbbbbbb] -> 0bbbbbbb
10. x in [00000bbb.bbbbbbbb] -> 110bbbbb, 10bbbbbb
11. x in [bbbbbbbb.bbbbbbbb] -> 1110bbbb, 10bbbbbb, 10bbbbbb
Conversion 01 provides a one-byte sequence that spans the ASCII character set in a compatible way. Conversions 10 and 11 represent higher-valued
characters as sequences of two or three bytes with the high bit set. Plan 9 does not support the 4, 5, and 6 byte sequences proposed
by X-Open. When there are multiple ways to encode a value, for example rune 0, the shortest encoding is used.
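The three conversions can be sketched in a few lines. This is an illustration of the table above, not Plan 9's own routine (see rune(2) for that); the function name `rune_to_utf` is hypothetical, and the sketch does not treat any 16-bit value specially because the page does not mention such cases.

```python
def rune_to_utf(x):
    """Encode a 16-bit rune as a UTF byte sequence using the shortest form."""
    if not 0 <= x <= 0xFFFF:
        raise ValueError("rune must fit in 16 bits")
    if x <= 0x7F:
        # Conversion 01: 0bbbbbbb
        return bytes([x])
    if x <= 0x7FF:
        # Conversion 10: 110bbbbb, 10bbbbbb
        return bytes([0xC0 | (x >> 6), 0x80 | (x & 0x3F)])
    # Conversion 11: 1110bbbb, 10bbbbbb, 10bbbbbb
    return bytes([
        0xE0 | (x >> 12),
        0x80 | ((x >> 6) & 0x3F),
        0x80 | (x & 0x3F),
    ])

print(rune_to_utf(0x41).hex(), rune_to_utf(0xE9).hex(), rune_to_utf(0x8BBE).hex())
```

Testing the ranges from smallest to largest is what guarantees the shortest encoding: a value that fits conversion 01 never reaches conversion 10 or 11.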
In the inverse mapping, any sequence except those described above is incorrect and is converted to rune hexadecimal 0080.
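A sketch of the inverse mapping follows. Two points are assumptions, not statements from this page: an incorrect sequence is taken to consume exactly one byte, and encodings longer than the shortest form are treated as incorrect. The names `RUNE_ERROR` and `utf_to_rune` are hypothetical; the real interface is described in rune(2).

```python
RUNE_ERROR = 0x0080  # rune that the page assigns to incorrect sequences

def utf_to_rune(data, i=0):
    """Decode one rune starting at data[i]; return (rune, bytes consumed)."""
    def is_continuation(j):
        # True if data[j] exists and has the form 10bbbbbb
        return j < len(data) and (data[j] & 0xC0) == 0x80

    b0 = data[i]
    if b0 < 0x80:
        # One-byte form: 0bbbbbbb
        return b0, 1
    if (b0 & 0xE0) == 0xC0 and is_continuation(i + 1):
        # Two-byte form: 110bbbbb, 10bbbbbb
        x = ((b0 & 0x1F) << 6) | (data[i + 1] & 0x3F)
        if x > 0x7F:  # assumption: reject non-shortest encodings
            return x, 2
    elif (b0 & 0xF0) == 0xE0 and is_continuation(i + 1) and is_continuation(i + 2):
        # Three-byte form: 1110bbbb, 10bbbbbb, 10bbbbbb
        x = ((b0 & 0x0F) << 12) | ((data[i + 1] & 0x3F) << 6) | (data[i + 2] & 0x3F)
        if x > 0x7FF:  # assumption: reject non-shortest encodings
            return x, 3
    # Anything else is incorrect; assumption: skip a single byte.
    return RUNE_ERROR, 1

print(utf_to_rune(b"\xe8\xae\xbe"), utf_to_rune(b"\xff"))
```

Because 0080 is also a legitimate rune, a caller can tell the error case from the real character only by the number of bytes consumed: two for the correct encoding, one for an incorrect sequence under the assumption above.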
FILES
/lib/unicode
table of characters and descriptions, suitable for look(1).
SEE ALSO
ascii(1), tcs(1), rune(2), keyboard(6), The Unicode Standard.
UTF(6)