This is somewhat tricky, because your program looks sane but does not behave that way. For more consistent behaviour, make sure both files (the script and the data file) are saved in a Unicode encoding, preferably UTF-8.
The following assumes Perl 5.8 or later; Perl did not have good Unicode support before 5.8.
Then fix your script so that it is parsed as UTF-8. This is important because your script (not the data file!) contains non-ASCII characters. If you followed my advice, the script file has those special characters encoded in UTF-8, but Perl will not parse it as UTF-8 automatically. By default it reads the source as single-byte characters, so you have to tell it otherwise with the "use utf8" pragma.
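A minimal sketch of what the pragma changes, assuming the script file itself is saved as UTF-8. The variable names are only illustrative; the "no utf8" block is there purely to show the contrast.

```perl
use strict;
use warnings;
use utf8;    # the rest of this file is parsed as UTF-8

# One character, because the parser decoded the UTF-8 source.
my $with_pragma = length "é";

my $without_pragma;
{
    no utf8;    # inside this block the source is read as raw bytes again
    # The same literal is now the two bytes 0xC3 0xA9.
    $without_pragma = length "é";
}

print "with use utf8:    $with_pragma\n";
print "without use utf8: $without_pragma\n";
```

If both numbers come out the same, the script file is probably not saved as UTF-8.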
Finally, make sure the data file is read as UTF-8 and the results are written out as UTF-8.
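One way to do this is with I/O layers, sketched below. The helper name read_utf8_lines and the temporary demo file are hypothetical, added only so the example runs on its own; in your script you would pass the path of your real data file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read all lines of a file, decoding them from UTF-8.
sub read_utf8_lines {
    my ($path) = @_;
    open(my $in, '<:encoding(UTF-8)', $path)
        or die "Cannot open $path: $!";
    my @lines = <$in>;
    close $in;
    chomp @lines;
    return @lines;
}

# Demo data: write a small UTF-8 file so the sketch is self-contained.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh, ':encoding(UTF-8)';
print {$fh} "caf\x{E9}\n", "\x{8BBE}\x{5907}\n";
close $fh;

my @lines = read_utf8_lines($path);

# Encode everything printed to STDOUT as UTF-8.
binmode STDOUT, ':encoding(UTF-8)';
print "$_ (", length($_), " characters)\n" for @lines;
```

Because the input layer decodes the bytes, length counts characters rather than bytes, and the output layer avoids "Wide character" warnings when printing.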
With those changes I got your expected result on my Windows machine.
Discussion started by: HemaV
2 Replies
locale(3pm) Perl Programmers Reference Guide locale(3pm)
NAME
locale - Perl pragma to use or avoid POSIX locales for built-in operations
SYNOPSIS
@x = sort @y; # Unicode sorting order
{
use locale;
@x = sort @y; # Locale-defined sorting order
}
@x = sort @y; # Unicode sorting order again
DESCRIPTION
This pragma tells the compiler to enable (or disable) the use of POSIX locales for built-in operations (for example, LC_CTYPE for regular
expressions, LC_COLLATE for string comparison, and LC_NUMERIC for number formatting). Each "use locale" or "no locale" affects statements
to the end of the enclosing BLOCK.
Starting in Perl 5.16, a hybrid mode for this pragma is available,
use locale ':not_characters';
which enables only the portions of locales that don't affect the character set (that is, all except LC_COLLATE and LC_CTYPE). This is
useful when mixing Unicode and locales, including UTF-8 locales.
use locale ':not_characters';
use open ":locale"; # Convert I/O to/from Unicode
use POSIX qw(locale_h); # Import the LC_ALL constant
setlocale(LC_ALL, ""); # Required for the next statement
# to take effect
printf "%.2f\n", 12345.67; # Locale-defined formatting
@x = sort @y; # Unicode-defined sorting order.
# (Note that you will get better
# results using Unicode::Collate.)
See perllocale for more detailed information on how Perl supports locales.
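The comment above recommends Unicode::Collate for sorting. As a rough sketch of the difference, assuming the Unicode::Collate module is installed (it ships with Perl) and using a made-up word list:

```perl
use strict;
use warnings;
use Unicode::Collate;

my @words = qw(b A a B);

# The built-in sort compares code points, so all capitals come first.
my @by_codepoint = sort @words;

# Unicode::Collate applies the Unicode Collation Algorithm instead.
my $collator   = Unicode::Collate->new;
my @by_collate = $collator->sort(@words);

print "code point order: @by_codepoint\n";
print "collation order:  @by_collate\n";
```

With the default collator settings, letters are grouped together regardless of case, and lowercase sorts before uppercase within each letter.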
NOTE
If your system does not support locales, then loading this module will cause the program to die with a message:
"Your vendor does not support locales, you cannot use the locale
module."
perl v5.18.2 2013-11-04 locale(3pm)