perl sort unicode non-ascii letters
Post 302316921 by jim mcnamara on Sunday 17th of May 2009 10:24:03 AM
You can set the collation sequence by defining a locale and then calling setlocale().
Let the underlying sort code handle the problem. You define the ordering you want once, and it stays available to any program that selects that locale.

See man localedef.
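
A minimal Perl sketch of that approach, under some assumptions: `sort_in_locale` is a hypothetical helper name, and "en_US.UTF-8" is only an example locale that may not be installed on your system. Note that in Perl, calling setlocale() is not enough on its own; `sort` only honors LC_COLLATE inside the scope of `use locale`.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);

# Hypothetical helper: sort @words by the collation rules of $locale_name.
# Returns an array reference, or undef if setlocale() rejects the locale.
sub sort_in_locale {
    my ($locale_name, @words) = @_;

    my $previous = setlocale(LC_COLLATE);          # remember the current setting
    my $selected = setlocale(LC_COLLATE, $locale_name);
    return undef unless defined $selected;         # locale not available here

    my @sorted;
    {
        use locale;                                # sort now honors LC_COLLATE
        @sorted = sort @words;
    }

    setlocale(LC_COLLATE, $previous) if defined $previous;   # restore
    return \@sorted;
}

my @words = qw(zebra Apple apple Zulu);

# The "C" locale compares bytes, so capitals come first.
my $c_order = sort_in_locale("C", @words);
print "C: @$c_order\n";                            # C: Apple Zulu apple zebra

# "en_US.UTF-8" is an assumed locale name; list installed ones with `locale -a`.
my $en_order = sort_in_locale("en_US.UTF-8", @words);
print defined $en_order
    ? "en_US.UTF-8: @$en_order\n"
    : "en_US.UTF-8 is not installed on this system\n";
```

The order printed for a real locale depends on how that locale was defined, which is the point: the sorting rules live in the locale definition, not in the script.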
 

locale(3pm)            Perl Programmers Reference Guide            locale(3pm)

NAME
    locale - Perl pragma to use or avoid POSIX locales for built-in operations

SYNOPSIS
    @x = sort @y;      # Unicode sorting order
    {
        use locale;
        @x = sort @y;  # Locale-defined sorting order
    }
    @x = sort @y;      # Unicode sorting order again

DESCRIPTION
    This pragma tells the compiler to enable (or disable) the use of POSIX
    locales for built-in operations (for example, LC_CTYPE for regular
    expressions, LC_COLLATE for string comparison, and LC_NUMERIC for number
    formatting). Each "use locale" or "no locale" affects statements to the
    end of the enclosing BLOCK.

    Starting in Perl 5.16, a hybrid mode for this pragma is available,

        use locale ':not_characters';

    which enables only the portions of locales that don't affect the
    character set (that is, all except LC_COLLATE and LC_CTYPE). This is
    useful when mixing Unicode and locales, including UTF-8 locales.

        use locale ':not_characters';
        use open ":locale";           # Convert I/O to/from Unicode
        use POSIX qw(locale_h);       # Import the LC_ALL constant
        setlocale(LC_ALL, "");        # Required for the next statement
                                      # to take effect
        printf "%.2f\n", 12345.67;    # Locale-defined formatting
        @x = sort @y;                 # Unicode-defined sorting order.
                                      # (Note that you will get better
                                      # results using Unicode::Collate.)

    See perllocale for more detailed information on how Perl supports
    locales.

NOTE
    If your system does not support locales, then loading this module will
    cause the program to die with a message:

        "Your vendor does not support locales, you cannot use the locale
        module."

perl v5.18.2                      2013-11-04                      locale(3pm)
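
The man page's remark about Unicode::Collate can be sketched as follows. This is an illustration only, assuming the Unicode::Collate module is installed (it normally ships with Perl); `unicode_sort` is a hypothetical helper name, not part of the module.

```perl
use strict;
use warnings;
use utf8;                                   # source literals below are UTF-8
use Unicode::Collate;

# Hypothetical helper: sort by the default Unicode Collation Algorithm order.
sub unicode_sort {
    my @words = @_;
    my $collator = Unicode::Collate->new;   # default collation table, no tailoring
    return $collator->sort(@words);
}

binmode STDOUT, ":encoding(UTF-8)";

my @words = ("zebra", "éclair", "Apple", "eclair", "apple");

print join(" ", sort @words), "\n";          # Apple apple eclair zebra éclair
print join(" ", unicode_sort(@words)), "\n"; # apple Apple eclair éclair zebra
```

Plain `sort` compares code points, so "éclair" lands after "zebra". The collator groups accented and unaccented letters together without depending on which locales the system has installed. For language-specific ordering, the companion module Unicode::Collate::Locale may be worth a look.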