Sort a file with non-ASCII and CJK characters with Perl


 
# 1  
Old 04-06-2009
Hello,
I am not a programmer, so please be patient.
I have started to look into Perl because it seems able to solve all (or most) of the problems I happen to meet when using my computer. These problems are generally related to text manipulation.

Although I have started to study it, I cannot (yet?) figure out how to sort a file with special characters in it.
With the Unix "sort" utility, I don't get the expected output.
For example, I need to sort a file in which some words have a leading "š", which I need placed just after the plain "s" and not after "z".

I had a look at the Perl tutorials that seem relevant (as far as I understand them), and it looks quite complicated for a newcomer. Are there ready-to-use scripts?

As for CJK, I found on CPAN the tool I need to convert traditional Chinese to simplified and vice versa (Encode::HanConvert), but I cannot find a similar tool for sorting Chinese characters (by stroke, radical, or pinyin). Does one exist, or do I have to learn Perl thoroughly to do what I need?

Any suggestions will be appreciated.
# 2  
Old 04-06-2009
Regular sort handles a sort request by comparing characters according to what is called a collation sequence. This is defined by the locale settings.

What does this give for output? Please show it:
Code:
locale

Look at the variable named LC_COLLATE. It determines how sort compares characters.
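
For illustration, this is what comparison without any language-aware collation looks like. It is only a minimal Perl sketch, and the sample words are made up rather than taken from the poster's file. Perl's built-in sort compares strings by code point, much as sort does in the C locale, so "š" (U+0161) ends up after "z" (U+007A).

```perl
use strict;
use warnings;

binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical sample words; "\x{161}" is the character "š".
my @words = ("zzz", "\x{161}\x{161}\x{161}", "sss", "aaa");

# The built-in sort compares code points, not a language's alphabet,
# so the word starting with "š" is placed after "zzz".
my @sorted = sort @words;

print "$_\n" for @sorted;
```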
# 3  
Old 04-06-2009
I don't know Chinese at all, but a Google search found this:

http://germain.its.maine.edu/~hiebel...pts/cedictsort
# 4  
Old 04-06-2009
Sorry, I forgot to mention that I work on OpenBSD, so there is no locale support (but you can do all multilingual work with Unicode-aware apps and tools).
# 5  
Old 04-06-2009
I probably searched with the wrong keywords before. I've found this on CPAN:
Unicode::Collate
It seems to be what I need and I have installed it, but I don't know how to use it inside a script.
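
A minimal sketch of calling the module from a script might look like the following. The sample words are hypothetical, and the collator is created without arguments, which makes it load the default collation table shipped with the module. With that table, "š" is ordered with "s" rather than after "z".

```perl
use strict;
use warnings;
use Unicode::Collate;

binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical sample words; "\x{161}" is the character "š".
my @words = ("zzz", "\x{161}\x{161}\x{161}", "sss", "aaa");

# new() with no arguments uses the module's default collation table.
my $collator = Unicode::Collate->new;
my @sorted   = $collator->sort(@words);

print "$_\n" for @sorted;
```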
# 6  
Old 04-07-2009
this didn't help?
# 7  
Old 04-07-2009
OK, I did read it, but I could not figure out how to use it (I'm really new to Perl and to scripting in general). After many attempts I managed to get my "š" right after the "s", but now both of them are at the beginning of the list:
Code:
sss
ššš
aaa
aab
abc
bbc
lmn
mmn
zzz

This is how I did it:
Code:
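use strict;
use warnings;
use Unicode::Collate;

# table => undef means no default table is loaded; only the two
# characters listed under "entry" get explicitly defined weights.
my $collator = Unicode::Collate->new(
    table => undef,
    entry => <<'ENTRIES',
0073  ; [.1137.0020.0002.0073]
0161  ; [.0000.0041.0002.030C]
ENTRIES
);

# Read the file as UTF-8 so that "š" arrives as a single character.
open(my $names_fh, '<:encoding(UTF-8)', 'path-to-my-file')
    or die "Failed to read file: $!";
my @not_sorted = <$names_fh>;
close($names_fh);

binmode(STDOUT, ':encoding(UTF-8)');
my @sorted = $collator->sort(@not_sorted);
print @sorted;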

Now, how do I tell Perl to leave the rest of the sorting order intact and to change only the part I need? Or, put differently, how do I insert my "š" just after "s"? Or do I have to write a complete sorting table to do it?
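
One possible reading of the result above: with table => undef the default table is not loaded, so only "s" and "š" carry explicit weights and every other character falls back to derived weights, which appear to sort later. The sketch below is a hedged suggestion, not something tested in this thread. It assumes a Unicode::Collate version that ships Unicode::Collate::Locale, and the sample strings are made up. Leaving out table keeps the default order, while a locale tailoring changes only the language-specific part: 'hr' (Croatian) treats "š" as a separate letter after "s", and 'zh__pinyin' orders Han characters by pinyin.

```perl
use strict;
use warnings;
use Unicode::Collate;
use Unicode::Collate::Locale;

binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical sample data; "\x{161}" is the character "š".
my @latin = ("t", "\x{161}a", "sz");
# U+4E2D (zhong), U+5317 (bei), U+7231 (ai), written as escapes.
my @han = ("\x{4E2D}", "\x{5317}", "\x{7231}");

# Default table: "š" differs from "s" only as an accent variant.
my $default = Unicode::Collate->new;
# Croatian tailoring: "š" is a letter of its own, right after "s".
my $croatian = Unicode::Collate::Locale->new(locale => 'hr');
# Chinese tailoring that orders Han characters by pinyin.
my $pinyin = Unicode::Collate::Locale->new(locale => 'zh__pinyin');

my @latin_default = $default->sort(@latin);
my @latin_hr      = $croatian->sort(@latin);
my @han_pinyin    = $pinyin->sort(@han);

print "default: @latin_default\n";
print "hr:      @latin_hr\n";
print "pinyin:  @han_pinyin\n";
```

The module's documentation also lists a 'zh__stroke' tailoring, which may cover the sort-by-stroke case from the first post.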