Post 302304429 by ahsog on Monday 6th of April 2009 11:21:32 AM
sort file with non ascii chars and cjk with perl

Hello,
I am not a programmer, please be patient.
I have started to look into Perl because it seems able to solve all (or most) of the problems I happen to meet when using my computer. These problems are generally related to text manipulation.

Although I started to study, I cannot figure out (yet?) how to sort a file with special characters in it.
With the Unix "sort" utility, I don't get the expected output.
For example, I need to sort a file where some words have a leading "š", which I need to be put just after the "normal s" and not after "z".

I had a look at the Perl tutorials that seemed relevant (as far as I understand), and it looks quite complicated for a newcomer. Are there ready-to-use scripts?

As for CJK, I've found on CPAN a tool I need to convert traditional to simplified Chinese and vice versa (Encode::HanConvert), but I cannot find a similar tool for sorting Chinese characters (by stroke, radical, pinyin). Does it exist, or do I have to learn Perl thoroughly to do what I need?

Any suggestion will be appreciated.
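
One possible starting point for the accented-letter case, offered as a sketch rather than a ready-made script: the core module Unicode::Collate implements the Unicode Collation Algorithm, whose default ordering treats "š" as an "s" carrying a diacritic, so it lands next to "s" instead of after "z". The word list below is made up for illustration, and reading the words from a file is left out.

```perl
use strict;
use warnings;
use utf8;                  # this source file contains UTF-8 string literals
use Unicode::Collate;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical sample data; in practice these would be lines read from a file.
my @words = ('zebra', 'šuma', 'tent', 'suma', 'sun');

# Default collator: letters first, diacritics only as a tie-breaker.
my $collator = Unicode::Collate->new;
my @sorted   = $collator->sort(@words);

print join(' ', @sorted), "\n";    # prints: suma šuma sun tent zebra
```

Note that this places "šuma" between "suma" and "sun". If every "š" word should follow every "s" word, as in some national alphabets, a language-specific tailoring is needed; Unicode::Collate::Locale takes a locale argument for that purpose, and as far as I know its tailorings include zh__pinyin and zh__stroke for Chinese, though that is worth checking against the installed version.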
 

sort(3pm)              Perl Programmers Reference Guide              sort(3pm)

NAME
sort - perl pragma to control sort() behaviour

SYNOPSIS
    use sort 'stable';          # guarantee stability
    use sort '_quicksort';      # use a quicksort algorithm
    use sort '_mergesort';      # use a mergesort algorithm
    use sort 'defaults';        # revert to default behavior
    no  sort 'stable';          # stability not important

    use sort '_qsort';          # alias for quicksort

    my $current;
    BEGIN {
        $current = sort::current();     # identify prevailing algorithm
    }

DESCRIPTION
With the "sort" pragma you can control the behaviour of the builtin "sort()" function.

In Perl versions 5.6 and earlier the quicksort algorithm was used to implement "sort()", but in Perl 5.8 a mergesort algorithm was also made available, mainly to guarantee worst case O(N log N) behaviour: the worst case of quicksort is O(N**2). In Perl 5.8 and later, quicksort defends against quadratic behaviour by shuffling large arrays before sorting.

A stable sort means that for records that compare equal, the original input ordering is preserved. Mergesort is stable, quicksort is not. Stability will matter only if elements that compare equal can be distinguished in some other way. That means that simple numerical and lexical sorts do not profit from stability, since equal elements are indistinguishable. However, with a comparison such as

    { substr($a, 0, 3) cmp substr($b, 0, 3) }

stability might matter because elements that compare equal on the first 3 characters may be distinguished based on subsequent characters. In Perl 5.8 and later, quicksort can be stabilized, but doing so will add overhead, so it should only be done if it matters.

The best algorithm depends on many things. On average, mergesort does fewer comparisons than quicksort, so it may be better when complicated comparison routines are used. Mergesort also takes advantage of pre-existing order, so it would be favored for using "sort()" to merge several sorted arrays. On the other hand, quicksort is often faster for small arrays, and on arrays of a few distinct values, repeated many times. You can force the choice of algorithm with this pragma, but this feels heavy-handed, so the subpragmas beginning with a "_" may not persist beyond Perl 5.8. The default algorithm is mergesort, which will be stable even if you do not explicitly demand it. But the stability of the default sort is a side-effect that could change in later versions. If stability is important, be sure to say so with a

    use sort 'stable';

The "no sort" pragma doesn't forbid what follows, it just leaves the choice open. Thus, after

    no sort qw(_mergesort stable);

a mergesort, which happens to be stable, will be employed anyway. Note that

    no sort "_quicksort";
    no sort "_mergesort";

have exactly the same effect, leaving the choice of sort algorithm open.

CAVEATS
As of Perl 5.10, this pragma is lexically scoped and takes effect at compile time. In earlier versions its effect was global and took effect at run-time; the documentation suggested using "eval()" to change the behaviour:

    { eval 'use sort qw(defaults _quicksort)'; # force quicksort
      eval 'no sort "stable"';      # stability not wanted
      print sort::current . "\n";
      @a = sort @b;
      eval 'use sort "defaults"';   # clean up, for others
    }
    { eval 'use sort qw(defaults stable)';     # force stability
      print sort::current . "\n";
      @c = sort @d;
      eval 'use sort "defaults"';   # clean up, for others
    }

Such code no longer has the desired effect, for two reasons. Firstly, the use of "eval()" means that the sorting algorithm is not changed until runtime, by which time it's too late to have any effect. Secondly, "sort::current" is also called at run-time, when in fact the compile-time value of "sort::current" is the one that matters.

So now this code would be written:

    { use sort qw(defaults _quicksort); # force quicksort
      no sort "stable";      # stability not wanted
      my $current;
      BEGIN { $current = sort::current; }
      print "$current\n";
      @a = sort @b;
      # Pragmas go out of scope at the end of the block
    }
    { use sort qw(defaults stable);     # force stability
      my $current;
      BEGIN { $current = sort::current; }
      print "$current\n";
      @c = sort @d;
    }

perl v5.18.2                      2013-11-04                         sort(3pm)
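
To make the stability point from the DESCRIPTION concrete, here is a minimal sketch using the same three-character comparison. The record strings are invented for illustration; the only thing taken from the manual page is the `use sort 'stable'` pragma and the `substr` comparison.

```perl
use strict;
use warnings;
use sort 'stable';    # request stability explicitly instead of relying on the default

# Hypothetical records: a three-letter key, then a suffix that tells them apart.
my @records = ('abc-2', 'abd-1', 'abc-1', 'abd-0', 'abc-0');

# Compare on the first three characters only, as in the manual's example.
my @sorted = sort { substr($a, 0, 3) cmp substr($b, 0, 3) } @records;

# Records with equal keys keep their original relative order.
print join(' ', @sorted), "\n";    # prints: abc-2 abc-1 abc-0 abd-1 abd-0
```

Because the comparison ignores the suffix, the order of "abc-2", "abc-1", "abc-0" in the output is exactly their input order; an unstable sort would be free to rearrange them.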