Sorting a html file with an external sort order


 
# 1  
Old 07-24-2016
I am working on a web concordance of Old Avestan, and my concordance tool has produced an HTML file [attached in zipped format].
The sort order used in the HTML file is not the one we normally use. I have tried my best to force the correct order within the concordance tool itself, but it does not work.
The required sort order, in UTF-8, is given below:
Code:
a,ā,å,ā̊,ą,ą̇,b,β,c,d,δ,e,ē,ə,ə̄,f,g,ġ,γ,h,i,ī,j,k,l,m,m̨,n,ń,ṇ,ŋ,ŋ́,ŋͮ,o,ō,p,r,s,š,š́,ṣ̌,t,t̰,ϑ,u,ū,v,x,x́,xᵛ,y,ẏ,z,ž

Is there a Perl script that could do the trick? The data is part of an open-source project on Old Avestan and will be made available to all scholars working in the field.
Many thanks in advance for your help.
# 2  
Old 07-24-2016
Hi.

I have often used msort, which is available in many repositories. I don't know whether it would be useful for your problem, but it has a number of features beyond GNU/*nix sort: MSORT

Best wishes ... cheers, drl
# 3  
Old 07-24-2016
Many thanks. I tried msort, but the problem is that the input is an HTML file and the sort does not come out accurately.
I hope someone has an answer to the problem.
# 4  
Old 07-25-2016
Hi,

A ready-to-use solution for your problem probably does not exist, since it is an individual HTML file that you want sorted.

Writing a custom sort function is not that complex. In short, you have to write a little function that compares two given values $a and $b and returns -1, 0 or 1 if $a is less than, equal to or greater than $b. Within that function you can have a hash that stores the sort weight for each character. Something like this:

Code:
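sub old_avestan {

   # sort weight for each character
   my %chars = (

         "a" => 1,
         "ā" => 2,
         "å" => 3,
         "ā̊" => 4,
         "ą" => 5,
         "ą̇" => 6,
         # ... remaining characters of the sort order
   );
   # the weights are numbers, so compare them with <=> rather than cmp;
   # as written this only works while $a and $b are single characters
   return $chars{$a} <=> $chars{$b};
}

@sorted_list = sort old_avestan @wordlist;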

An additional difficulty here is that you have to handle the multibyte characters of Unicode. Perl should have all the necessary tools built in to do this, but that is beyond my experience.

My Perl skills are rarely used. Be sure to test my code before using it.
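
On the multibyte point, here is a minimal sketch (not tested against the attached file) of why byte-oriented code miscounts these letters. It only uses the core Encode module; the variable names are made up for the illustration, and the two letters are written as escapes so that the example does not depend on how this page stores them. It assumes that "ā̊" is entered as ā followed by a combining ring above, which may or may not match the real data.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(decode);

# "š" (U+0161) as it arrives from a UTF-8 file: two raw bytes.
my $bytes = "\xC5\xA1";
# After decoding, Perl treats it as one character.
my $chars = decode('UTF-8', $bytes);

# Assumption: "ā̊" is ā (U+0101) followed by a combining ring above (U+030A).
my $letter = "\x{0101}\x{030A}";
# \X matches one extended grapheme cluster (base character plus its marks).
my @clusters = $letter =~ /(\X)/g;

printf "bytes: %d, characters: %d\n", length($bytes), length($chars);
printf "code points: %d, grapheme clusters: %d\n",
    length($letter), scalar(@clusters);
```

This prints "bytes: 2, characters: 1" and "code points: 2, grapheme clusters: 1". So substr($word,0,1) on undecoded input cuts a letter like "š" in half, and even on decoded input it separates a base letter from its combining marks.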
# 5  
Old 07-25-2016
I will definitely try it out and see the result and get back to you.
Many thanks for your kind help
# 6  
Old 07-25-2016
Btw: I tried this myself last weekend and ran into the UTF-8 multibyte problem.
# 7  
Old 07-26-2016
Hi gimley,

a good starting point for reading is certainly this one:

perlunicode - perldoc.perl.org

Here is my attempt with plain old 8-bit characters. You only need to add the Unicode part :-)

Code:
#!/usr/bin/env perl

sub char_sort {
        my %chars = (
                "a" => 1,
                "b" => 2,
                "c" => 3,
                "d" => 4,
                "e" => 5,
                "f" => 6,
                "g" => 7,
                "h" => 8,
                "i" => 9,
                "j" => 10,
                "k" => 11,
                "l" => 12,
                "m" => 13,
                "n" => 14,
                "o" => 15,
                "p" => 16,
                "q" => 17,
                "r" => 18,
                "s" => 19,
                "t" => 20,
                "u" => 21,
                "v" => 22,
                "w" => 23,
                "x" => 24,
                "y" => 25,
                "z" => 26);

        # sort calls this sub without arguments and sets $a and $b instead.
        # The recursive calls below pass the two remaining substrings as arguments,
        # so use the arguments when there are any and $a/$b otherwise.
        # (This relies on @_ being empty when sort is called at file level, as below.)
        my ($word_a, $word_b) = @_ ? @_ : ($a, $b);

        # if both words are used up, they are equal
        return 0 if(length($word_a)==0 and length($word_b)==0);

        # a word that is used up first sorts before the longer one
        return -1 if(length($word_a)==0);
        return 1 if(length($word_b)==0);

        # Get the first chars, which we need to compare
        my $a1=substr($word_a,0,1);
        my $b1=substr($word_b,0,1);

        # print("A1=$a1 B1=$b1 A=$word_a B=$word_b\n");

        # if current char is equal, call this function with the substrings beginning at the second char
        return char_sort(substr($word_a,1),substr($word_b,1)) if ($chars{$a1} == $chars{$b1});

        # if current char is different, we're finished now
        return $chars{$a1} <=> $chars{$b1};
}

@list = ("my","favorite","animal","book","for","advanced","biologists");
@sorted = sort char_sort @list;

print("\n");
print("*** Unsorted ***\n");
foreach(@list) {
        print;
        print("\n");
 }
print("\n");

print("*** Sorted ***\n");
foreach(@sorted) {
        print;
        print("\n");
 }
print("\n");
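
One way the Unicode part could be added, as a sketch under assumptions rather than a finished solution: split each word into the letters of the posted sort order (trying longer letters such as "xᵛ" before "x"), turn every letter into its position in that order, and sort on the resulting keys. The names sort_key and avestan_sort are made up here. The sample words are arbitrary strings for testing, not real Avestan. Placing characters that are not in the list after all listed letters is my own choice. Both the alphabet and the words are normalized with NFD from the core Unicode::Normalize module, so that precomposed and decomposed spellings of the same letter compare alike. Sorting the HTML file itself would still need a step that pulls the headwords out of the markup, which depends on the file and is not shown.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                           # the alphabet below is typed as UTF-8 literals
use Unicode::Normalize qw(NFD);

# Sort order as posted at the start of the thread.
my @alphabet = map { NFD($_) } split /,/,
    'a,ā,å,ā̊,ą,ą̇,b,β,c,d,δ,e,ē,ə,ə̄,f,g,ġ,γ,h,i,ī,j,k,l,m,m̨,n,ń,ṇ,ŋ,ŋ́,ŋͮ,o,ō,p,r,s,š,š́,ṣ̌,t,t̰,ϑ,u,ū,v,x,x́,xᵛ,y,ẏ,z,ž';

# Position of every letter in the sort order, starting at 1.
my %rank;
@rank{@alphabet} = (1 .. scalar @alphabet);

# Assumption: anything not in the list sorts after all listed letters.
my $unknown = scalar(@alphabet) + 1;

# Longest letters first, so that "xᵛ" is tried before "x".
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } @alphabet;
my $letter_re = qr/($alternation)|(.)/s;

# Turn a word into a string with one character per letter, whose
# code point is the letter's rank; plain cmp then gives the custom order.
sub sort_key {
    my ($word) = @_;
    my $text = NFD($word);
    my $key  = '';
    while ($text =~ /$letter_re/g) {
        $key .= chr(defined $1 ? $rank{$1} : $unknown);
    }
    return $key;
}

# Compute each key once, sort on the keys, return the original words.
sub avestan_sort {
    return map  { $_->[1] }
           sort { $a->[0] cmp $b->[0] }
           map  { [ sort_key($_), $_ ] } @_;
}

binmode(STDOUT, ':encoding(UTF-8)');
my @words = ('ža', 'xᵛa', 'xa', 'āa', 'aa', 'ϑa', 'ta');
print "$_\n" for avestan_sort(@words);
```

With the sample list this should print aa, āa, ta, ϑa, xa, xᵛa, ža, one per line. Computing the key once per word also avoids rebuilding the hash and recursing on every comparison.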


Last edited by stomp; 07-26-2016 at 06:55 PM..