I am working on a web concordance of Old Avestan, and the concordance has produced an HTML file [attached in zipped format].
The sort order used in the HTML file is not the one we normally use. I have tried my best to force a sort within the concordance itself, but the resulting order is still wrong.
I am giving below the sort order in UTF-8 format:
Is there a Perl script which could do the trick? The data is part of an open-source project on Old Avestan and will be made available to all scholars working in the field.
Many thanks in advance for your help.
Hi.
I have often used msort, which is found in many repositories. I don't know whether it would be useful for your problem, but it has a number of features beyond GNU/*nix sort.
Many thanks. I tried msort, but the problem is that the input is an HTML file and the sort does not come out accurately.
I hope someone has an answer to the problem.
A ready-to-use solution for your problem probably does not exist, since you want to sort one particular HTML file.
Writing a custom sort function is not that complex. In short,
you have to write a little function which, given two
values $a and $b, returns -1, 0 or 1 if $a is less than, equal to or greater than
$b. Within that function you can use a hash which stores the sort weight for each character, as in the code further down.
An additional difficulty here is that you have to handle multibyte Unicode characters. Perl should have all the necessary tools built in to do this, but that is beyond my experience.
I rarely use my Perl skills, so be sure to test my code before using it.
Here is my attempt with plain old 8-bit characters. You only need to add the Unicode handling :-)
Code:
#!/usr/bin/env perl
use strict;
use warnings;

# Sort weight for each character; edit this table to change the sort order.
# It is built once, outside the comparison function.
my %chars = (
    "a" => 1,
    "b" => 2,
    "c" => 3,
    "d" => 4,
    "e" => 5,
    "f" => 6,
    "g" => 7,
    "h" => 8,
    "i" => 9,
    "j" => 10,
    "k" => 11,
    "l" => 12,
    "m" => 13,
    "n" => 14,
    "o" => 15,
    "p" => 16,
    "q" => 17,
    "r" => 18,
    "s" => 19,
    "t" => 20,
    "u" => 21,
    "v" => 22,
    "w" => 23,
    "x" => 24,
    "y" => 25,
    "z" => 26);

# Comparison function for sort(). Perl sets $a and $b to the two values to compare.
# Returns -1, 0 or 1 if $a sorts before, equal to or after $b.
sub char_sort {
    my $len_a = length($a);
    my $len_b = length($b);
    my $min   = $len_a < $len_b ? $len_a : $len_b;
    # Walk both words character by character until the weights differ.
    for my $i (0 .. $min - 1) {
        # Characters missing from the table get weight 0 and sort first.
        my $cmp = ($chars{substr($a, $i, 1)} // 0) <=> ($chars{substr($b, $i, 1)} // 0);
        return $cmp if $cmp;
    }
    # One word is a prefix of the other (or both are equal): the shorter one sorts first.
    return $len_a <=> $len_b;
}

my @list   = ("my","favorite","animal","book","for","advanced","biologists");
my @sorted = sort char_sort @list;

print("\n*** Unsorted ***\n");
print("$_\n") foreach @list;
print("\n*** Sorted ***\n");
print("$_\n") foreach @sorted;
print("\n");
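For the Unicode part, here is a minimal sketch of how the same weight-table idea might be extended to UTF-8 text. The alphabet in `@alphabet` is purely hypothetical and only there for illustration; it would have to be replaced by the real sort order from the first post. The helper names `sort_key`, `compare_keys` and `custom_sort` are my own. The sketch also assumes every letter is a single code point after NFC normalization; letters written as two characters would need extra handling.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                          # this source file contains UTF-8 literals
use Unicode::Normalize qw(NFC);    # core module: composes base letter + combining mark
binmode(STDOUT, ':encoding(UTF-8)');

# HYPOTHETICAL order, for illustration only. Replace with the real alphabet.
my @alphabet = qw(a ā b d ə e);

# Weight of each letter = its position in @alphabet (1, 2, 3, ...).
my %weight;
@weight{@alphabet} = (1 .. scalar @alphabet);

# Turn a word into a list of weights. Unknown characters get 0 and sort first.
sub sort_key {
    my ($word) = @_;
    return [ map { $weight{$_} // 0 } split //, NFC($word) ];
}

# Compare two weight lists element by element; a shorter prefix sorts first.
sub compare_keys {
    my ($ka, $kb) = @_;
    my $min = @$ka < @$kb ? scalar @$ka : scalar @$kb;
    for my $i (0 .. $min - 1) {
        my $cmp = $ka->[$i] <=> $kb->[$i];
        return $cmp if $cmp;
    }
    return @$ka <=> @$kb;
}

# Sort a list of words, computing each key only once.
sub custom_sort {
    my @words = @_;
    my %key = map { $_ => sort_key($_) } @words;
    return sort { compare_keys($key{$a}, $key{$b}) } @words;
}

print "$_\n" for custom_sort(qw(da ā ab a əb));
```

With the made-up alphabet above this prints a, ab, ā, da, əb, one per line. When reading words from a file instead of the literal list, the file handle would also need to be opened with `:encoding(UTF-8)` so that Perl sees characters rather than bytes.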