Sort roman numerals


 
# 1  
Old 03-25-2011

If I use ls to print all the files of a folder, is there a way to sort using roman numerals?

I am thinking about a result like:
benjamin_I.wmv
benjamin_II.wmv
benjamin_II.wmv
benjamin_III.wmv
benjamin_IV.wmv
benjamin_V.wmv
benjamin_VI.wmv
benjamin_VII.wmv
benjamin_VIII.wmv
benjamin_IX.wmv

The roman numerals are always preceded by an underscore.
# 2  
Old 03-25-2011
Your best bet is probably Perl's Roman module.
Download Roman.pm from CPAN here: http://search.cpan.org/~chorny/Roman-1.23/lib/Roman.pm
and copy it into /usr/lib/perl5/site_perl/5.8.8/ (adjust the path to your Perl version).

Create a Perl script, sortRoman.pl:

Code:
#!/usr/bin/perl
use strict;
use warnings;
use Roman;

# Capture the roman numeral after the underscore and convert it to
# arabic (e.g. XIV --> 14). Lines without a valid numeral count as 0,
# so a failed match can never reuse a stale $1 from an earlier line.
sub romanValue {
    my ($name) = @_;
    return $name =~ /^.*_([MDCLXVI]+)\./ ? (arabic($1) || 0) : 0;
}

# custom sorting definition: numeric comparison between the converted numbers
sub romanSort { romanValue($a) <=> romanValue($b) }

my @data = <>;  #slurp the whole input into one array

print sort romanSort @data;  #print sorted array using custom routine romanSort

Make it executable:
Code:
chmod 754 sortRoman.pl

and try it out:
Code:
$ cat testdata
b_II.wmv
b_III.wmv
b_IX.wmv
b_IV.wmv
b_VI.wmv
b_V.wmv
b_VII.wmv
b_CXLIV.wmv
b_CXIV.wmv
b_CXII.wmv
A_B_XIV.wmv

$ ./sortRoman.pl testdata
b_II.wmv
b_III.wmv
b_IV.wmv
b_V.wmv
b_VI.wmv
b_VII.wmv
b_IX.wmv
A_B_XIV.wmv
b_CXII.wmv
b_CXIV.wmv
b_CXLIV.wmv


Last edited by mirni; 03-28-2011 at 08:07 AM..
# 3  
Old 03-25-2011
Egads... I'd translate the numbers from roman numerals into normal numbers, sort, then change them back.
Code:
# roman.awk
# Adapted from a clever converter found here
# http://scripts.mit.edu/~yfarjoun/homepage/index.php?title=Code_Snippets

BEGIN	{
		R["I"]=1;	R["V"]=5;	R["X"]=10;	R["L"]=50;
		R["C"]=100;	R["D"]=500;	R["M"]=1000;

		E["iv"]="IIII";		E["ix"]="VIIII";
		E["xl"]="XXXX";		E["xc"]="LXXXX";
		E["cd"]="CCCC";		E["cm"]="DCCCC";
		E["iix"]="VIII";	E["xxc"]="LXXX";
		E["ccm"]="DCCC";	E["vl"]="XXXXV";
		E["ld"]="CCCCL";
	}

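	function roman_arabic(RN,	SUM, K)	# SUM and K are locals
	{
		SUM=0;
		RN=tolower(RN);

		# Substitute roman numeral forms into things we can count.
		# The input is lower case and the replacements are upper case,
		# so substitutions don't happen twice by accident.
		# "for(K in E)" visits keys in no fixed order, so handle the
		# three-letter forms first: otherwise "ix" could eat part of "iix".
		for(K in E)	if(length(K)==3) while(sub(K, E[K], RN));
		for(K in E)	if(length(K)==2) while(sub(K, E[K], RN));

		# Convert anything that didn't get substituted to uppercase.
		RN=toupper(RN);

		for(K in R) while(sub(K, "", RN)) SUM+=R[K];

		return(SUM);
	}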

	{
		split($0, ARR, /[_.]/);
		if($0 ~ /_[iIvVlLxXcCdDmM]+/)
		{
#			print ARR[1], ARR[2], ARR[3];
			printf("%s<!--%d-->_%s.%s\n", ARR[1], roman_arabic(ARR[2]), ARR[2], ARR[3]);
		}
	}

Code:
$ awk -f roman.awk < list
benjamin<!--1-->_I.wmv
benjamin<!--8-->_VIII.wmv
benjamin<!--4-->_IV.wmv
benjamin<!--5-->_V.wmv
benjamin<!--2-->_II.wmv
benjamin<!--2-->_II.wmv
benjamin<!--9-->_IX.wmv
benjamin<!--7-->_VII.wmv
benjamin<!--3-->_III.wmv
benjamin<!--6-->_VI.wmv
$ awk -f roman.awk < list | sort
benjamin<!--1-->_I.wmv
benjamin<!--2-->_II.wmv
benjamin<!--2-->_II.wmv
benjamin<!--3-->_III.wmv
benjamin<!--4-->_IV.wmv
benjamin<!--5-->_V.wmv
benjamin<!--6-->_VI.wmv
benjamin<!--7-->_VII.wmv
benjamin<!--8-->_VIII.wmv
benjamin<!--9-->_IX.wmv
$ awk -f roman.awk < list | sort | sed -r 's#<!--.*-->##g'
benjamin_I.wmv
benjamin_II.wmv
benjamin_II.wmv
benjamin_III.wmv
benjamin_IV.wmv
benjamin_V.wmv
benjamin_VI.wmv
benjamin_VII.wmv
benjamin_VIII.wmv
benjamin_IX.wmv
$
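
One caveat with the pipeline above: plain sort compares the embedded numbers as text, so a value of 10 would land before 2 once the list goes past IX. A minimal sketch of the same translate, sort, translate-back idea with a numeric sort key follows. The function name sort_roman and the right-to-left conversion are my own assumptions for illustration, not part of roman.awk, and the conversion only understands standard subtractive forms like IV or XC.

```shell
#!/usr/bin/env bash
# sort_roman: hypothetical helper, not part of roman.awk. Reads file names
# on stdin and prints them ordered by their roman numeral.
sort_roman() {
    awk '
    BEGIN {
        V["I"]=1; V["V"]=5; V["X"]=10; V["L"]=50
        V["C"]=100; V["D"]=500; V["M"]=1000
    }
    # Right-to-left scan: subtract a symbol that is smaller than its
    # right-hand neighbour (the I in IV), add it otherwise.
    function arabic(r,    i, v, prev, sum) {
        r = toupper(r); prev = 0; sum = 0
        for (i = length(r); i >= 1; i--) {
            v = V[substr(r, i, 1)]
            sum += (v < prev) ? -v : v
            prev = v
        }
        return sum
    }
    {
        key = 0   # lines without a numeral get key 0 and sort first
        # Numeral = roman letters between an underscore and the next dot.
        if (match($0, /_[IVXLCDMivxlcdm]+\./))
            key = arabic(substr($0, RSTART + 1, RLENGTH - 2))
        printf "%d\t%s\n", key, $0   # decorate: numeric key, tab, line
    }' |
    sort -n -k1,1 |   # numeric sort on the key field
    cut -f2-          # undecorate: drop the key again
}

printf '%s\n' b_X.wmv b_II.wmv b_IX.wmv | sort_roman
```

For the original question this would be used as ls | sort_roman. Since the key travels in a separate tab-delimited column, nothing has to be stripped out of the names afterwards.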

...but beware of all the valid words that can be made from roman numerals:
Code:
$ grep -Ei "^[IVXLCDM]{3,}$" /usr/share/dict/cracklib-small
cdc
civic
civil
did
dill
dim
icc
iii
ill
lid
lim
livid
mid
mild
mill
mimi
mimic
mix
vii
viii
vivid
$
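
If those false positives matter, a stricter test can rule most of them out. Here is a rough sketch that accepts only canonical subtractive notation from I to MMMCMXCIX; the name is_roman is made up for this example. Note that it is deliberately stricter than roman.awk, which also tolerates forms like IIX, and that a few real words such as "mix" (MIX, 1009) still pass.

```shell
#!/usr/bin/env bash
# is_roman: hypothetical helper. Succeeds only for a non-empty numeral in
# canonical subtractive notation (I to MMMCMXCIX, i.e. 1 to 3999).
is_roman() {
    [ -n "$1" ] &&
    printf '%s\n' "$1" |
        grep -Eiq '^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$'
}

for word in civil mix vivid viii did; do
    if is_roman "$word"; then
        echo "$word: roman numeral"
    else
        echo "$word: just a word"
    fi
done
```

The explicit non-empty check is needed because every group in the pattern is optional, so the pattern on its own would also match an empty string.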

These 3 Users Gave Thanks to Corona688 For This Post:
# 4  
Old 03-28-2011
Quote:
Originally Posted by mirni
Code:
    $aRom = arabic($1);

What does arabic($1) stand for?
# 5  
Old 03-28-2011
@centurion_13: arabic() is a function defined in the Roman.pm module that converts a roman numeral to an arabic number, e.g.:
Code:
$a = "XLVII";
$b = arabic($a);  # b==47

That's what that Roman.pm module is made for.

$1 refers to whatever the first pair of parentheses captured in the most recent successful regex match.

I added comments to my original reply.
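
For anyone curious what such a conversion involves, here is a minimal shell sketch of the idea. It is not how Roman.pm is implemented; roman_to_arabic is a made-up name, and unlike the module it does not check that the numeral is well formed beyond rejecting non-roman letters.

```shell
#!/usr/bin/env bash
# roman_to_arabic: hypothetical helper, NOT part of Roman.pm.
# Walks the numeral right to left; a symbol smaller than the one
# to its right is subtracted (the I in IV), otherwise it is added.
roman_to_arabic() {
    local numeral total=0 prev=0 value i
    numeral=$(printf '%s' "$1" | tr 'a-z' 'A-Z')   # accept lower case too
    for (( i = ${#numeral} - 1; i >= 0; i-- )); do
        case ${numeral:i:1} in
            I) value=1 ;;
            V) value=5 ;;
            X) value=10 ;;
            L) value=50 ;;
            C) value=100 ;;
            D) value=500 ;;
            M) value=1000 ;;
            *) return 1 ;;   # not a roman digit
        esac
        if (( value < prev )); then
            total=$(( total - value ))
        else
            total=$(( total + value ))
        fi
        prev=$value
    done
    printf '%d\n' "$total"
}

roman_to_arabic XLVII   # prints 47
```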
# 6  
Old 03-28-2011
Hi.

The utility msort is in the Debian GNU/Linux repositories. It often makes life easy for complex sorting situations:
Code:
#!/usr/bin/env bash

# @(#) s2	Demonstrate sort of roman numerals, msort.
# http://freshmeat.net/projects/msort

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
pe() { for i;do printf "%s" "$i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for i;do printf "%s" "$i";done; printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && . $C ;msort
msort --version | head -1

FILE=${1-data3}
pl " Data file $FILE:"
cat "$FILE"

pl " Results, msort on roman numerals, field 2:"
msort --quiet --line --position 2 --comparison-type numeric --number-system roman "$FILE"

exit 0

producing:
Code:
% ./s2

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.32-5-686, i686
Distribution        : Debian GNU/Linux 6.0 
GNU bash 4.1.5
msort 8.53

-----
 Data file data3:
file ii suffix
file i suffix
file iv suffix
file iii suffix
file c suffix

-----
 Results, msort on roman numerals, field 2:
file i suffix
file ii suffix
file iii suffix
file iv suffix
file c suffix

If you do not use Debian, see the msort home page as noted in the script.

Best wishes ... cheers, drl

PS. I had trouble with older versions of msort on 64-bit Debian (lenny), but the version of msort on the current stable edition (squeeze) seems to work correctly, as noted above.
# 7  
Old 03-28-2011
Quote:
Originally Posted by mirni
@centurion_13: arabic() is a function defined in Roman.pm module, to convert roman number to arabic
...and in case it's unclear, "arabic" numerals are normal numbers, digits 0 through 9 etc.