Trouble with alphanumeric Sort


 
# 1  
Old 07-29-2013

I have a lot of data that needs to be sorted alphanumerically. I began using sort -du and it solved almost all my problems. However, it began to fail when I encountered files with data like this:


Code:
/vol/close_eng_ice_0888
/vol/open_eng_ice_0890
/vol/open_eng_ice_08923
/vol/open_eng_ice_0893

I then tried this to see if I could get it working:

Code:
cat list  |sed 's/.*[^0-9]\([0-9][0-9]*\)[^0-9]*$/\1 &/' | sort -n | cut -d" " -f2-
/vol/close_eng_ice_0888
/vol/open_eng_ice_0890
/vol/open_eng_ice_0893
/vol/open_eng_ice_08923

This worked just fine, but there is one other issue. I want the alphabetic part to take precedence over the numeric part. What if I had a file like this:

Code:
cat list |sed 's/.*[^0-9]\([0-9][0-9]*\)[^0-9]*$/\1 &/' | sort -n | cut -d" " -f2-
/vol/zebra_eng_ice_022
/vol/close_eng_ice_0888
/vol/open_eng_ice_0890
/vol/open_eng_ice_0893
/vol/open_eng_ice_08923

I would want zebra to be on the bottom. So if the files are identical in name but different in number I want them sorted numerically. If the files are different alphabetically, I want the alphabetic sort to take precedence. I have tried playing with sort and sed and can't figure this out.

Is this possible with bash, or do you need a perl script to do it?

Last edited by Scott; 07-29-2013 at 11:07 AM.. Reason: Fixed code tags
# 2  
Old 07-29-2013
Is this what you are trying to accomplish?
Code:
$ sort -t _ -k 1,4n file
/vol/close_eng_ice_0888
/vol/open_eng_ice_0890
/vol/open_eng_ice_08923
/vol/open_eng_ice_0893
/vol/zebra_eng_ice_022

# 3  
Old 07-29-2013
You made a tag sort. Good.

Change the sort part of the command line to:

Code:
| sort -k2 -k3n |

Check out UUOC as well:
http://en.wikipedia.org/wiki/Cat_(Unix)
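
As a rough illustration of the UUOC point: sed can read the file itself, so the cat stage can be dropped without changing the result. The file name list and its contents mirror the sample data from post #1; the scratch directory is only scaffolding added for this demo.

```shell
# Scaffolding: recreate the sample file in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' \
  /vol/close_eng_ice_0888 \
  /vol/open_eng_ice_0890 \
  /vol/open_eng_ice_08923 \
  /vol/open_eng_ice_0893 > "$tmpdir/list"

# Original form: cat feeds sed through a pipe.
with_cat=$(cat "$tmpdir/list" |
  sed 's/.*[^0-9]\([0-9][0-9]*\)[^0-9]*$/\1 &/' | sort -n | cut -d" " -f2-)

# Same pipeline with sed reading the file directly.
without_cat=$(sed 's/.*[^0-9]\([0-9][0-9]*\)[^0-9]*$/\1 &/' "$tmpdir/list" |
  sort -n | cut -d" " -f2-)

[ "$with_cat" = "$without_cat" ] && echo "same output"
rm -rf "$tmpdir"
```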
# 4  
Old 07-29-2013
Quote:
Originally Posted by Subbeh
Code:
$ sort -t _ -k 1,4n file

When the first character of the numerically sorted key (in this case, the first character of each line, /) is not a valid numeric component, it is as if the number is zero. Since every line begins with an invalid numeric character, they all evaluate to zero. All of these ties are broken by a full-length alphabetic sort.

In short, for this data, your suggestion is no different from a simple sort without any options.

Further, consider the implications of using -t _ with -k 1,4n. Even if every line began with a valid numeric, how can a numeric key possibly span fields delimited by an underscore? The first underscore will always terminate the number. So, in this case, -k 1,4n is equivalent to -k 1n.
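
A small experiment along these lines suggests that both points hold for this data. The sample lines are the ones from post #1; LC_ALL=C is an assumption made here only to pin down the collation order, and the variable names are made up for the demo.

```shell
# Sample data from post #1, held in a variable for the demo.
data='/vol/zebra_eng_ice_022
/vol/open_eng_ice_0893
/vol/open_eng_ice_08923
/vol/close_eng_ice_0888
/vol/open_eng_ice_0890'

# Assumption: use the C locale so collation is predictable across systems.
export LC_ALL=C

plain=$(printf '%s\n' "$data" | sort)
span=$(printf '%s\n' "$data" | sort -t _ -k 1,4n)
single=$(printf '%s\n' "$data" | sort -t _ -k 1n)

# Every key starts with "/", so each numeric key counts as zero and the
# last-resort comparison of the whole line decides the order.
[ "$plain" = "$span" ] && echo "-k 1,4n matches a plain sort"
[ "$span" = "$single" ] && echo "-k 1,4n matches -k 1n"
```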

Regards,
Alister

Last edited by alister; 07-29-2013 at 11:53 AM..
# 5  
Old 07-29-2013
You're right; thanks for pointing that out, alister. I didn't have a proper look at it.
# 6  
Old 07-29-2013
Quote:
Originally Posted by newbie2010
I want the alphabetic to take precedence over the numeric. What if I had a file like this:

Code:
cat list |sed 's/.*[^0-9]\([0-9][0-9]*\)[^0-9]*$/\1 &/' | sort -n | cut -d" " -f2-
/vol/zebra_eng_ice_022
/vol/close_eng_ice_0888
/vol/open_eng_ice_0890
/vol/open_eng_ice_0893
/vol/open_eng_ice_08923

I would want zebra to be on the bottom. So if the files are identical in name but different in number I want them sorted numerically. If the files are different alphabetically, I want the alphabetic sort to take precedence. I have tried playing with sort and sed and can't figure this out.
Your solution is nearly complete. You're decorating with the trailing sequence of digits, sorting on that, and then cutting it out before output. The only thing you need is another field for the preceding text. Instead of ...
Code:
022 /vol/zebra_eng_ice_022

... one possibility is to use ...
Code:
/vol/zebra_eng_ice 022 /vol/zebra_eng_ice_022

This extra field consisting of the name without the trailing number is necessary because the trailing number is not part of the name for the purposes of alphabetical sorting.

I would accomplish this using the same tools you've used, but with a different sed approach:
Code:
sed 'h; s/\(.*\)_/\1 /; G; y/\n/ /' | sort -k1,1 -k2n | cut -d' ' -f3-

This code (like yours) assumes that there are never any spaces in a pathname. If this is not true, it can be modified to use a different delimiter.
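
One way the delimiter change might look, as a sketch only: a tab replaces the space, so pathnames may contain spaces (but still not tabs or newlines). The function name sort_by_name_then_number and the sample paths are hypothetical, and LC_ALL=C is an assumption added to keep the collation order predictable.

```shell
# Hypothetical helper (not from the thread): decorate, sort, undecorate,
# using a tab instead of a space as the field delimiter.
sort_by_name_then_number() {
  tab=$(printf '\t')
  # Decorate: "name<TAB>number<TAB>original line".
  sed "h; s/\(.*\)_/\1${tab}/; G; y/\n/${tab}/" |
    LC_ALL=C sort -t "$tab" -k1,1 -k2,2n |
    cut -f3-   # cut splits on tabs by default; keep the original line
}

# Made-up sample paths containing a space.
printf '%s\n' \
  '/vol/my dir/zebra_eng_ice_022' \
  '/vol/my dir/open_eng_ice_08923' \
  '/vol/my dir/open_eng_ice_0893' \
  '/vol/my dir/close_eng_ice_0888' \
  '/vol/my dir/open_eng_ice_0890' |
  sort_by_name_then_number
```

The second key is written as -k2,2n rather than -k2n so that it stops at the number field; for this data either form should give the same order.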

Regards,
Alister

Last edited by alister; 07-29-2013 at 11:56 AM..
# 7  
Old 07-29-2013
If all of your files are in the same directory (or you want the directory name to be part of the primary sort key), all of your filenames contain three underscore characters, and you want the first three underscore-separated fields sorted alphanumerically and the fourth field sorted numerically as the final sort key, the following simple sort command does what you want:
Code:
sort -t_ -k1,3 -k4,4n list

With the sample input you showed, this command produces the output:
Code:
/vol/close_eng_ice_0888
/vol/open_eng_ice_0890
/vol/open_eng_ice_0893
/vol/open_eng_ice_08923
/vol/zebra_eng_ice_022
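
Since this command depends on every line having exactly three underscores, one way to check that precondition before sorting is sketched below. The awk check and the LC_ALL=C setting are additions made here for illustration, not part of the command above; list is the file name used in the thread, recreated in a scratch directory.

```shell
# Scaffolding: recreate the sample file (in shuffled order) for the demo.
tmpdir=$(mktemp -d)
printf '%s\n' \
  /vol/zebra_eng_ice_022 \
  /vol/open_eng_ice_08923 \
  /vol/close_eng_ice_0888 \
  /vol/open_eng_ice_0893 \
  /vol/open_eng_ice_0890 > "$tmpdir/list"

# Collect lines that do not split into exactly four "_"-separated fields.
bad=$(awk -F_ 'NF != 4' "$tmpdir/list")

if [ -z "$bad" ]; then
  # Assumption: LC_ALL=C pins the collation order for the text key.
  sorted=$(LC_ALL=C sort -t_ -k1,3 -k4,4n "$tmpdir/list")
  printf '%s\n' "$sorted"
else
  printf 'unexpected field count:\n%s\n' "$bad" >&2
fi
rm -rf "$tmpdir"
```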


Last edited by Don Cragun; 07-29-2013 at 12:46 PM.. Reason: Fix typo