Explanation of the sort command


 
# 1  
Old 01-17-2013
Explanation of the sort command

Hi everyone,

I am wondering if someone could please break down and explain the following sort command for me:

Code:
ls ${DEST_LOCATION}/${FILES} | sort -rt -k 4,4n | head -1

I have tried working it out using 'man sort', but the AIX man page does not explain this usage well. I know that the pipeline searches the source directory for the named files and returns the name of the most recently dated one, but I am particularly confused as to what
Code:
-k 4,4n | head -1

is doing.

Thanks in advance.

Last edited by jimbojames; 01-20-2013 at 07:57 PM.. Reason: Solved.
# 2  
Old 01-17-2013
This is not a sort command; it is a pipeline containing three commands: ls, sort, and head.
The sort command in the middle of this makes no sense as written. The options are interpreted as follows: -r reverses the sort order, and -t takes the next argument (in this case -k) as the single character used as the delimiter separating fields in the input that is to be sorted. I don't believe that you have shown us what this argument really is. The -k option with the option argument 4,4n says to sort the input data based on the 4th field alone and to compare the values numerically (rather than alphabetically).

The first command in the pipeline (ls ${DEST_LOCATION}/${FILES}) will provide a list of files (one per line) that match the pattern to which ${DEST_LOCATION}/${FILES} expands. Without knowing how DEST_LOCATION and FILES were set, I have no idea what filenames are going to be sorted by the sort command. When the sort is completed, the last command in the pipeline (head -1) will print the 1st line and ignore everything else.

Please recheck this command line. For this pipeline to make any sense there has to be another argument to the ls command or the argument (-rt) to sort would have to be just -r instead, and there would need to be options given to ls to produce multi-column output with the 4th field being a numeric value. If the command was:
Code:
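ls -l ${DEST_LOCATION}/${FILES} | sort -k 5,5nr | head -1

the output would be a long format listing of the largest file matched by the pattern specified by ${DEST_LOCATION}/${FILES}. (The r modifier is attached to the key here because a global -r is not applied to a key that carries its own ordering modifier such as n.)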
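
As a small illustration of the -t and -k options described above, here is a sketch using made-up colon-separated data (the names and numbers are hypothetical and have nothing to do with the files in this thread). It compares a numeric key with the same key compared as text:

```shell
# Hypothetical sample data: name:count records, one per line.
sample='alpha:10
beta:9
gamma:100'

# -t ':' sets the field delimiter; -k 2,2n sorts on field 2 only, numerically.
numeric=$(printf '%s\n' "$sample" | LC_ALL=C sort -t ':' -k 2,2n)

# Without the n modifier the same key is compared as text,
# so "10" sorts before "100", which sorts before "9".
textual=$(printf '%s\n' "$sample" | LC_ALL=C sort -t ':' -k 2,2)

printf 'numeric:\n%s\n\ntextual:\n%s\n' "$numeric" "$textual"
```

LC_ALL=C is set only to keep the text comparison independent of the local collation rules.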
# 3  
Old 01-20-2013
Hi Don,

Sorry for the confusion, this is part of a script that was created by someone who no longer works here, and I was trying to use part of the functionality in a new script that I was creating.

I've managed to modify it using
Code:
FILENAME=$(ssh ${DEST_SERV} "ls ${DEST_LOCATION}/${FILES} | sort -r -k 4n | head -1")

which is then followed with
Code:
ssh ${DEST_SERV} "ls ${DEST_LOCATION}/${FILES} | grep -v ${FILENAME} | xargs rm -rf"

to remove the unwanted files.

I was confused with what the
Code:
4,4n

was doing, but your explanation has helped with this, although I still don't understand what the 4 before the comma is doing.

Thanks for your help.
# 4  
Old 01-20-2013
Quote:
Originally Posted by jimbojames
I was confused with what the
Code:
4,4n

was doing, but your explanation has helped with this, although I still don't understand what the 4 before the comma is doing.
"sort" interprets lines as "fields" separated by "delimiters". By default this delimiter is whitespace.

By default "sort" sorts on field 1, then on field 2, then field 3 and so on until the end of the line. Lines with equal values of field 1 will be sorted on the contents of field 2, lines with equal fields 1 & 2 sorted on field 3, etc. If you want another field (or part of it) as the primary key you will have to use the "-k" option and a field number. But this leaves the question of how lines with equal sort keys should be handled.

By default sort will use the fields starting with field 1 as the secondary sort key in this case:

Code:
sort -k4

will produce the sort order: f4, f1, f2, f3, f4, ....

By defining where the key should end, you can change this default behavior:

Code:
sort -k4,4

Sorting occurs exclusively on field 4, because the ",4" says the ending of the sort key is field 4.

The man page of sort should explain this in greater detail.

I hope this helps.

bakunin
# 5  
Old 01-20-2013
Thanks guys, your help and explanations have been fantastic.
# 6  
Old 01-20-2013
bakunin:
Close, but not quite.
The sort key 4,4n treats the entire 4th field as a numeric field to be used as a sort key. The sort key 4n treats the data starting at the beginning of the 4th field to the end of the line as a numeric field to be used as a sort key. With either of these sort keys, lines that compare equal on that sort key will use the entire line as the secondary sort key treating the entire line as an alphanumeric string to be sorted in reverse order (since the -r option was also specified and was not associated with a particular sort key).
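
A minimal sketch of the two points above, using made-up input and field 2 (or field 1) rather than field 4 to keep the data short. It assumes a POSIX-conforming sort such as GNU sort; the behaviour of older implementations may differ:

```shell
# Hypothetical three-field sample; fields are separated by blanks (the default).
sample='a x 2
b x 1'

# Key is field 2 only: both keys are "x", so the tie is broken by
# comparing the whole lines.
key_2_2=$(printf '%s\n' "$sample" | LC_ALL=C sort -k 2,2)

# Key runs from field 2 to the end of the line: "x 1" sorts before "x 2".
key_2=$(printf '%s\n' "$sample" | LC_ALL=C sort -k 2)

# Global -r with a key that has its own modifier (n): the key is still
# compared in ascending numeric order; -r only reverses the tie-break
# on the whole line.
reversed=$(printf '%s\n' '1 a' '1 b' '2 a' | LC_ALL=C sort -r -k 1,1n)

printf '%s\n---\n%s\n---\n%s\n' "$key_2_2" "$key_2" "$reversed"
```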

jimbojames:
Since the command ls ${DEST_LOCATION}/${FILES} provides a list of file pathnames (each line containing a single field unless there are spaces in the pathnames being sorted) the -k option to sort is just a distraction. If:
Code:
ls ${DEST_LOCATION}/${FILES} | sort -r -k 4n | head -1

really is the pipeline being executed, ls is not an alias adding options that add fields to its output (making the 4th field a numeric field), and the expansion of ${DEST_LOCATION} does not start with a hyphen character, then the -k option to sort is meaningless, and the output will be the last file pathname in the alphabetically sorted list of files to which ls ${DEST_LOCATION}/${FILES} expands. Furthermore, unless one or more of the pathnames to which ${DEST_LOCATION}/${FILES} expands is a directory, the output will be equivalent to the simpler and faster pipeline:
Code:
ls ${DEST_LOCATION}/${FILES} | tail -1

I know that you inherited this pipeline and don't understand what it does. I don't understand why this command line is seemingly using options that make no sense. The only way for you to understand what is going on here (or for us to explain it to you) is for you to examine the data passing through each stage of this pipeline and show us what you learn.

Try replacing:
Code:
ls ${DEST_LOCATION}/${FILES} | sort -r -k 4n | head -1

with:
Code:
ls ${DEST_LOCATION}/${FILES} | tee /tmp/pipe1 | sort -r -k 4,4n | tee /tmp/pipe2 | head -1

and show us the output produced and the contents of /tmp/pipe1 and /tmp/pipe2.
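
For anyone who wants to try the tee technique without touching real files, here is a self-contained sketch. The three input names are made up and stand in for the ls output, and a scratch directory from mktemp -d (widely available, though not required by POSIX) is used instead of fixed names in /tmp:

```shell
# Scratch directory for the captured stages.
tmpdir=$(mktemp -d)

# Stand-in for the ls output; each tee saves a copy of the data at that stage.
first=$(printf '%s\n' 'file_b' 'file_c' 'file_a' |
    tee "$tmpdir/pipe1" |
    LC_ALL=C sort -r |
    tee "$tmpdir/pipe2" |
    head -1)

# Read the captured stages back, then clean up.
pipe1=$(cat "$tmpdir/pipe1")
pipe2=$(cat "$tmpdir/pipe2")
rm -rf "$tmpdir"

printf 'result: %s\n\nbefore sort:\n%s\n\nafter sort:\n%s\n' "$first" "$pipe1" "$pipe2"
```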
# 7  
Old 01-20-2013
Hi Don,

The output from running
Code:
ls ${DEST_LOCATION}/${FILES} | tee /tmp/pipe1 | sort -r -k 4,4n | tee /tmp/pipe2 | head -1

is:
Quote:
/data/projects/PROD_IPU/inbound/SOVEREIGN/SOVEREIGN_PreviousOld.CSV

The contents of pipe1 are:

/data/projects/PROD_IPU/inbound/SOVEREIGN/SOVEREIGN_PROD_20130112235921.CSV
/data/projects/PROD_IPU/inbound/SOVEREIGN/SOVEREIGN_PROD_20130113001600.CSV
/data/projects/PROD_IPU/inbound/SOVEREIGN/SOVEREIGN_PROD_20130114002828.CSV
/data/projects/PROD_IPU/inbound/SOVEREIGN/SOVEREIGN_PROD_20130115144300.CSV
/data/projects/PROD_IPU/inbound/SOVEREIGN/SOVEREIGN_PROD_20130116105000.CSV
/data/projects/PROD_IPU/inbound/SOVEREIGN/SOVEREIGN_PROD_20130116122400.CSV
/data/projects/PROD_IPU/inbound/SOVEREIGN/SOVEREIGN_PROD_20130116150101.CSV
/data/projects/PROD_IPU/inbound/SOVEREIGN/SOVEREIGN_PROD_20130116235654.CSV
/data/projects/PROD_IPU/inbound/SOVEREIGN/SOVEREIGN_PROD_20130117113000.CSV
/data/projects/PROD_IPU/inbound/SOVEREIGN/SOVEREIGN_PROD_20130117133000.CSV
/data/projects/PROD_IPU/inbound/SOVEREIGN/SOVEREIGN_PROD_20130117155000.CSV
/data/projects/PROD_IPU/inbound/SOVEREIGN/SOVEREIGN_PROD_20130117235938.CSV
/data/projects/PROD_IPU/inbound/SOVEREIGN/SOVEREIGN_PROD_201302091000.CSV

And the contents of pipe2 are:

/data/projects/PROD_IPU/inbound/SOVEREIGN/SOVEREIGN_PROD_201302091000.CSV
/data/projects/PROD_IPU/inbound/SOVEREIGN/SOVEREIGN_PROD_20130117235938.CSV
/data/projects/PROD_IPU/inbound/SOVEREIGN/SOVEREIGN_PROD_20130117155000.CSV
/data/projects/PROD_IPU/inbound/SOVEREIGN/SOVEREIGN_PROD_20130117133000.CSV
/data/projects/PROD_IPU/inbound/SOVEREIGN/SOVEREIGN_PROD_20130117113000.CSV
/data/projects/PROD_IPU/inbound/SOVEREIGN/SOVEREIGN_PROD_20130116235654.CSV
/data/projects/PROD_IPU/inbound/SOVEREIGN/SOVEREIGN_PROD_20130116150101.CSV
/data/projects/PROD_IPU/inbound/SOVEREIGN/SOVEREIGN_PROD_20130116122400.CSV
/data/projects/PROD_IPU/inbound/SOVEREIGN/SOVEREIGN_PROD_20130116105000.CSV
/data/projects/PROD_IPU/inbound/SOVEREIGN/SOVEREIGN_PROD_20130115144300.CSV
/data/projects/PROD_IPU/inbound/SOVEREIGN/SOVEREIGN_PROD_20130114002828.CSV
/data/projects/PROD_IPU/inbound/SOVEREIGN/SOVEREIGN_PROD_20130113001600.CSV
/data/projects/PROD_IPU/inbound/SOVEREIGN/SOVEREIGN_PROD_20130112235921.CSV
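
One observation on these pathnames, offered as a sketch only: if the field delimiter were an underscore (an assumption; the thread never shows what followed -t), field 4 would be the timestamp part of each name. The example below uses three of the pathnames listed above and compares a numeric sort on that field with the plain reverse sort seen in pipe2. The two pick different files because one timestamp has 12 digits and the others have 14:

```shell
# Three of the pathnames posted above.
paths='/data/projects/PROD_IPU/inbound/SOVEREIGN/SOVEREIGN_PROD_20130112235921.CSV
/data/projects/PROD_IPU/inbound/SOVEREIGN/SOVEREIGN_PROD_20130117235938.CSV
/data/projects/PROD_IPU/inbound/SOVEREIGN/SOVEREIGN_PROD_201302091000.CSV'

# ASSUMPTION: "_" is the field delimiter. Field 4 is then the timestamp part.
field4=$(printf '%s\n' "$paths" | cut -d _ -f 4)

# Numeric descending sort on field 4 alone: the 14-digit value is largest.
by_number=$(printf '%s\n' "$paths" | LC_ALL=C sort -t _ -k 4,4nr | head -1)

# Plain reverse sort of the whole lines, which matches the order in pipe2.
by_text=$(printf '%s\n' "$paths" | LC_ALL=C sort -r | head -1)

printf 'field 4:\n%s\n\nnumeric: %s\ntext:    %s\n' "$field4" "$by_number" "$by_text"
```

Whether either result is the "newest" file depends on how the shorter timestamp is meant to be read, which the thread does not say.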

What I am attempting to do is identify the newest file and delete all the others, which I think is working using the combination of
Code:
FILENAME=$(ssh ${DEST_SERV} "ls ${DEST_LOCATION}/${FILES} | sort -r -k 4n | head -1")

and
Code:
ssh ${DEST_SERV} "ls ${DEST_LOCATION}/${FILES} | grep -v ${FILENAME} | xargs rm -rf"
