Sorting file content by file extensions

 
# 1  
Old 05-11-2012

Hi Experts,

I have one .txt file that contains filenames with various extensions, e.g. .gz, .dat, .CTL, .xml. I want to sort all the filenames by extension and delete all the filenames that have the .xml extension.

Please help.
PS : I am using Sun OS Generic_122300-60.

Thanks,
Ajay
# 2  
Old 05-11-2012
To do the last step, you need only:
Code:
grep -v '\.xml$'

As for sorting, you can try:
Code:
sort -t. -k 2,2 -k 1,1

but that will not sort correctly if any filename contains more than one period.
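
Putting the two steps together, a minimal sketch might look like the following. The function name and filelist.txt are placeholders, not taken from the original post, and LC_ALL=C is added only so the ordering is predictable (uppercase extensions such as .CTL then sort before lowercase ones). It still assumes at most one period per name.

```shell
# Hypothetical helper: read names on stdin, drop the .xml entries,
# then sort by extension (field 2) and by base name (field 1).
# Assumes each name contains at most one period.
sort_names_drop_xml() {
    grep -v '\.xml$' | LC_ALL=C sort -t. -k 2,2 -k 1,1
}

# Example use (filelist.txt is an assumed name for the OP's list):
# sort_names_drop_xml < filelist.txt > sorted.txt
```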
# 3  
Old 05-11-2012
Hi.

Here are two other methods. The first uses msort, which allows fields to be specified from the right-hand side. The other uses a quickly written Perl one-liner that reverses the characters on each line:
Code:
#!/usr/bin/env bash

# @(#) s1	Demonstrate collect by extension, msort, perl, sort
# msort home: http://freshmeat.net/projects/msort

pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
re() { perl -wn -e 'print scalar reverse;' $1; pe ; }
C=$HOME/bin/context && [ -f $C ] && $C msort perl sort

FILE=${1-data1}
pl " Input file $FILE:"
head $FILE

pl " Results with msort:"
msort -q --line -d"." --position=-1,-1 --position=1,1 $FILE

pl " Results with (perl) reverse, sort, reverse:"
re $FILE |
sort -t"." |
re |
tee f1

exit 0

producing:
Code:
% ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian GNU/Linux 5.0.8 (lenny) 
bash GNU bash 3.2.39
msort 8.44
perl 5.10.0
sort (GNU coreutils) 6.10

-----
 Input file data1:
a.xml
a.doc
a.txt
a.jpg
b.xml
b.doc
b.txt
b.jpg
c.xml
c.doc

-----
 Results with msort:
a.doc
b.doc
c.doc
a.jpg
b.jpg
c.jpg
a.txt
b.txt
c.txt
a.xml
b.xml
c.xml

-----
 Results with (perl) reverse, sort, reverse:


a.doc
b.doc
c.doc
a.jpg
b.jpg
c.jpg
a.xml
b.xml
c.xml
a.txt
b.txt
c.txt

The perl function is not completely satisfactory, but perhaps someone will stop by with a suggestion to omit the extra newlines.

I haven't tried to install msort on Solaris, but there is a link on MSORT for it.

Best wishes ... cheers, drl

Last edited by drl; 05-11-2012 at 03:44 PM..
# 4  
Old 05-12-2012
I thought of two better ways to do this.

First, you can use the traditional sort, and this will work fine for 99% of the cases:
Code:
ls -1 | sort -t. -k 3,3 -k 2,2 -k 1,1

You tell sort to order by the 3rd dot-separated field, then the 2nd, then the 1st; a field that does not exist is treated as empty and sorts first. The only problem with this sort method is that you get this kind of weird ordering:

Code:
bar
foo
bar.zip
foo.zip
foo.bar.jpg
foo.bar.zip
bar.foo.zip

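That is, names with fewer than three dot-separated fields have an empty 3rd key, so they sort ahead of every name that has three fields, whatever the final extension is. So the two single-extension zip files (bar.zip and foo.zip) seem out of place.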


So for the best ordering, the one most people would expect, you make the last extension "special" by inserting a marker character before the final period. Then sort on it, then remove the marker. You can use the path separator as the marker because it can never be part of a filename.
Code:
ls -1 | sed 's/\(\.[^.]*\)$/\/\1/' | sort -t/ -k 2,2 -k 1,1  |  sed 's/\/\([^/]*\)$/\1/'

I'll admit: That's ugly for the command line. It could be a bit nicer if you don't need to worry about full path-names in your list.
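
For readability, the same decorate/sort/undecorate idea can be wrapped in a function. This is only a sketch: the function name is made up, "|" is used as the sed delimiter to avoid escaping the slash, LC_ALL=C is added for predictable ordering, and it assumes the list holds bare names with no "/" in them.

```shell
# Hypothetical helper: sort names on stdin by their LAST extension,
# then by the rest of the name. Assumes no name contains a slash.
sort_by_last_ext() {
    # 1. Decorate: put a "/" in front of the final ".ext" (foo.bar.zip -> foo.bar/.zip).
    # 2. Sort on the part after the "/" first, then on the part before it.
    # 3. Undecorate: remove the "/" that step 1 inserted.
    sed 's|\(\.[^.]*\)$|/\1|' |
        LC_ALL=C sort -t/ -k 2,2 -k 1,1 |
        sed 's|/\(\.[^./]*\)$|\1|'
}

# Example use:
# ls -1 | sort_by_last_ext
```

Names without any period are left undecorated, so their extension key is empty and they come out first as a group.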


Postscript: On Linux, the "rev" command is part of the util-linux suite. It prints each line of its input with the characters reversed, so you can use drl's technique in that environment:

Code:
ls -1 | rev | sort | rev

# 5  
Old 05-12-2012
Not sure what ls there is on Solaris, but GNU ls has an -X option that does exactly that -- sorts by extension.

Edit: Solaris ls doesn't support -X. Never mind. How about this:
Prepend with the extension, sort on it, and then take it out (DSU = decorate-sort-undecorate):
Code:
ls | awk -F. '{print $NF,$0}' | sort -k1  | cut -d" " -f2-


Last edited by mirni; 05-12-2012 at 03:57 AM.. Reason: Solaris ls check
# 6  
Old 05-12-2012
Both rev commands also reverse the suffixes, and so they do not get sorted in alphabetical order.

A different strategy would be to prepend each name with its suffix and a dot (or just a space and a dot if there is no suffix), sort, and remove the prefix again after the sort. The sort would still need to use -t. so that the period is the field separator.
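
A rough sketch of that prepend strategy, under a few assumptions: the function name is invented, a POSIX awk is available (on Solaris that probably means nawk or /usr/xpg4/bin/awk rather than the old /usr/bin/awk), and LC_ALL=C is used so that the space key for suffix-less names sorts first.

```shell
# Hypothetical helper: sort names on stdin by suffix, then by full name.
sort_by_suffix_key() {
    awk '{
        n = split($0, parts, ".")
        # Key is the text after the last period, or one space if there is no period.
        if (n > 1) key = parts[n]; else key = " "
        print key "." $0
    }' |
        LC_ALL=C sort -t. -k 1,1 -k 2 |
        cut -d. -f2-    # strip the prepended key and its dot
}

# Example use:
# sort_by_suffix_key < filelist.txt
```

Here filelist.txt is again just a placeholder name for the list of filenames.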

Another option might be just to list the suffixes, if they are not too many:
Code:
ls -l *.gz *.dat *.ctl *.xml

# 7  
Old 05-12-2012
Hi.

Good selection of techniques and problem solving.
Quote:
Originally Posted by mirni
...
Code:
ls | awk -F. '{print $NF,$0}' | sort -k1  | cut -d" " -f2-

Quote:
Originally Posted by otheus
Code:
ls -1 | sed 's/\(\.[^.]*\)$/\/\1/' | sort -t/ -k 2,2 -k 1,1  |  sed 's/\/\([^/]*\)$/\1/'

So far I like those 2 solutions the best for utilizing standard tools, at least on the data files posted so far in this thread. The msort solution is a single-command (but non-standard) solution: the ability to specify fields from the right-hand-side is invaluable in this situation.

Quote:
Originally Posted by Scrutinizer
Both rev commands also reverse the suffixes, and so they do not get sorted in alphabetical order ...
That is true; however, my impression was that the OP desired grouping rather than strict sorting. In that case the revs work, except where there are no extensions: the files without a suffix do not end up in a group by themselves. The possibility of more than one dot does complicate the issue, and I'm glad that it was raised.

After some thought, a better re function for my script is:
Code:
re() { perl -wn -e 'chomp;print scalar reverse,"\n";' $1 ; }

Best wishes ... cheers, drl