sort script - Page 2 | Unix Linux Forums | Shell Programming and Scripting

    #8  
Old 10-11-2012
RudiC RudiC is offline Forum Advisor  
I appreciate and support what Don and Jim say but, on the other hand, I understand that (and why) you are looking for a quick fix, so I came up with the following, which might do the task. I'm not sure it will run correctly under Cygwin, as I have no chance to test it there. It runs fine on my Linux system, at least with your sample data.
Put the sort order into a file
Code:
Cpyridne_N
Cphenol_O
fusering
nasC

Make sure the items in there match your heading items (no error checking is done here). The command line names infile twice: awk reads only its header line, and sort then reads the whole file.
Code:
$(awk   'NR==FNR {Ar[++n]=$1; next}     # 1st file: collect the sort keys in order
         FNR==1  {exit}                 # header line of infile: stop reading, run END
         END     {printf "sort";
                  for (i=1;i<=n;i++)
                    for (j=1;j<=NF;j++)
                      if (Ar[i]==$j) {printf " -k%d,%d",  j, j; break};
                  printf "\n"}
        ' sortorder infile ) infile

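Execute it as is, i.e. the entire command including the $(...): the shell runs the sort command that awk prints, with infile as its argument. It will sort the header line below everything else; if you can't live with that, use the head and tail commands to put it back on top.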
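The head and tail idea could look roughly like the sketch below. The function name sort_keep_header is made up for illustration, and the data file is assumed to be whitespace-separated with exactly one header line; it is untested on Cygwin.

```shell
#!/bin/sh
# Sketch: keep the header line on top and sort only the data lines with the
# awk-generated sort command. sort_keep_header is a hypothetical name.
sort_keep_header() {
    # $1 = sort order file (one heading per line), $2 = data file with a header line
    sortcmd=$(awk 'NR==FNR {Ar[++n]=$1; next}
                   FNR==1  {exit}
                   END     {printf "sort"
                            for (i=1;i<=n;i++)
                              for (j=1;j<=NF;j++)
                                if (Ar[i]==$j) {printf " -k%d,%d", j, j; break}
                            printf "\n"}' "$1" "$2")
    head -n 1 "$2"                # header first, untouched
    tail -n +2 "$2" | $sortcmd    # data lines only; $sortcmd is split into words on purpose
}
```

Called as `sort_keep_header sortorder infile > outfile`, it writes the header first and the sorted data lines after it.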

Last edited by RudiC; 10-11-2012 at 05:01 AM..
    #9  
Old 10-11-2012
Don Cragun Don Cragun is offline Forum Staff  
I had started working on a solution similar to RudiC's suggestion last night, but fell asleep before finishing it. RudiC left out one important element; for the fields you're sorting you need to specify a numeric sort.
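To see why the numeric option matters, here is a tiny sketch with made-up values (not taken from the poster's data). Without -n, sort compares the fields as strings, so 10 lands before 2.

```shell
#!/bin/sh
# Made-up values: a plain sort compares character by character, so "10"
# sorts before "2" and "9"; with -n the values are compared as numbers.
lexical=$(printf '9\n10\n2\n' | sort)
numeric=$(printf '9\n10\n2\n' | sort -n)
# The unquoted expansions join the sorted lines with spaces for display.
printf 'lexical: %s\n' "$(echo $lexical)"    # lexical: 10 2 9
printf 'numeric: %s\n' "$(echo $numeric)"    # numeric: 2 9 10
```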

The following works on OS X, but I don't have a Cygwin system to test. This keeps the header line at the start of the file. The way it is written, it produces debugging information in a file named debug.out that shows the fields selected by the given sort keys, shows the sort command that is used to perform the sort, and lists the fields from each record that will be sorted by that command. (Actually, the entire record will be used as a final sort key if all of the selected keys match in some records, but since the first field in each line in your input files is a sequence number, that field will always be enough to disambiguate any records that match up to that point.)

Code:
#!/bin/ksh
awk -v dbg=1 '
BEGIN{  FS = OFS = "\t"}
FNR==NR{# We are in the 1st file.  Each line is the name of a field to be used
        # as a sort key, with the 1st line being the primary sort key.
        key[++nk] = $1
        next
}
FNR==1{ # We are on the 1st line of the 2nd file.  Determine the sort command
        # to use to implement the desired sort order.  All keys are to be
        # treated as ascending order numeric fields.
        sortcmd = "sort -t \"" FS "\" -n"
        for(i = 1; i <= nk; i++) {
                # For each key...
                for(j = 1; j <= NF; j++) {
                        if($j == key[i]) {
                                # We have a match...
                                if(dbg)printf("key[%d](%s) is field %d\n",
                                        i, key[i], j) > "debug.out"
                                if(dbg)keyf[i] = j
                                sortcmd = sortcmd " -k" j "," j
                                break
                        }
                }
                if(j > NF) {
                        # This key does not have a matching field heading.
                        printf("sorter: No heading matches key[%d] (%s)\n",
                                i, key[i])
                        ec = 1
                }
        }
        if(ec) exit ec
        if(dbg)printf("sortcmd is \"%s\"\n", sortcmd) > "debug.out"
        print
        next
}
{       # We have a data line.  Feed it to sort.
        if(dbg) {
                printf("line %d key info: %s", FNR, $keyf[1]) > "debug.out"
                for(i = 2; i <= nk; i++) printf("\t%s", $keyf[i]) > "debug.out"
                printf("\t%s\n", $1) > "debug.out"
        }
        print | sortcmd
}
END{    close(sortcmd)
}' keys data

If you don't want the debugging information, you can disable it by changing:

Code:
awk -v dbg=1 '

early in the script to:

Code:
awk -v dbg=0 '

or

Code:
awk '

or by removing all of the statements that start with if(dbg) .
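The tie-breaking mentioned above (the whole record acting as the final sort key) can be seen in isolation with a small sketch. The two-column sample is made up for illustration. As far as I know, that fallback is a plain string comparison of the line rather than a numeric one, so it orders single-digit sequence numbers as expected but would put 10 before 2.

```shell
#!/bin/sh
# Made-up, tab-separated sample: sequence number, then one numeric column.
# The lines with sequence numbers 3 and 1 tie on column 2, so sort falls
# back to comparing the entire line, which puts "1<TAB>5" before "3<TAB>5".
demo_tiebreak() {
    printf '3\t5\n1\t5\n2\t4\n' | sort -t "$(printf '\t')" -n -k2,2
}
demo_tiebreak
```

This prints the line with 2 and 4 first, then the two tied lines in the order 1, 3.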

Last edited by Don Cragun; 10-11-2012 at 11:23 AM.. Reason: Fix auto-spell checker induced typo.
    #10  
Old 10-11-2012
LMHmedchem LMHmedchem is offline
Wow, thanks a lot for working this out. This will really save me a lot of time. It looks like it would be reasonable to make simple changes, such as switching to alphanumeric sorting or changing the sort order.

After a few changes to make this into a callable script run in bash, this is what I ended up with.

Code:
#!/usr/bin/bash

# call with $1 list of column headers to be sorted on, one header per line
# call with $2 name of file to be sorted

# will be prefixed to name of data file to create output file
OUTPUPREFIX="_makesdf"

# parse arguments
KEYFILE=$1
DATAFILE=$2

# make sure input has unix EOL (quoted so file names with spaces survive)
dos2unix -q "$KEYFILE"
dos2unix -q "$DATAFILE"

# change to dbg=1 for debug output to logfile
#awk -v dbg=1 '
awk -v dbg=0 '
BEGIN{  FS = OFS = "\t"}
FNR==NR{# We are in the 1st file.  Each line is the name of a field to be used
        # as a sort key, with the 1st line being the primary sort key.
        key[++nk] = $1
        next
}
FNR==1{ # We are on the 1st line of the 2nd file.  Determine the sort command
        # to use to implement the desired sort order.  All keys are to be
        # treated as ascending order numeric fields.
        sortcmd = "sort -t \"" FS "\" -n"
        for(i = 1; i <= nk; i++) {
                # For each key...
                for(j = 1; j <= NF; j++) {
                        if($j == key[i]) {
                                # We have a match...
                                if(dbg)printf("key[%d](%s) is field %d\n",
                                        i, key[i], j) > "debug.out"
                                if(dbg)keyf[i] = j
                                sortcmd = sortcmd " -k" j "," j
                                break
                        }
                }
                if(j > NF) {
                        # This key does not have a matching field heading.
                        printf("sorter: No heading matches key[%d] (%s)\n",
                                i, key[i])
                        ec = 1
                }
        }
        if(ec) exit ec
        if(dbg)printf("sortcmd is \"%s\"\n", sortcmd) > "debug.out"
        print
        next
}
{       # We have a data line.  Feed it to sort.
        if(dbg) {
                printf("line %d key info: %s", FNR, $keyf[1]) > "debug.out"
                for(i = 2; i <= nk; i++) printf("\t%s", $keyf[i]) > "debug.out"
                printf("\t%s\n", $1) > "debug.out"
        }
        print | sortcmd
}
END{    close(sortcmd)
}' "$KEYFILE" "$DATAFILE" > "${OUTPUPREFIX}_${DATAFILE}"

I have a local sort file with the list of headers to sort on, and this script lives with the rest of my path tools (/usr/local/bin/), so I can call it from the shell or another script.

Thanks again,

LMHmedchem
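One thing to watch when calling the script above from other scripts: the output name is built as prefix, underscore, data file argument, so a data file given with a directory part (say data/run1.txt) would produce a path like _makesdf_data/run1.txt. A possible way around that, as a sketch only (outname is a hypothetical helper, not part of the posted script):

```shell
#!/bin/sh
# Hypothetical helper: put the prefix in front of the file name only and
# keep the directory part, so the output lands next to the data file.
outname() {
    prefix="_makesdf"
    printf '%s/%s_%s\n' "$(dirname "$1")" "$prefix" "$(basename "$1")"
}
```

With this, `outname data/run1.txt` gives data/_makesdf_run1.txt, and a bare `outname run1.txt` gives ./_makesdf_run1.txt.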
    #11  
Old 10-12-2012
RudiC RudiC is offline Forum Advisor  
@Don Cragun: impressive suggestion, esp. the debug stuff. Seen it before, admired it before, inclined to adopt it.
I had thought about the numeric sort, but as there are non-numeric fields in the file as well, I disregarded it for the first attempt. With a slight enhancement we can make my suggestion accept per-field sort options, and this should be doable for Don's code as well:
Code:
$(awk   'NR==FNR {Ar[++n]=$1; SO[n]=$2; next}   # 1st file: key name and optional sort options
         FNR==1  {exit}                         # header line of infile: stop reading, run END
         END     {printf "sort";
                  for (i=1;i<=n;i++)
                    for (j=1;j<=NF;j++)
                      if (Ar[i]==$j) {printf " -k%d,%d%s",  j, j, SO[i]; break};
                  printf "\n"}
        ' sortorder infile ) infile

This appends the options given in the second column of the sortorder file to the matching sort key, e.g. numeric reverse:
Code:
Cpyridne_N
Cphenol_O  nr
fusering   nr
nasC
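Carrying the same idea over to Don's tab-separated approach might look like the sketch below. It only prints the sort command instead of running it. The function name build_sortcmd and the choice to default to "n" for keys without options are assumptions made here, not something from the posts above.

```shell
#!/bin/sh
# Sketch only: print the sort command for a tab-separated data file whose
# first line holds the headings. The keys file holds "heading [options]",
# one per line; keys without options default to "n" (numeric).
build_sortcmd() {
    awk '
    FNR==NR { # 1st file: heading name in $1, optional sort options in $2
            key[++nk] = $1
            opt[nk] = ($2 == "" ? "n" : $2)
            next
    }
    FNR==1  { # header line of the data file: map each key to its field number
            n = split($0, head, "\t")
            cmd = "sort -t \"\t\""
            for(i = 1; i <= nk; i++) {
                    for(j = 1; j <= n; j++)
                            if(head[j] == key[i]) {
                                    cmd = cmd " -k" j "," j opt[i]
                                    break
                            }
                    if(j > n) {
                            printf("no heading matches key %s\n", key[i]) > "/dev/stderr"
                            exit 1
                    }
            }
            print cmd
            exit
    }' "$1" "$2"
}
```

Because the printed command contains a quoted tab, it has to go through the shell again to be run, for example `tail -n +2 data | eval "$(build_sortcmd keys data)"`.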
