Shell Programming and Scripting: Post questions about KSH, CSH, SH, BASH, PERL, PHP, SED, AWK and OTHER shell scripts and shell scripting languages here.
#1
sort script
I need a sorting script that will sort a file of tab-delimited data on multiple columns. I often do this kind of thing in Excel, but I need something more automated. I guess the syntax would be something like: sort by columnName, then by otherColumnName, etc. There are a lot of options for this sort of thing, such as the sort order (a-z, z-a), data types, etc., so I'm not sure where to begin. I know that higher-level interpreters like Python have nice sort functions, but I don't really know Python at all. I guess that for now I would hard-code the sort criteria, but it might also be nice to eventually be able to pass them in, assuming that a reasonable syntax could be developed.
Any suggestions as to where to start with this?
LMHmedchem
#3
It looks like I could use the sort command, but I would prefer to use header names rather than column numbers. Since this will be automated to some extent, it will not be readily apparent if the wrong column number was used. I suppose I could search for the header name and determine the correct column number, but I'm not sure how to do that either. I'm sure there is an awk command for that, but I don't remember it.
LMHmedchem
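A header name can be turned into a column number with a short awk lookup on the first line. The following is a minimal sketch rather than a tested tool: the function name `col_index` is hypothetical, and it assumes the file is tab-delimited with the header on line 1.

```shell
#!/bin/bash
# col_index NAME FILE   (hypothetical helper, not part of any standard tool)
# Print the 1-based position of the column whose header is exactly NAME.
# Assumes FILE is tab-delimited and that its first line holds the headers.
# Exits non-zero and prints nothing if no header matches.
col_index() {
    awk -F '\t' -v name="$1" '
        NR == 1 {
            for (i = 1; i <= NF; i++)
                # Appending "" forces a string comparison, so a header such
                # as "10" is never compared numerically.
                if (($i "") == (name "")) { found = i; break }
            exit                    # only the header line is needed
        }
        END {
            if (!found) exit 1
            print found
        }
    ' "$2"
}
```

For example, `n=$(col_index nasC data.txt) || echo "no such column"` stores the column number in `n` (here `data.txt` is a placeholder file name). Because the function fails loudly when the name is missing, an automated run stops instead of silently sorting on the wrong column.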
#4
Unless you know from the header name how a field should be sorted (forward or reverse, numeric or alphanumeric, etc.), you will probably need some manual intervention. If you just want to use header names instead of field numbers, but can otherwise use the sort utility's flags to control the type of sorting to be done, we can probably help you come up with a way to do that if you give us some concrete examples of what your data looks like (including the headers). Please use code tags when you post this information; knowing whether fields are separated by <space>s, <tab>s, <colon>s, <comma>s or something else will be crucial to coming up with a working solution.
You are correct in noting that some versions of awk provide built-in sorting functions, but they are not portable and (as far as I know) are limited to increasing-order alphanumeric sorting.
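One way to combine header names with the sort utility's own flags is to translate each name into a `-k` key and pass the flags through unchanged. The sketch below is only an illustration under several assumptions: the function name `sort_by_headers` and the `NAME:FLAGS` argument syntax are invented here, the file is tab-delimited with a single header line, and no header name contains a colon.

```shell
#!/bin/bash
# sort_by_headers FILE NAME[:FLAGS]...   (hypothetical helper)
# FLAGS are ordinary sort(1) key modifiers, e.g. n (numeric), r (reverse).
# The header line is printed first and is never sorted with the data.
sort_by_headers() {
    local file=$1; shift
    local spec name flags idx
    local -a keys=()
    for spec in "$@"; do
        name=${spec%%:*}                       # text before the first colon
        flags=
        [[ $spec == *:* ]] && flags=${spec#*:} # text after it, if any
        # Look up the 1-based field number of this header name.
        idx=$(awk -F '\t' -v name="$name" '
            NR == 1 {
                for (i = 1; i <= NF; i++)
                    if (($i "") == (name "")) { print i; exit }
                exit
            }' "$file")
        if [[ -z $idx ]]; then
            echo "sort_by_headers: no column named '$name'" >&2
            return 1
        fi
        keys+=( -k "${idx},${idx}${flags}" )   # restrict the key to one field
    done
    head -n 1 "$file"
    tail -n +2 "$file" | LC_ALL=C sort -t $'\t' "${keys[@]}"
}
```

A call such as `sort_by_headers data.txt fusering:n nasC:nr` would sort on `fusering` ascending and then `nasC` descending, both numerically. Rows that tie on every key fall back to sort's whole-line comparison, so the result is not guaranteed to preserve the input order the way a spreadsheet sort does.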
#5
Below is a sample of a file I just sorted in Excel. It's tab-delimited text with Unix line endings, if that matters. It was sorted on the following columns, in this order: Cpyridne_N, Cphenol_O, fusering, nasC. All were sorted in ascending order, and all of them are integers. I think that I could use the sort arguments if I had the column numbers, but it would be nice to investigate what the possibilities are. Most of the scripts I start out with require some manual editing until I figure out exactly how I want to use them and get them refined. I am doing this on Windows under Cygwin, in case that makes a difference.
LMHmedchem
Code:
id IUPACNAME filePath IDNUMBER PG K-MWT K-EST K-CHI K-KAP RI3_1 ring fusering branch arom nasN nasO nasP nasS fw nasH nasC nasNBIO nasHet nasProt CddsN CtN nrings ncirc nbranch CaromC npc4 nch3 nch4 nch5 nch6 nch7 nch8 nch9 nch10 Calph_N Calph_NH Caniline_N Caniline_H Cpyridne_N Cpyrrole_N Cpyrrole_H Camide_H Camide_N Camide_O Cprot_N Calph_O Calph_OH Cacid_dO Cacid_sO Cacid_H Cphenol_O Calph_S Carom_S Cacid_CH Csp3OH Csp2OH CKetone CAldhyd CFormate CEster CAnhyde CaaO CEther COxirane COxetane CNoxide CPNoxide COxime CCar CCarbam CCarbacd CGuano CBenzam CsSH CssS CdS CdssS CddssS CSul CdCH2 CtCH
1 pyridin-4-ol ST044793.mol ST044793 5 9510 13785 4522 1003 164.26 1 0 1 1 1 1 0 0 95.1006 5 5 0 2 1 0 0 1 1 1 5 2 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 2,8-dimethylquinolin-4-ol ST4140189.mol ST4140189 8 17321 18077 11694 1673 298.65 1 1 1 1 1 1 0 0 173.214 11 11 0 2 1 0 0 2 3 5 9 18 0 0 0 2 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 2,6-dimethylquinolin-4-ol ST090591.mol ST090591 5 17321 18078 11636 1688 298.637 1 1 1 1 1 1 0 0 173.214 11 11 0 2 1 0 0 2 3 5 9 16 0 0 0 2 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 2,7,8-trimethylquinolin-4-ol ST090619.mol ST090619 5 18724 18527 12992 1827 313.572 1 1 1 1 1 1 0 0 187.241 13 12 0 2 1 0 0 2 3 6 9 22 0 0 0 2 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 8-ethyl-2-methylquinolin-4-ol ST085252.mol ST085252 5 18724 18732 12630 1900 311.272 1 1 1 1 1 1 0 0 187.241 13 12 0 2 1 0 0 2 3 5 9 19 0 0 0 2 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 4-(10-methyl-8,9,10,11-tetrahydrobenzo[a]acridin-12-yl)phenol ST075331.mol ST075331 5 33944 31160 34829 3237 469.926 1 1 1 1 1 1 0 0 339.436 21 24 0 2 1 0 0 5 11 10 19 42 0 0 0 5 0 0 0 3 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 4-[6-(4-hydroxyphenyl)-2-pyridyl]phenol HTS12699.mol HTS12699 5 26330 30423 18308 2886 330.105 1 0 1 1 1 2 0 0 263.295 13 17 0 3 1 0 0 3 3 6 17 20 0 0 0 3 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
8 4-(10-methyl-8,9,10,11-tetrahydrobenzo[a]acridin-12-yl)benzene-1,3-diol ST4068500.mol ST4068500 8 35544 37593 36522 3385 394.257 1 1 1 1 1 2 0 0 355.435 21 24 0 3 1 0 0 5 11 11 19 46 0 0 0 5 0 0 0 3 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9 pyrimidin-2-ol ST075441.mol ST075441 5 9609 15301 4651 998 133.621 1 0 1 1 2 1 0 0 96.0884 4 4 0 3 2 0 0 1 1 1 4 2 0 0 0 1 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10 4-methylpyrimidin-2-ol ST067104.mol ST067104 5 11012 15676 5587 1166 190.913 1 0 1 1 2 1 0 0 110.115 6 5 0 3 2 0 0 1 1 2 4 4 0 0 0 1 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11 6-methyl-2-(methylethyl)pyrimidin-4-ol ST059600.mol ST059600 5 15220 17380 8313 1738 245.562 1 0 1 1 2 1 0 0 152.196 12 8 0 3 2 0 0 1 1 4 4 10 0 0 0 1 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12 4-(3-methylphenyl)pyrimidin-2-ol ST45073613.mol ST45073613 8 18621 20654 11588 2019 274.464 1 0 1 1 2 1 0 0 186.213 10 11 0 3 2 0 0 2 2 4 10 12 0 0 0 2 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
13 5,6-diphenylpyrazin-2-ol ST039025.mol ST039025 5 24828 25454 17493 2700 316.406 1 0 1 1 2 1 0 0 248.284 12 16 0 3 2 0 0 3 3 5 16 20 0 0 0 3 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
14 2-(4,6-diphenylpyrimidin-2-yl)phenol ST029297.mol ST029297 5 32438 31182 25225 3569 391.401 1 0 1 1 2 1 0 0 324.381 16 22 0 3 2 0 0 4 4 7 22 28 0 0 0 4 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
15 2-(2,6-diphenylpyrimidin-4-yl)phenol ST029295.mol ST029295 5 32438 31239 25221 3569 390.866 1 0 1 1 2 1 0 0 324.381 16 22 0 3 2 0 0 4 4 7 22 28 0 0 0 4 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
16 4-quinoxalin-2-ylphenol ST058726.mol ST058726 5 22225 24069 16171 2241 292.885 1 1 1 1 2 1 0 0 222.246 10 14 0 3 2 0 0 3 4 5 14 18 0 0 0 3 0 0 0 1 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
17 3-(1-phenylpyridino[3,2-f]quinolin-3-yl)phenol ST090881.mol ST090881 5 34840 34402 33217 3494 428.414 1 1 1 1 2 1 0 0 348.403 16 24 0 3 2 0 0 5 8 9 24 38 0 0 0 5 0 0 0 2 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
18 6-(4-ethylphenyl)benzo[a]phenazin-5-ol HTS02737.mol HTS02737 5 35042 34250 36169 3408 398.524 1 1 1 1 2 1 0 0 350.419 18 24 0 3 2 0 0 5 11 10 22 45 0 0 0 5 0 0 0 3 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
19 pyrimidine-2,4-diol ST45061548.mol ST45061548 8 11209 21655 5646 1164 113.724 1 0 1 1 2 2 0 0 112.088 4 4 0 4 2 0 0 1 1 2 4 4 0 0 0 1 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
20 quinoxaline-2,3-diol ST003022.mol ST003022 5 16215 25248 10508 1518 208.677 1 1 1 1 2 2 0 0 162.148 6 8 0 4 2 0 0 2 3 4 8 14 0 0 0 2 0 0 0 1 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
21 5,6-dimethylquinoxaline-2,3-diol HTS06808.mol HTS06808 5 19020 26134 13025 1821 260.958 1 1 1 1 2 2 0 0 190.201 10 10 0 4 2 0 0 2 3 6 8 22 0 0 0 2 0 0 0 1 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
22 pyridino[3,2-h]quinoline-4,7-diol HTS08910.mol HTS08910 5 21221 29008 16776 1918 242.061 1 1 1 1 2 2 0 0 212.207 8 12 0 4 2 0 0 3 6 6 12 26 0 0 0 3 0 0 0 2 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23 5,8-dimethylquinolino[3,2-c]acridine-6,7-diol ST059734.mol ST059734 5 34038 38753 38389 3061 348.484 1 1 1 1 2 2 0 0 340.381 16 22 0 4 2 0 0 5 15 12 20 56 0 0 0 5 0 0 0 4 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
24 4-pyridino[4,3-e]pyrazin-3-ylphenol ST058761.mol ST058761 5 22323 25841 16236 2237 260.974 1 1 1 1 3 1 0 0 223.234 9 13 0 4 3 0 0 3 4 5 13 18 0 0 0 3 0 0 0 1 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
25 pyrazino[2,3-d]pyridazine-5,8-diol ST091682.mol ST091682 5 16412 28414 10823 1497 176.671 1 1 1 1 4 2 0 0 164.123 4 6 0 6 4 0 0 2 3 4 6 16 0 0 0 2 0 0 0 1 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
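For the hard-coded case, the four sort columns can be given to sort directly by number. Counting fields in the header as posted puts fusering at 12, nasC at 21, Cpyridne_N at 44 and Cphenol_O at 56; those positions are an assumption taken from this sample and should be checked against the real file. The function name `sort_sample` is hypothetical, and the sketch assumes real tab separators and a single header line.

```shell
#!/bin/bash
# sort_sample FILE   (hypothetical helper with hard-coded field numbers)
# Reproduces the ordering described in the post: Cpyridne_N, Cphenol_O,
# fusering, nasC, all ascending and all compared as numbers.
# Field numbers were counted from the posted header and are assumptions.
sort_sample() {
    head -n 1 "$1"                 # keep the header line on top, unsorted
    tail -n +2 "$1" |
        sort -t $'\t' \
            -k 44,44n \
            -k 56,56n \
            -k 12,12n \
            -k 21,21n
}
```

Each key is written as `N,N` so that it covers exactly one field; a bare `-k 44` would extend the key from field 44 to the end of the line. Running `sort_sample data.txt > sorted.txt` (placeholder file names) leaves the input untouched.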
#6
IMO: your idea lacks merit.
You should simply insert the data into a db with a loader, then produce output. If you have Excel, you have MS Office and probably a db like Access. UNIX has free db's: MySQL, Berkeley DB, etc.
So why does your idea look like reinventing the wheel? Because Don Cragun is right. You need complete metadata for each and every column to be able to sort on arbitrary column selections, and there is too much metadata to cobble together anything useful in shell. Metadata can change; db's are meant for that contingency.
The way it is now, you have to know what you are sorting by looking at it, in effect gathering metadata. You have to use human intelligence to make decisions, then use UNIX sort or Excel sort. Databases are meant to do this. You just tell a db to sort on column names, optionally ascending or descending for each column. You can write database scripts (in SQL, for example) to do stats or a lot of what you can already do in Excel.
So, stay in Excel or move to a db. I vote for a db.
The Following User Says Thank You to jim mcnamara For This Useful Post: LMHmedchem (10-11-2012)
#7
Quote:
I am in one of those situations where the choice is between slogging through an inefficient process or taking a lot of time to set up the correct process. I don't know SQL at all, so I am reluctant to spend days getting all of that set up when the sort function is the only thing that I can't automate. Of course, after taking the time to learn it and set it up, I would have learned more tools, and that is not a small thing in any way. Eventually I will do all of the steps I am doing now with SQL queries out of SQLite, using Ruby scripts to populate the database (and eventually a browser interface). For now, someone is waiting on the results for this and I need to get it done as quickly as possible. Even if I have to hard-code or modify a separate script for each set of sorting criteria, that will still be much faster than doing it all in Excel, especially since I have to modify the resulting Excel file. I will have to do this process again with other data, so once the scripts are set up, I should be able to automate the entire process. I was thinking that if I was going to set up a sort script, I might as well try to make it as general a tool as possible so that it could be useful for other things. It does appear that there is a lot of possible variation in how the script would need to operate in different cases, so perhaps my thinking was not realistic. Nevertheless, it would be a significant help with my current project and would not need to be generalized for that purpose.
LMHmedchem