sort script


 
# 1  
Old 10-10-2012

I need a sorting script that will sort a file of tab-delimited data on multiple columns. I often do this kind of thing in Excel, but I need something more automated. I guess the syntax would be something like: sort by columnName, then by otherColumnName, etc. There are a lot of options for this sort of thing, such as the sort order (a-z, z-a), data types, etc., so I'm not sure where to begin. I know that higher-level languages like Python have nice sort functions, but I don't really know Python at all. For now I would hard-code the sort criteria, but it might also be nice to eventually pass them in, assuming that a reasonable syntax could be developed.

Any suggestions as to where to start with this?

LMHmedchem
# 2  
Old 10-10-2012
Take a look at the sort command; you could pass the key fields to sort on as parameters.
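
As a rough illustration of the suggestion above, here is a minimal sketch of sorting a tab-delimited file on key fields while leaving the header line in place. The function name sort_body is made up for this example, and the file name and column numbers in the usage line are placeholders, not values taken from the thread.

```shell
#!/bin/sh
# sort_body FILE [sort options...]
# Print the first line of FILE (the header) unchanged, then the remaining
# lines sorted by sort(1) with whatever options were passed in.
# (sort_body is a hypothetical helper name, not a standard utility.)
sort_body() {
    file=$1
    shift
    head -n 1 "$file"                 # header passes through untouched
    tail -n +2 "$file" | sort "$@"    # everything after the header is sorted
}
```

Usage might look like `sort_body data.txt -t "$(printf '\t')" -k3,3n -k2,2`, which sorts numerically on field 3 and then, for ties, as text on field 2. The `-t` option sets the field separator (a tab here) and each `-kN,N` restricts a key to a single field.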
# 3  
Old 10-10-2012
It looks like I could use the sort command, but I would prefer to use header names rather than column numbers. Since this will be automated to some extent, it will not be readily apparent if the wrong column number was used. I suppose I could search for the header name and determine the correct column number, but I'm not sure how to do that either. I'm sure there is an awk command for that, but I don't remember it.

LMHmedchem
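
One possible way to do the header lookup mentioned in this post is sketched below. It assumes the first line of the file holds tab-separated column names; the function name col_of is invented for the example.

```shell
#!/bin/sh
# col_of NAME FILE
# Print the 1-based position of the column called NAME in the
# tab-delimited header (first line) of FILE.
# Exits non-zero and prints nothing if no such column exists.
# (col_of is a hypothetical helper name.)
col_of() {
    awk -F '\t' -v name="$1" '
        NR == 1 {
            for (i = 1; i <= NF; i++)
                if ($i == name) { print i; found = 1; exit }
            exit                      # header read, name not present
        }
        END { exit !found }           # status 0 only when the name was found
    ' "$2"
}
```

A call such as `col_of fusering data.txt` would print the field number of the fusering column, which could then be handed to sort as `-kN,Nn`. Checking the exit status guards against a misspelled header name silently sorting on the wrong column.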
# 4  
Old 10-10-2012
Unless you will know from the header name how a field should be sorted (forward or reverse, numeric or alphanumeric, etc.), you will probably need some manual intervention. If you just want to use header names instead of field numbers but can otherwise use the sort utility's flags to control the type of sorting to be done, we can probably help you come up with a way to do that if you give us some concrete examples of what your data looks like (including the headers). Please use code tags when you post this information; knowing whether fields are separated by <space>s, <tab>s, <colon>s, <comma>s or something else will be crucial to coming up with a working solution.

You are correct in noting that some versions of awk provide built-in sorting functions, but they are not portable and (as far as I know) limited to increasing order alphanumeric sorting.
# 5  
Old 10-10-2012
Below is a sample of a file I just sorted in Excel. It's tab-delimited text and has Unix line endings, if that matters.

This was sorted as follows,
Cpyridne_N
Cphenol_O
fusering
nasC

All were sorted in ascending order, and all of these are ints. I think that I could use the sort arguments if I had the column numbers, but it would be nice to investigate what the possibilities are. Most of the scripts I start out with require some manual editing until I figure out exactly how I want to use them and get them refined. I am doing this in Windows under Cygwin, in case that makes a difference.

LMHmedchem

Code:
id	IUPACNAME	filePath	IDNUMBER	PG	K-MWT	K-EST	K-CHI	K-KAP	RI3_1	ring	fusering	branch	arom	nasN	nasO	nasP	nasS	fw	nasH	nasC	nasNBIO	nasHet	nasProt	CddsN	CtN	nrings	ncirc	nbranch	CaromC	npc4	nch3	nch4	nch5	nch6	nch7	nch8	nch9	nch10	Calph_N	Calph_NH	Caniline_N	Caniline_H	Cpyridne_N	Cpyrrole_N	Cpyrrole_H	Camide_H	Camide_N	Camide_O	Cprot_N	Calph_O	Calph_OH	Cacid_dO	Cacid_sO	Cacid_H	Cphenol_O	Calph_S	Carom_S	Cacid_CH	Csp3OH	Csp2OH	CKetone	CAldhyd	CFormate	CEster	CAnhyde	CaaO	CEther	COxirane	COxetane	CNoxide	CPNoxide	COxime	CCar	CCarbam	CCarbacd	CGuano	CBenzam	CsSH	CssS	CdS	CdssS	CddssS	CSul	CdCH2	CtCH
1	pyridin-4-ol	ST044793.mol	ST044793	5	9510	13785	4522	1003	164.26	1	0	1	1	1	1	0	0	95.1006	5	5	0	2	1	0	0	1	1	1	5	2	0	0	0	1	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
2	2,8-dimethylquinolin-4-ol	ST4140189.mol	ST4140189	8	17321	18077	11694	1673	298.65	1	1	1	1	1	1	0	0	173.214	11	11	0	2	1	0	0	2	3	5	9	18	0	0	0	2	0	0	0	1	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
3	2,6-dimethylquinolin-4-ol	ST090591.mol	ST090591	5	17321	18078	11636	1688	298.637	1	1	1	1	1	1	0	0	173.214	11	11	0	2	1	0	0	2	3	5	9	16	0	0	0	2	0	0	0	1	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
4	2,7,8-trimethylquinolin-4-ol	ST090619.mol	ST090619	5	18724	18527	12992	1827	313.572	1	1	1	1	1	1	0	0	187.241	13	12	0	2	1	0	0	2	3	6	9	22	0	0	0	2	0	0	0	1	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
5	8-ethyl-2-methylquinolin-4-ol	ST085252.mol	ST085252	5	18724	18732	12630	1900	311.272	1	1	1	1	1	1	0	0	187.241	13	12	0	2	1	0	0	2	3	5	9	19	0	0	0	2	0	0	0	1	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
6	4-(10-methyl-8,9,10,11-tetrahydrobenzo[a]acridin-12-yl)phenol	ST075331.mol	ST075331	5	33944	31160	34829	3237	469.926	1	1	1	1	1	1	0	0	339.436	21	24	0	2	1	0	0	5	11	10	19	42	0	0	0	5	0	0	0	3	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
7	4-[6-(4-hydroxyphenyl)-2-pyridyl]phenol	HTS12699.mol	HTS12699	5	26330	30423	18308	2886	330.105	1	0	1	1	1	2	0	0	263.295	13	17	0	3	1	0	0	3	3	6	17	20	0	0	0	3	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	2	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
8	4-(10-methyl-8,9,10,11-tetrahydrobenzo[a]acridin-12-yl)benzene-1,3-diol	ST4068500.mol	ST4068500	8	35544	37593	36522	3385	394.257	1	1	1	1	1	2	0	0	355.435	21	24	0	3	1	0	0	5	11	11	19	46	0	0	0	5	0	0	0	3	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	2	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
9	pyrimidin-2-ol	ST075441.mol	ST075441	5	9609	15301	4651	998	133.621	1	0	1	1	2	1	0	0	96.0884	4	4	0	3	2	0	0	1	1	1	4	2	0	0	0	1	0	0	0	0	0	0	0	0	2	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
10	4-methylpyrimidin-2-ol	ST067104.mol	ST067104	5	11012	15676	5587	1166	190.913	1	0	1	1	2	1	0	0	110.115	6	5	0	3	2	0	0	1	1	2	4	4	0	0	0	1	0	0	0	0	0	0	0	0	2	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
11	6-methyl-2-(methylethyl)pyrimidin-4-ol	ST059600.mol	ST059600	5	15220	17380	8313	1738	245.562	1	0	1	1	2	1	0	0	152.196	12	8	0	3	2	0	0	1	1	4	4	10	0	0	0	1	0	0	0	0	0	0	0	0	2	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
12	4-(3-methylphenyl)pyrimidin-2-ol	ST45073613.mol	ST45073613	8	18621	20654	11588	2019	274.464	1	0	1	1	2	1	0	0	186.213	10	11	0	3	2	0	0	2	2	4	10	12	0	0	0	2	0	0	0	0	0	0	0	0	2	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
13	5,6-diphenylpyrazin-2-ol	ST039025.mol	ST039025	5	24828	25454	17493	2700	316.406	1	0	1	1	2	1	0	0	248.284	12	16	0	3	2	0	0	3	3	5	16	20	0	0	0	3	0	0	0	0	0	0	0	0	2	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
14	2-(4,6-diphenylpyrimidin-2-yl)phenol	ST029297.mol	ST029297	5	32438	31182	25225	3569	391.401	1	0	1	1	2	1	0	0	324.381	16	22	0	3	2	0	0	4	4	7	22	28	0	0	0	4	0	0	0	0	0	0	0	0	2	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
15	2-(2,6-diphenylpyrimidin-4-yl)phenol	ST029295.mol	ST029295	5	32438	31239	25221	3569	390.866	1	0	1	1	2	1	0	0	324.381	16	22	0	3	2	0	0	4	4	7	22	28	0	0	0	4	0	0	0	0	0	0	0	0	2	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
16	4-quinoxalin-2-ylphenol	ST058726.mol	ST058726	5	22225	24069	16171	2241	292.885	1	1	1	1	2	1	0	0	222.246	10	14	0	3	2	0	0	3	4	5	14	18	0	0	0	3	0	0	0	1	0	0	0	0	2	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
17	3-(1-phenylpyridino[3,2-f]quinolin-3-yl)phenol	ST090881.mol	ST090881	5	34840	34402	33217	3494	428.414	1	1	1	1	2	1	0	0	348.403	16	24	0	3	2	0	0	5	8	9	24	38	0	0	0	5	0	0	0	2	0	0	0	0	2	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
18	6-(4-ethylphenyl)benzo[a]phenazin-5-ol	HTS02737.mol	HTS02737	5	35042	34250	36169	3408	398.524	1	1	1	1	2	1	0	0	350.419	18	24	0	3	2	0	0	5	11	10	22	45	0	0	0	5	0	0	0	3	0	0	0	0	2	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
19	pyrimidine-2,4-diol	ST45061548.mol	ST45061548	8	11209	21655	5646	1164	113.724	1	0	1	1	2	2	0	0	112.088	4	4	0	4	2	0	0	1	1	2	4	4	0	0	0	1	0	0	0	0	0	0	0	0	2	0	0	0	0	0	0	0	0	0	0	0	2	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
20	quinoxaline-2,3-diol	ST003022.mol	ST003022	5	16215	25248	10508	1518	208.677	1	1	1	1	2	2	0	0	162.148	6	8	0	4	2	0	0	2	3	4	8	14	0	0	0	2	0	0	0	1	0	0	0	0	2	0	0	0	0	0	0	0	0	0	0	0	2	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
21	5,6-dimethylquinoxaline-2,3-diol	HTS06808.mol	HTS06808	5	19020	26134	13025	1821	260.958	1	1	1	1	2	2	0	0	190.201	10	10	0	4	2	0	0	2	3	6	8	22	0	0	0	2	0	0	0	1	0	0	0	0	2	0	0	0	0	0	0	0	0	0	0	0	2	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
22	pyridino[3,2-h]quinoline-4,7-diol	HTS08910.mol	HTS08910	5	21221	29008	16776	1918	242.061	1	1	1	1	2	2	0	0	212.207	8	12	0	4	2	0	0	3	6	6	12	26	0	0	0	3	0	0	0	2	0	0	0	0	2	0	0	0	0	0	0	0	0	0	0	0	2	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
23	5,8-dimethylquinolino[3,2-c]acridine-6,7-diol	ST059734.mol	ST059734	5	34038	38753	38389	3061	348.484	1	1	1	1	2	2	0	0	340.381	16	22	0	4	2	0	0	5	15	12	20	56	0	0	0	5	0	0	0	4	0	0	0	0	2	0	0	0	0	0	0	0	0	0	0	0	2	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
24	4-pyridino[4,3-e]pyrazin-3-ylphenol	ST058761.mol	ST058761	5	22323	25841	16236	2237	260.974	1	1	1	1	3	1	0	0	223.234	9	13	0	4	3	0	0	3	4	5	13	18	0	0	0	3	0	0	0	1	0	0	0	0	3	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
25	pyrazino[2,3-d]pyridazine-5,8-diol	ST091682.mol	ST091682	5	16412	28414	10823	1497	176.671	1	1	1	1	4	2	0	0	164.123	4	6	0	6	4	0	0	2	3	4	6	16	0	0	0	2	0	0	0	1	0	0	0	0	4	0	0	0	0	0	0	0	0	0	0	0	2	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
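
For the case described in this post (tab-delimited data, a header line, every sort key an integer in ascending order), a sketch along the following lines may be enough. It is not a general tool: the function name sort_by_names is made up, every key is assumed to be numeric and ascending, and column names are assumed to be unique in the header.

```shell
#!/bin/sh
# sort_by_names FILE NAME...
# Sort the tab-delimited FILE on the named columns, in the order given.
# Assumptions: line 1 is the header, every key is numeric, ascending order.
# (sort_by_names is a hypothetical helper name.)
sort_by_names() {
    file=$1
    shift
    tab=$(printf '\t')
    keys=
    for name in "$@"; do
        # Split the header into one name per line and find the matching line number.
        n=$(head -n 1 "$file" | tr '\t' '\n' | grep -n -x -F -- "$name" |
            head -n 1 | cut -d: -f1)
        if [ -z "$n" ]; then
            echo "sort_by_names: no column named '$name' in $file" >&2
            return 1
        fi
        keys="$keys -k$n,${n}n"       # e.g. -k12,12n: field 12 only, numeric
    done
    head -n 1 "$file"                 # header stays on top
    # $keys is intentionally unquoted so it splits into separate -k options.
    tail -n +2 "$file" | sort -t "$tab" $keys
}
```

With the sample above saved as data.txt (a placeholder name), the Excel sort described in this post would correspond to `sort_by_names data.txt Cpyridne_N Cphenol_O fusering nasC > sorted.txt`. Because the lookup fails loudly on an unknown name, a typo in a header name stops the script instead of producing a wrongly sorted file.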

# 6  
Old 10-11-2012
IMO: your idea lacks merit.

You should simply insert the data into a db with a loader, then produce output. If you have Excel you have MS Office and probably a db like Access. UNIX has free db's: MySQL, Berkeley DB, etc.

So, why does your idea look like reinventing the wheel?

The reason is that Don Cragun is right. You have to have complete metadata for each and every column to be able to sort based on arbitrary column selections. And there is too much metadata to cobble together anything useful in shell. Metadata can change. db's are meant for that contingency.

The way it is now, you have to know what you are sorting by looking at it - in effect gathering metadata. You have to use human intelligence to make decisions. Then use unix sort/Excel sort. Databases are meant to do this. You just tell db's to sort on column names, optionally: ascending or descending for each column. You can write database scripts (SQL language for example) to do stats or a lot of what you already can do in Excel.

So, stay in Excel or move to a db. I vote for a db.
# 7  
Old 10-11-2012
Yes, a database would be nice. I am in the process of working on a database system for some of this data, but there are non-trivial issues with other parts of it not related to this, so I am finishing what I am working on using some scripting tools that I already have. The sorting is the only part of the process that is not already scripted, so I am having to open each file in Excel, set up a sort, save the file, and then do some shell work to get the text file back to Linux land. All of these sorts are on ints and in ascending order. I have a few hundred to do, so after the first several, I started thinking that there had to be a better way.

I am in one of those situations where the choice is between slogging through an inefficient process and taking a lot of time to set up the correct process. I don't know SQL at all, so I am reluctant to spend days getting all of that set up when the sort function is the only thing that I can't automate. Of course, after taking the time to learn it and set it up, I would have learned more tools, and that is not a small thing in any way. Eventually I will do all of the steps I am doing now with SQL queries out of SQLite, using Ruby scripts to populate the database (and eventually a browser interface). For now, someone is waiting on the results for this and I need to get it done as quickly as possible. Even if I have to hard-code or modify a separate script for each set of sorting criteria, that will still be much faster than all the Excel work, especially since I have to modify the resulting Excel file. I will have to do this process again with other data, so once the scripts are set up, I should be able to automate the entire process.

I was thinking that if I was going to set up a sort script, I might as well try to make it as general a tool as possible so that it could be useful for other things. It does appear that there is a lot of possible variation in how the script would need to operate in different cases, so perhaps my thinking was not realistic. Nevertheless, it would be a significant help with my current project and would not need to be generalized for that purpose.

LMHmedchem