"sort" is certainly able to sort a 5-gigabyte flat file. It'll take a while, but it has no hard limits except disk space for temporary files (as you've discovered!).
You can tell it to use more memory with -S. It expects a number in kilobytes. 5000 would be about 5 megabytes, for example. Don't tell it to use more RAM than you have free (or cached; cache counts as free), because that will actually make it slower once the system runs out of memory and begins swapping. If you're on a 32-bit system, you should only tell it to use a gigabyte or two at the very most, because of 32-bit address size limitations.
You can prevent it from filling up / by making it store temporary files somewhere else with -T. If you give it a directory on a different disk from the file being sorted, this may improve performance too, since two disks offer more bandwidth than one.
All of these flags are documented in man sort.
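A minimal sketch of combining the two flags, assuming a sort that accepts -S as a plain number of kilobytes and -T as a temporary directory. The helper name sort_big and every path are hypothetical placeholders, not anything from your system:

```shell
# Hypothetical helper: sort a big file with a capped memory buffer and a
# scratch directory for temporary files.
#   $1 = input file
#   $2 = output file
#   $3 = scratch directory (ideally on a different disk from the input)
#   $4 = buffer size in kilobytes
sort_big() {
    sort -S "$4" -T "$3" -o "$2" "$1"
}

# Example call (placeholder paths): roughly a 1 GB buffer, temp files
# on another disk.
# sort_big /data/bigfile.txt /data/bigfile.sorted /otherdisk/tmp 1000000
```

Check free space in the scratch directory first, since the temporary files can add up to more than the size of the input.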
Thanks for your answer. I always read the man pages before posting; I ask questions when I'm looking for a different way to do things and to understand a little of the mechanisms behind some of the binaries.
By watching the process while it was sorting, I noticed that it divides the file into multiple chunks (11 MB each in my case), then merges the first 64 chunks into a new file and deletes those smaller chunks. Repeating this generates approximately 9 files, which are then merged into the final one. In other words, we need approximately 1.50 to 1.75 x File_size of disk space to be able to complete that sort.
Because our swap device is a disk, I don't think it will actually slow down the process if I tell it to do something like:
So it'll be as fast as just writing to the disk like it is currently doing. Yes, it'll slow other processes, but we don't really care at the moment because this job is necessary for any other job to start.
I realize that the question I asked is not really the question I wanted to ask. By asking for a more efficient way, I was really thinking of "the fastest way".
Any idea on how to "split" or "fork" sort on multiple cores? We have 8 available on the oldish v490 and I was hoping to use maybe 4 to 6 cores to do it.
If not, I guess that redirecting the sort to our SAN storage will be the most efficient way.
Thanks again Corona for your time, really appreciated.
Quote:
Because our swap device is a disk, I don't think it will actually slow down the process if I tell it to do something like:
So it'll be as fast as just writing to the disk like it is currently doing. Yes, it'll slow other processes, but we don't really care at the moment because this job is necessary for any other job to start.
That your swap's a separate disk means it's not as bad when it swaps, but it's still pretty bad.
Quote:
Any idea on how to "split" or "fork" sort on multiple cores? We have 8 available on the oldish v490 and I was hoping to use maybe 4 to 6 cores to do it.
You do that by splitting the file into parts to sort separately, then merging to create the final result.
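A rough sketch of that split, sort, merge approach, assuming split -l and sort -m are available on your system. The helper name parallel_sort, the chunk prefix and all paths are hypothetical, and it assumes a non-empty input and an empty scratch directory:

```shell
# Hypothetical helper: sort chunks in parallel, then merge them.
#   $1 = input file
#   $2 = output file
#   $3 = lines per chunk (pick it so the chunk count is close to the
#        number of cores you want to use)
#   $4 = empty scratch directory with room for a full copy of the input
parallel_sort() {
    # Cut the input into line-aligned chunks: chunk.aa, chunk.ab, ...
    split -l "$3" "$1" "$4/chunk."

    # Sort every chunk in place, each as its own background process.
    for f in "$4"/chunk.*; do
        sort -o "$f" "$f" &
    done
    wait    # block until all background sorts have finished

    # Merge the already-sorted chunks; -m merges without re-sorting.
    sort -m -o "$2" "$4"/chunk.*
    rm -f "$4"/chunk.*
}

# Example call (placeholder paths): about 6 chunks for a 60-million-line file.
# parallel_sort /data/bigfile.txt /data/bigfile.sorted 10000000 /otherdisk/chunks
```

This starts one sort per chunk at once, so the chunk size is what limits how many cores get used. Whether it beats a single sort depends on whether the job is CPU-bound or disk-bound, so it is worth timing on a sample first.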
Have you considered loading this data into a DB, and using SQL to select the records in a sorted way, then streaming it out to a file with identical structure as your original file?
Sure, it's a valid point that if the disk fills up during the sort, the same issue may be encountered while inserting into a DB. However, we don't really know all the details of the server. Who knows; maybe there is some additional mounted volume, or some external DB available (not on the same disk) that could be used.