What about sorting a 5G file?


 
# 1  
Old 12-02-2011

Hi Guys,

My client (dear clients, I hate to love you) has the funky idea of sorting a 5G flat file.

Sure enough, this is taking forever and also fills up / on our machine.

Any idea of how we could proceed to make this a little bit more efficient?

Maybe by forcing sort to "stay in memory" instead of writing to disk?

Any ideas?

I'm kinda clueless for the moment ...

THANKS!
# 2  
Old 12-02-2011
"sort" is certainly able to sort a 5-gigabyte flat file. It'll take a while, but it has no hard limits except disk space for temporary files (as you've discovered!).

You can tell it to use more memory with -S. It expects a number in kilobytes. 5000 would be about 5 megabytes, for example. Don't tell it to use more RAM than you have free (or cached; cache counts as free), because that will actually make it slower once the system runs out of memory and begins swapping. If you're on a 32-bit system, you should only tell it to use a gigabyte or two at the very most, because of 32-bit address size limitations.
Code:
sort -S 5000000 file_to_sort > file_output

You can prevent it from filling up / by making it store temporary files somewhere else. If you give it a folder on a different disk than the file being sorted, this may increase performance, too, by having more bandwidth available on two disks than one.
Code:
sort -T /path/to/tempdir file_to_sort > file_output

All of these flags can be found in man sort
# 3  
Old 12-02-2011
Hi Corona,

Thanks for your answer. Before posting I always read the man pages; I ask questions when I'm looking for a different way to do things and trying to understand a bit more of the mechanisms behind some of the binaries.

By watching the process while it was sorting, I noticed that it divides the file into multiple chunks (11 MB each in my case), then merges the first 64 chunks into a new file and deletes those smaller chunks. This eventually produces approximately 9 files, which are then merged into the final one. In other words, we need approximately 1.50 to 1.75 times the file size in disk space to complete the sort.
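As a rough pre-flight check based on the 1.5 to 1.75x estimate above, something like the following could confirm that the temporary directory has enough room before the sort starts. This is only a sketch: the function names `sort_space_needed_kb` and `free_space_kb` are made up for illustration, it assumes a POSIX shell (ksh or bash rather than the old Solaris /bin/sh), and it assumes a `df` that accepts the POSIX `-P` flag (on Solaris that would likely be /usr/xpg4/bin/df).

```shell
# Hypothetical helper: print the estimated temp space (in KB) that sorting
# FILE may need, using the 1.75x upper bound observed above.
sort_space_needed_kb() {
    bytes=$(wc -c < "$1")                 # file size in bytes
    echo $(( bytes * 175 / 100 / 1024 ))  # 1.75 x size, converted to KB
}

# Hypothetical helper: print the free space (in KB) on the filesystem
# holding DIR. Assumes df supports -P (one output line per filesystem).
free_space_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}
```

Comparing the two numbers for the file to sort and the directory passed to `sort -T` would catch the full-disk case before hours of sorting are wasted.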

Because our swap device is a disk, I don't think it will actually slow down the process if I tell it to do something like :
Code:
sort -S 80% file > destination

So it'll be as fast as just writing to the disk like it is currently doing. Yes, it'll slow other processes but we don't really care at the moment because this job is necessary for any other job to start.

I realize that the question I asked is not really the question I wanted to ask. By asking for a more efficient way, I was really thinking of "the fastest way".

Any idea on how to "split" or "fork" sort on multiple cores? We have 8 available on the oldish v490 and I was hoping to use maybe 4 to 6 cores to do it.

If not, I guess that redirecting the sort to our SAN storage will be the most efficient way.

Thanks again Corona for your time, really appreciated.
# 4  
Old 12-02-2011
Quote:
Originally Posted by plmachiavel
Because our swap device is a disk, I don't think it will actually slow down the process if I tell it to do something like :
Code:
sort -S 80% file > destination

So it'll be as fast as just writing to the disk like it is currently doing. Yes, it'll slow other processes but we don't really care at the moment because this job is necessary for any other job to start.

I realize that the question I asked is not really the question I wanted to ask. By asking for a more efficient way, I was really thinking of "the fastest way".
That your swap's a separate disk means it's not as bad when it swaps, but it's still pretty bad.
Quote:
Any idea on how to "split" or "fork" sort on multiple cores? We have 8 available on the oldish v490 and I was hoping to use maybe 4 to 6 cores to do it.
You do that by splitting the file into parts to sort separately, then merging to create the final result.
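The split, sort, and merge approach might look roughly like the sketch below. It is not a tested production script: the function name `parallel_sort`, its argument order, and the default temp location are assumptions made for illustration, it assumes a POSIX shell (ksh or bash rather than the old Solaris /bin/sh), and it assumes a non-empty input file. It relies only on `split -l`, `sort -T`, `sort -o`, and `sort -m`. (Recent GNU versions of sort also have a `--parallel` option, which may make this unnecessary where it is available.)

```shell
# Hypothetical helper: parallel_sort INPUT OUTPUT [JOBS] [TMPBASE]
# Splits INPUT into about JOBS pieces, sorts them concurrently,
# then merges the sorted pieces into OUTPUT.
parallel_sort() {
    input=$1
    output=$2
    jobs=${3:-4}            # number of concurrent sorts (assumed default)
    tmpbase=${4:-/var/tmp}  # keep temp files off / (assumed default)

    workdir=$tmpbase/psort.$$
    mkdir -p "$workdir" || return 1

    # Lines per piece, rounded up so we end up with about $jobs pieces.
    total=$(wc -l < "$input")
    per_chunk=$(( (total + jobs - 1) / jobs ))
    [ "$per_chunk" -gt 0 ] || per_chunk=1

    split -l "$per_chunk" "$input" "$workdir/chunk."

    # Sort every piece in the background; drop the unsorted piece once
    # its sorted copy exists, to limit disk usage.
    for chunk in "$workdir"/chunk.*; do
        ( sort -T "$workdir" -o "$chunk.sorted" "$chunk" && rm -f "$chunk" ) &
    done
    wait

    # -m merges already-sorted files without re-sorting them.
    sort -m -T "$workdir" -o "$output" "$workdir"/chunk.*.sorted
    rm -rf "$workdir"
}
```

A call such as `parallel_sort file_to_sort file_output 6 /path/to/tempdir` would use six concurrent sorts. Whether this beats a single sort depends on whether the job is limited by CPU or by disk bandwidth.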
# 5  
Old 12-02-2011
Have you considered loading this data into a DB, and using SQL to select the records in a sorted way, then streaming it out to a file with identical structure as your original file?
# 6  
Old 12-02-2011
If he's running out of disk space when sorting, I doubt he'll have space for data + metadata + index.
# 7  
Old 12-02-2011
Sure, it's a valid point that if the disk fills up during the sort, the same issue may be encountered while inserting into a DB. However, we don't really know all the details of the server. Who knows; maybe there is some additional mounted volume, or some external DB available (not on the same disk) that could be used.