Sorting problem: Multiple delimiters, multiple keys


 
# 8  
Old 07-11-2011
Post a few lines of your input file so it is apparent what you are talking about. Also, try doing it all in a single command (be it sort, awk, or perl) to minimize the inefficiency due to process forking.

And for grins how about...
Code:
sort -t, -k2,2n -k3,3 file

After reading your post again I realise it won't work, as the hours field isn't zero-padded, but this should...
Code:
sort -t, -k2,2n -k3,3n file


Last edited by shamrock; 07-11-2011 at 04:48 PM..
# 9  
Old 07-11-2011
This is what happened when I tried to use split:

Code:
split: output file suffixes exhausted

Going to try again with bigger splits.

Current file (example):
>> cat 2009_Trades.csv | head -3

SYMBOL,DATE,TIME,PRICE,SIZE,CORR
AAPL,20090102,7:30:01,84.00,230,0
AAPL,20090102,7:30:02,84.01,270,0
(I changed the prices since I have practically no rights to this data.)

There are more symbols but you don't see any others until X MB into the file because it is currently sorted by symbol, then date, then time.

Goal file (example):
>> cat NEW_2009_Trades.csv | head -6

Id,Symbol,TradeTime,Price,Shares,Corr
1,CSCO,2009-01-02T07:30:00,16.96,580,0
2,GOOG,2009-01-02T07:30:00,321.23,200,0
3,AAPL,2009-01-02T07:30:01,90.75,720,0
4,IBM,2009-01-02T07:30:01,87.37,200,0
5,AMZN,2009-01-02T07:30:02,54.01,90,0
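
One way the current layout might be turned into the goal layout is sketched below. It is only a sketch under assumptions: the function name to_goal and the sample rows are made up for illustration, and ties on the timestamp are broken by symbol because that is what the goal sample appears to show. The idea is to rewrite DATE and TIME as a zero-padded ISO timestamp first, so that a plain (non-numeric) sort key orders the rows correctly.

```shell
# Hypothetical helper: reads the current CSV on stdin, writes the goal CSV.
to_goal() {
  # Drop the old header and build a zero-padded ISO timestamp from DATE and TIME.
  awk -F, 'NR > 1 {
      split($3, t, ":")
      printf "%s,%s-%s-%sT%02d:%02d:%02d,%s,%s,%s\n",
             $1, substr($2, 1, 4), substr($2, 5, 2), substr($2, 7, 2),
             t[1], t[2], t[3], $4, $5, $6
  }' |
  # Zero-padded timestamps sort correctly as plain text; symbol breaks ties.
  LC_ALL=C sort -t, -k2,2 -k1,1 |
  # Add the new header and a running Id column.
  awk 'BEGIN { print "Id,Symbol,TradeTime,Price,Shares,Corr" } { print NR "," $0 }'
}

# Made-up rows in the same layout as 2009_Trades.csv.
printf '%s\n' \
  'SYMBOL,DATE,TIME,PRICE,SIZE,CORR' \
  'IBM,20090102,7:30:01,87.37,200,0' \
  'AAPL,20090102,7:30:01,90.75,720,0' \
  'CSCO,20090102,7:30:00,16.96,580,0' |
  to_goal
```

For the real file this would be run as to_goal < 2009_Trades.csv > NEW_2009_Trades.csv, with the sort step still subject to the memory and temp-space limits discussed in this thread.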

Last edited by Ryan.; 07-11-2011 at 05:31 PM..
# 10  
Old 07-11-2011
Quote:
Originally Posted by shamrock
Post a few lines of your input file so it is apparent what you are talking about. Also, try doing it all in a single command (be it sort, awk, or perl) to minimize the inefficiency due to process forking.

And for grins how about...
Code:
sort -t, -k2,2n -k3,3 file

After reading your post again I realise it won't work, as the hours field isn't zero-padded, but this should...
Code:
sort -t, -k2,2n -k3,3n file

Process forking isn't an issue. Only a few processes are created by the pipeline. Perhaps you meant the back-and-forth context switching between the few processes which constitute the pipeline.

Neither of your suggestions is appropriate, though. The numeric sort will only look at a leading numeric string. This means that sort will never look beyond the first colon in the time string.
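
To make the point concrete, here is a small sketch. The sample rows and the function names numeric_only and by_seconds are invented for illustration, and -s (stable sort) is a GNU/BSD extension rather than POSIX. The first function shows the numeric key stopping at the colon; the second is one possible workaround that sorts on a temporary seconds-since-midnight column.

```shell
# Hypothetical rows in the thread's SYMBOL,DATE,TIME,... layout.
sample() {
  printf '%s\n' \
    'IBM,20090102,10:05:00,87.40,100,0' \
    'AAPL,20090102,7:30:02,84.01,270,0' \
    'GOOG,20090102,7:30:01,321.23,200,0' \
    'CSCO,20090102,7:05:00,16.96,580,0'
}

# -k3,3n only reads the leading "7" or "10". With -s the three 7:xx:xx
# rows compare equal and simply keep their input order.
numeric_only() { LC_ALL=C sort -s -t, -k2,2n -k3,3n; }

# Decorate each row with seconds since midnight, sort on date then
# seconds (both numeric), and cut the extra column off again.
by_seconds() {
  awk -F, '{ split($3, t, ":"); s = t[1] * 3600 + t[2] * 60 + t[3]; print s "," $0 }' |
    LC_ALL=C sort -t, -k3,3n -k1,1n |
    cut -d, -f2-
}

sample | numeric_only
echo ---
sample | by_seconds
```

Without -s, sort falls back to comparing whole lines when the keys tie, which can hide the problem when every row has the same symbol.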

Regards,
Alister

---------- Post updated at 04:02 PM ---------- Previous update was at 03:57 PM ----------

Quote:
Originally Posted by Ryan.
This is what happened when I tried to use split:

Code:
split: output file suffixes exhausted

I'm going to try using numeric suffixes and try again.
Make sure you adjust the suffix length using -a so that the number of possible suffixes can accommodate the number of expected files.
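
A rough illustration of the arithmetic, using a throwaway 1000-line file as a stand-in for the real data (the file names and chunk size here are arbitrary choices for the sketch):

```shell
# Work in a scratch directory with a small stand-in for the real file.
workdir=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 1000; i++) print i }' > "$workdir/input.txt"

# The default suffix length is 2 (aa..zz), which allows 26 * 26 = 676 files.
# 1000 one-line chunks need -a 3, which allows 26^3 = 17576 names.
split -l 1 -a 3 "$workdir/input.txt" "$workdir/chunk_"

# Count the pieces that were written.
ls "$workdir"/chunk_* | wc -l
```

With the real file one would pick a much larger -l value, so that each chunk fits comfortably in memory and the chunk count stays manageable.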

Regards,
Alister
# 11  
Old 07-11-2011
Wait.

Wow.

How am I supposed to "sort individually"?

If I split the file up, every time two "sorted" files are combined I still need to sort the merged file, and therefore I run into the same problem.
# 12  
Old 07-11-2011
Quote:
Originally Posted by Ryan.
If I split the file up, every time two "sorted" files are combined I still need to sort the merged file, and therefore I run into the same problem.
No you don't. During the sorting step, the entire file's contents are in use. During the merging step, only one line per file being merged needs to be in memory.

Think about it. If you know that two files are already sorted, you only need to compare two lines at a time, make a decision which comes first, print the correct line, read the line that follows that which was printed, rinse and repeat.

Whereas when a file is not sorted, you do not know where a line goes until you've read the entire file at least once.
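
As a minimal sketch of that split, sort, merge sequence, here is a toy example with nine numbers and three-line chunks (the file names are placeholders). The key piece is sort -m, which only merges inputs that are already sorted instead of sorting them again.

```shell
workdir=$(mktemp -d)

# Unsorted stand-in for the big file.
printf '%s\n' 5 3 9 1 7 2 8 4 6 > "$workdir/input.txt"

# 1. Split into pieces small enough to sort in memory.
split -l 3 "$workdir/input.txt" "$workdir/chunk_"

# 2. Sort each piece on its own (sort -o may safely name the input file).
for f in "$workdir"/chunk_*; do
  sort -n -o "$f" "$f"
done

# 3. Merge the sorted pieces; only the current line of each piece is compared.
sort -m -n "$workdir"/chunk_* > "$workdir/sorted.txt"

cat "$workdir/sorted.txt"
```

The keys given to sort -m must match the keys used when sorting the pieces, otherwise the merge result is not guaranteed to be ordered.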

Regards,
Alister
# 13  
Old 07-11-2011
Quote:
Originally Posted by alister
No you don't. During the sorting step, the entire file's contents are in use. During the merging step, only one line per file being merged needs to be in memory.

Think about it. If you know that two files are already sorted, you only need to compare two lines at a time, make a decision which comes first, print the correct line, read the line that follows that which was printed, rinse and repeat.

Whereas when a file is not sorted, you do not know where the next line goes until you've read the entire file at least once.

Regards,
Alister
I totally misunderstood you the first time.

So, basically try to split the file by each ticker, and then write some simple code to do my own sorting, correct?

Edit: I guess that all could have been summed up in two words: "Insertion sort"

Last edited by Ryan.; 07-11-2011 at 07:18 PM..
# 14  
Old 07-11-2011
Quote:
Originally Posted by Ryan.
I totally misunderstood you the first time.

So, basically try to split the file by each ticker, and then write some simple code to do my own sorting, correct?

Edit: I guess that all could have been summed up in two words: "Insertion sort"
No. I was not describing a specific sorting algorithm (insertion sort, quicksort, etc.) but an approach which allows one to deal with more data than memory alone allows. See the Wikipedia article on external sorting.

As I said earlier, GNU sort should do this external sort for you (you have yet to make it clear which platform you're working with). It checks the size of the file, checks how much memory the system has available, sees it's much too big, and decides to use temp files to store sorted chunks for subsequent merging.
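
If it is GNU sort, two of its options are relevant here: -S caps the in-memory buffer and -T chooses the directory for the temporary files. The sketch below assumes GNU sort; the scratch directory, the 64M figure, and the two sample rows are placeholders rather than recommendations.

```shell
# Stand-in for a directory on a filesystem with plenty of free space.
scratch=$(mktemp -d)

# Tiny made-up input; the real file would be the multi-gigabyte CSV.
printf '%s\n' 'IBM,2009-01-02T07:30:01' 'CSCO,2009-01-02T07:30:00' > "$scratch/in.csv"

# -S limits the memory buffer; -T says where the sorted temp chunks go
# (GNU sort otherwise uses $TMPDIR or /tmp).
LC_ALL=C sort -S 64M -T "$scratch" -t, -k2,2 -k1,1 "$scratch/in.csv" > "$scratch/out.csv"

cat "$scratch/out.csv"
```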

Whatever sort utility you're using, I'm assuming it's doing this since your error message mentions a temp file.

Quote:
Originally Posted by Ryan.
Code:
read failed: /tmp/sortOgLpWg: Input/output error

Perhaps someone familiar with your operating system can give more specific advice about that read I/O error.

Is it possible that your /tmp ran out of space during the sort? That something cleared /tmp while the sort was running? That the hardware is having issues?
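
A quick way to check the first of those possibilities is sketched below. It assumes the sort utility writes its temp files to $TMPDIR or, failing that, /tmp, and it reads the "Available" column of POSIX-format df output.

```shell
# Directory the sort utility is assumed to use for its temp files.
tmp=${TMPDIR:-/tmp}

# -P gives the portable one-line-per-filesystem layout, -k reports 1024-byte blocks.
df -Pk "$tmp"

# Column 4 of the second line is the available space in KB.
avail_kb=$(df -Pk "$tmp" | awk 'NR == 2 { print $4 }')
echo "Available in $tmp: ${avail_kb} KB"
```

An external sort typically needs temp space on the order of the input size, so comparing this figure with the size of the CSV is a reasonable first check.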

It would also be helpful to know the specs of your hardware (RAM, available space on the relevant filesystems, and such).

Alister