If you wanted to sort a .csv file that was filled with lines like this:
<Ticker>,<Date as YYYYMMDD>,<Time as H:M:S>,<Volume>,<Corr>
(H : [1, 23], M, S: [0, 59])
by date, does anybody know of a better solution than turning the two colons in every line into commas, sorting on four keys, and then turning the 3rd and 4th commas in every line back into colons? It seems very inefficient to me. (I would just do it and not bother asking if these files weren't 50+ GB.)
---------- Post updated at 09:43 PM ---------- Previous update was at 09:27 PM ----------
Meh, I'll let it run overnight.
Are hours, minutes and seconds all zero padded? For example, 01:02:03 instead of 1:2:3 or 1:02:03? If so, you do not need to modify anything. You can use the default lexicographical sort with the date and time fields as the keys.
Also, you mentioned that hours range between 1 and 23. In case it's relevant, that's only a 23-hour day.
If the source file is 50+ GB, you are going to need a lot of RAM. You'll probably need to split the file into smaller chunks, sort them individually, and then merge them with sort -m.
Regards,
Alister
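For anyone following along, the split / sort / merge idea above could look roughly like this. It is only a sketch under assumptions: the function name, the default chunk size and the use of mktemp are mine, and the plain lexical keys are only correct if the time field is fully zero padded, as asked about above.

```shell
# Sketch: sort a big CSV by date (field 2) then time (field 3) in pieces.
# sort_in_chunks and its default chunk size are hypothetical, not from the thread.
sort_in_chunks() {
    input=$1
    output=$2
    lines_per_chunk=${3:-10000000}   # tune so that one chunk fits in RAM
    workdir=$(mktemp -d) || return 1

    # 1. Split into pieces small enough to sort in memory.
    split -l "$lines_per_chunk" "$input" "$workdir/chunk." || return 1

    # 2. Sort each piece in place on the date and time fields.
    for chunk in "$workdir"/chunk.*; do
        LC_ALL=C sort -t, -k2,2 -k3,3 -o "$chunk" "$chunk" || return 1
    done

    # 3. Merge the already sorted pieces; -m merges without re-sorting.
    LC_ALL=C sort -m -t, -k2,2 -k3,3 -o "$output" "$workdir"/chunk.* || return 1

    rm -r -- "$workdir"
}
```

It would be called as sort_in_chunks big.csv sorted.csv 10000000. The same keys have to be given to the per-chunk sort and to the merge, otherwise sort -m is merging inputs that are not ordered the way it expects.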
Oddly the hours aren't zero padded but the minutes and seconds are. (I think it's like [1]?[0-9]:[0-5][0-9]:[0-5][0-9] in Regex-speak.)
I'm going to try to figure out how to split it up and then attempt sorting again -- thanks.
Last edited by Ryan.; 07-11-2011 at 04:10 PM..
Reason: Wrong
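Since the hours are not zero padded, one possible way around it (not something proposed in the thread, just a sketch) is to prepend a fixed-width YYYYMMDDHHMMSS key to each line, sort on that key lexically, and cut it off again. The function name is made up, and it assumes the time is always the third comma-separated field.

```shell
# Sketch: decorate / sort / undecorate, for hours like 9:05:03.
# sort_unpadded_hours is a hypothetical name.
sort_unpadded_hours() {
    awk -F, '{
        split($3, t, ":")    # t[1] = H (maybe one digit), t[2] = MM, t[3] = SS
        # Build a 14-digit key with the hour padded to two digits,
        # then append the untouched original line after a comma.
        printf "%s%02d%s%s,%s\n", $2, t[1], t[2], t[3], $0
    }' "$1" |
        LC_ALL=C sort -t, -k1,1 |
        cut -d, -f2-         # drop the temporary key again
}
```

The original lines come out byte for byte as they went in, so nothing has to be converted back afterwards. The cost is one awk pass and a slightly longer line for sort to handle.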
By the way, the consecutive seds in the pipeline can be simplified: sed 's/,/:/3 ; s/,/:/3'.
That'll save some time in context switches and copying data in and out of kernel/userland buffers.
Regards,
Alister
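The pipeline being discussed is not shown in the thread, so the following is only a guess at its shape: colons to commas, a numeric sort on four keys, then a single sed that turns the 3rd and 4th commas back into colons. The function name is hypothetical and it assumes colons appear only in the time field.

```shell
# Sketch of the convert / sort / convert-back pipeline with one sed at the end.
# sort_via_comma_time is a hypothetical name.
sort_via_comma_time() {
    # H:M:S becomes three separate fields (3, 4 and 5).
    sed 's/:/,/g' "$1" |
        # Numeric keys, so an unpadded hour such as 9 still sorts before 10.
        LC_ALL=C sort -t, -k2,2n -k3,3n -k4,4n -k5,5n |
        # After the first substitution the old 4th comma is now the 3rd,
        # which is why both commands use the occurrence number 3.
        sed -e 's/,/:/3' -e 's/,/:/3'
}
```

The two -e expressions run inside one sed process, which is the same effect as the semicolon form quoted above.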
---------- Post updated at 02:55 PM ---------- Previous update was at 02:53 PM ----------
Also, it seems GNU sort can handle this situation by automatically creating temporary files during the sorting process. I'm assuming you're not on Linux. If you are, and you are using GNU sort, you should paste the exact error message.
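If GNU sort is available, a single invocation might be enough, as long as its temporary files land on a filesystem with enough free space. A rough sketch: -S (buffer size) and -T (temporary directory) are GNU sort options, while the function name and the defaults are my own assumptions. As before, the lexical keys assume a zero-padded time field.

```shell
# Sketch: let sort do its own spilling to temporary files.
# sort_with_tmpdir and its defaults are hypothetical.
sort_with_tmpdir() {
    input=$1
    output=$2
    tmpdir=${3:-${TMPDIR:-/tmp}}   # needs free space on the order of the input size
    bufsize=${4:-1G}               # in-memory buffer before spilling to tmpdir
    LC_ALL=C sort -t, -k2,2 -k3,3 -S "$bufsize" -T "$tmpdir" -o "$output" "$input"
}
```

For example: sort_with_tmpdir big.csv sorted.csv /some/big/disk 4G, where the directory is whatever large scratch area is available.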