Sorting problem: Multiple delimiters, multiple keys
Post 302538042 by alister on Monday 11th of July 2011 02:42:39 PM
Quote:
Originally Posted by Ryan.
Hello

If you wanted to sort a .csv file that was filled with lines like this:

<Ticker>,<Date as YYYYMMDD>,<Time as H:M:S>,<Volume>,<Corr>

(H : [1, 23], M, S: [0, 59])

by date, does anybody know of a better solution than turning the two colons in every line into commas, sorting on four keys, and then turning those two commas back into colons? It seems very inefficient to me. (I would just do it and not bother asking if these files weren't 50+GB.)

---------- Post updated at 09:43 PM ---------- Previous update was at 09:27 PM ----------

Meh, I'll let it run overnight.

Code:
sed 's/:/,/g' big_file.csv | sort -k 2,2 -k 3,3 -k 4,4 -k 5,5 -t',' | sed 's/,/:/3' | sed 's/,/:/3' > big_file.sorted.csv

Are hours, minutes and seconds all zero padded? For example, 01:02:03 instead of 1:2:3 or 1:02:03? If so, you do not need to modify anything. You can use the default lexicographical sort with the date and time fields as the keys.
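
As a minimal sketch of that suggestion, assuming every time value is zero padded and the fields never contain embedded commas: the sample file name and its rows below are made up for illustration. Field 2 (date) and field 3 (time) are compared as plain strings, and LC_ALL=C is an extra assumption on my part to make the comparison byte-wise and independent of the locale.

```shell
# Hypothetical sample data in the thread's format, with zero-padded times.
cat > sample.csv <<'EOF'
IBM,20110711,09:30:05,100,0
AAPL,20110708,15:59:59,250,1
MSFT,20110711,09:05:00,300,0
IBM,20110708,01:02:03,50,0
EOF

# Sort on field 2 (date), then field 3 (time), both as plain strings.
# No sed pre- or post-processing is needed because YYYYMMDD and HH:MM:SS
# already order correctly when compared character by character.
LC_ALL=C sort -t, -k2,2 -k3,3 sample.csv > sample.sorted.csv
```

If the times are not zero padded (1:2:3), this ordering breaks, because "10:00:00" would sort before "9:00:00".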

Also, you mentioned that hours range between 1 and 23. In case it's relevant, that's only a 23-hour day.

If the source file is 50+ GB, you are going to need a lot of RAM. You'll probably need to split the file into smaller chunks, sort them individually, and then merge them with sort -m.
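
A rough sketch of the split, sort, and merge approach, under the same zero-padding assumption: the tiny input, the chunk size of 2 lines, and the chunk_ and sorted_ file prefixes are all placeholders chosen for illustration. For a real 50+ GB file you would pick a line count per chunk that fits comfortably in memory.

```shell
# Hypothetical small input standing in for the real 50+ GB file.
cat > big_file.csv <<'EOF'
IBM,20110711,09:30:05,100,0
AAPL,20110708,15:59:59,250,1
MSFT,20110711,09:05:00,300,0
IBM,20110708,01:02:03,50,0
AAPL,20110709,00:00:01,75,1
EOF

# 1. Split into chunks of whole lines (2 lines per chunk in this toy case).
split -l 2 big_file.csv chunk_

# 2. Sort each chunk on date (field 2), then time (field 3).
for f in chunk_*; do
    LC_ALL=C sort -t, -k2,2 -k3,3 "$f" > "sorted_$f"
done

# 3. Merge the already-sorted chunks. -m only merges, it does not re-sort,
#    so the key options must match the ones used in step 2.
LC_ALL=C sort -m -t, -k2,2 -k3,3 sorted_chunk_* > big_file.sorted.csv
```

The merge step reads the chunks sequentially, so its memory use stays small regardless of the total size. The intermediate chunk files can be deleted once the merged output has been checked.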

Regards,
Alister
 
