split large file based on field criteria


 
# 1  
Old 06-19-2009

I have a file containing date/time sorted data of the form
...
2009/06/10,20:59:59.950,XAG/USD,Q,1,1115, 14.3025,100,1,1
2009/06/10,20:59:59.950,XAG/USD,Q,1,1116, 14.3026,125,1,1
2009/06/10,20:59:59.950,XAG/USD,R,0,0, , 0,0,0
2009/06/10,20:59:59.950,XAG/USD,R,1,0, 14.1910,100,1,1
2009/06/10,20:59:59.950,XAG/USD,A,0,, 14.3011,100,1
2009/06/10,21:00:00.100,CHF/JPY,Q,0,0, , 0,0,0
2009/06/10,21:00:00.100,CHF/JPY,Q,1,0, 70.26, 60,2,2
2009/06/10,21:00:00.150,CHF/JPY,D,0, 70.14, 20,XC05, ,NYD9,US,NYA1
...

I want to split this file into exactly two files based on the date/time. All lines with timestamps less than or equal to 21:00:00.000 should go to 'file1', and all lines with timestamps greater than 21:00:00.000 should go to 'file2'.

I wrote a simple script using a while loop that reads each line and tests it against the criteria.
The script works fine, but since the data files are huge (gigabytes), the processing takes forever.

Is there a better way (sed, awk, egrep or even split) to do this more efficiently?

Thanks.
# 2  
Old 06-19-2009
Have you tried split or csplit? Check their man or info pages.
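
Since the data is already sorted by time, one possible sketch along these lines is to let awk find the first line past the cutoff and stop there, then cut the file at that line number. split and csplit cut by size, line count or regex rather than by a comparison, so head and tail are used here instead. The file names, the temporary directory and the second sample row (an invented boundary record) are assumptions for illustration, and the sketch assumes the file covers a single date with zero-padded HH:MM:SS.mmm timestamps.

```shell
#!/bin/sh
# Sketch: split a time-sorted CSV at a cutoff by locating the cut line once.
# Assumptions: field 2 is a zero-padded HH:MM:SS.mmm time, the file holds one
# date only, and data.txt/file1/file2 are placeholder names.
cutoff="21:00:00.000"

# Work in a scratch directory so nothing existing is overwritten.
dir=$(mktemp -d) && cd "$dir" || exit 1

# Small sample so the script runs on its own; the second row is invented
# to show the "equal to the cutoff" case.
cat > data.txt <<'EOF'
2009/06/10,20:59:59.950,XAG/USD,Q,1,1115, 14.3025,100,1,1
2009/06/10,21:00:00.000,XAG/USD,R,0,0, , 0,0,0
2009/06/10,21:00:00.100,CHF/JPY,Q,0,0, , 0,0,0
2009/06/10,21:00:00.150,CHF/JPY,Q,1,0, 70.26, 60,2,2
EOF

# Line number of the first record later than the cutoff (empty if none).
# Both operands are non-numeric strings, so awk compares them as strings.
first=$(awk -F, -v c="$cutoff" '$2 > c { print NR; exit }' data.txt)

if [ -z "$first" ]; then
    cp data.txt file1; : > file2        # nothing is later than the cutoff
elif [ "$first" -eq 1 ]; then
    : > file1; cp data.txt file2        # everything is later than the cutoff
else
    head -n "$((first - 1))" data.txt > file1
    tail -n "+$first" data.txt > file2
fi
```

awk only reads up to the cut line and does no per-line output; the bulk copying is left to head and tail.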
# 3  
Old 06-20-2009
Sorry, but I use awk for everything!

Code:
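cat myfile | awk -F, '
  # times are zero-padded, so a string comparison orders them correctly;
  # exactly 21:00:00.000 stays in file1, anything later goes to file2
  $2 > "21:00:00.000" { print > "file2"; next }
  { print > "file1" }
'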

If you want two files based on each date...

Code:
cat myfile | awk -F, '
  { gsub( "/", "_", $1 ) }
  $2 > "21:00:00.000" { print > $1 "_2"; next }
  { print > $1 "_1" }
'

# 4  
Old 06-20-2009
Quote:
Originally Posted by scottn
Sorry, but I use awk for everything!
then why did you use cat? :)
# 5  
Old 06-20-2009
I didn't say I used awk exclusively. That would just be boring :)

But I see what you mean

Code:
awk -F, '
  { gsub( "/", "_", $1 ) }
  $2 > "21:00:00.000" { print > $1 "_2"; next }
  { print > $1 "_1" }
' myfile

Force of habit!
# 6  
Old 06-22-2009
Scottn,

thanks for your reply. It pretty much does the work. But I noticed two issues:

1) All the delimiters in the file "," are gone!

2) I see why you need to replace "/" with "_", since file names containing "/" are not allowed, but the side effect is that the "/" characters in the date field of the data are replaced with "_" as well.
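
Both effects come from changing $1 in place: when a field is modified, awk rebuilds the whole line using the output field separator, which is a space by default. A minimal sketch of one way around it, assuming the same layout as the sample data, is to build the file name in a separate variable and leave the fields alone. The variable names, the two-line sample and the temporary directory are placeholders for illustration.

```shell
#!/bin/sh
# Sketch: per-date split that writes every input line out unchanged.
# Assumptions: field 1 is the date, field 2 a zero-padded HH:MM:SS.mmm time;
# "myfile" and the scratch directory are placeholders.
dir=$(mktemp -d) && cd "$dir" || exit 1

# Two rows taken from the sample data so the script runs on its own.
cat > myfile <<'EOF'
2009/06/10,20:59:59.950,XAG/USD,A,0,, 14.3011,100,1
2009/06/10,21:00:00.100,CHF/JPY,Q,0,0, , 0,0,0
EOF

awk -F, '
  {
    name = $1                # copy the date; $1 itself is never modified
    gsub("/", "_", name)     # 2009/06/10 -> 2009_06_10, in the copy only
    suffix = ($2 > "21:00:00.000") ? "_2" : "_1"
    print > (name suffix)    # $0 goes out as read: commas and slashes intact
  }
' myfile
```

With the sample above this writes 2009_06_10_1 and 2009_06_10_2, each holding its original line byte for byte.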

Last edited by asriva; 06-22-2009 at 11:02 AM..
# 7  
Old 06-22-2009
Code:
$
$ cat data.txt
2009/06/10,20:59:59.950,XAG/USD,Q,1,1115, 14.3025,100,1,1
2009/06/10,20:59:59.950,XAG/USD,Q,1,1116, 14.3026,125,1,1
2009/06/10,20:59:59.950,XAG/USD,R,0,0, , 0,0,0
2009/06/10,20:59:59.950,XAG/USD,R,1,0, 14.1910,100,1,1
2009/06/10,20:59:59.950,XAG/USD,A,0,, 14.3011,100,1
2009/06/10,21:00:00.100,CHF/JPY,Q,0,0, , 0,0,0
2009/06/10,21:00:00.100,CHF/JPY,Q,1,0, 70.26, 60,2,2
2009/06/10,21:00:00.150,CHF/JPY,D,0, 70.14, 20,XC05, ,NYD9,US,NYA1
$
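$ perl -F, -ane 'BEGIN{open(F1,">","file1") or die $!; open(F2,">","file2") or die $!}
>   # -a with -F, splits each line on commas into @F; $F[1] is the time field
>   if ($F[1] le "21:00:00.000") {print F1 $_} else {print F2 $_}
> END {close(F1); close(F2)}' data.txt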
$
$ cat file1
2009/06/10,20:59:59.950,XAG/USD,Q,1,1115, 14.3025,100,1,1
2009/06/10,20:59:59.950,XAG/USD,Q,1,1116, 14.3026,125,1,1
2009/06/10,20:59:59.950,XAG/USD,R,0,0, , 0,0,0
2009/06/10,20:59:59.950,XAG/USD,R,1,0, 14.1910,100,1,1
2009/06/10,20:59:59.950,XAG/USD,A,0,, 14.3011,100,1
$
$ cat file2
2009/06/10,21:00:00.100,CHF/JPY,Q,0,0, , 0,0,0
2009/06/10,21:00:00.100,CHF/JPY,Q,1,0, 70.26, 60,2,2
2009/06/10,21:00:00.150,CHF/JPY,D,0, 70.14, 20,XC05, ,NYD9,US,NYA1
$
$

tyler_durden