Extract data from large file 80+ million records
Posted by cfajohnson in Shell Programming and Scripting, 2nd of June 2009, 12:48 PM
Quote:
Originally Posted by learner16s
I have one file with more than 120 million records (35 GB in size). I have to extract some relevant data from the file based on some parameters and generate another output file.

...

I tried to use grep, but it took a lot of time, nearly 45 minutes, to produce the output file.

With a file that size, anything is going to take a long time. There's not going to be anything faster than grep, with the possible exception of a filter written in C that does nothing but what you want.
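
Before writing any C, it is worth making sure grep itself is not being slowed down by the locale or by regular-expression processing. A minimal sketch, assuming one record per line and a fixed-string match; bigfile.dat, patterns.txt, SEARCH_STRING, and matches.out are placeholder names, not anything from the original post:

Code:
# Byte-wise matching in the C locale skips multibyte (UTF-8)
# character handling, which can slow grep down considerably.
# -F treats the pattern as a fixed string, not a regular expression.
LC_ALL=C grep -F 'SEARCH_STRING' bigfile.dat > matches.out

# With many patterns, list them one per line in a file so grep
# matches them all in a single pass instead of one pass per pattern.
LC_ALL=C grep -F -f patterns.txt bigfile.dat > matches.out

Even so, a sequential pass over 35 GB is ultimately limited by disk throughput, so a second run (with the file already in the page cache) will always look faster than a cold one.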

With that much data, you might want to look at loading it into a DBMS, e.g., PostgreSQL.
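
As a rough sketch of that route, load the file once and let an index do the filtering afterwards. Everything here is hypothetical: pipe-delimited records are assumed, and bigdata, records, and the column names are invented for the example:

Code:
createdb bigdata
psql bigdata <<'EOF'
-- hypothetical layout; match the columns to the real record format
CREATE TABLE records (id bigint, name text, status text);
-- \copy streams the file through the client in a single pass;
-- building the index after the load is faster than before it
\copy records from 'bigfile.dat' with delimiter '|'
CREATE INDEX records_status_idx ON records (status);
EOF

# each later "extract" is now an indexed query, not another full scan
psql bigdata -c "SELECT * FROM records WHERE status = 'ACTIVE'" > matches.out

The load costs you one long pass up front; after that, each new extraction is a query that finishes quickly instead of another 45-minute crawl through the raw file.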
 
