Extract data from large file 80+ million records
Post 302321978 by cfajohnson on Tuesday 2nd of June 2009 12:48:37 PM
Quote:
Originally Posted by learner16s
I have one file with 120+ million records (35 GB in size). I have to extract some relevant data from the file based on some parameter and generate another output file.

...

I tried to use grep, but it took a lot of time: nearly 45 minutes to give me the output file.

With a file that size, anything is going to take a long time. There's not going to be anything faster than grep, with the possible exception of a filter written in C that does nothing but what you want.
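
If grep stays the tool of choice, two standard options are worth trying before writing a C filter: `-F` matches the pattern as a fixed string rather than a regular expression, and running under `LC_ALL=C` makes grep compare bytes instead of multibyte characters, which can be noticeably faster in UTF-8 locales. The sketch below is only an illustration: the sample records, the `|ACTIVE|` pattern, and the file names are hypothetical, since the actual record layout and search parameter were not posted.

```shell
#!/bin/sh
# Hypothetical sample data standing in for the real 35 GB file;
# the record layout and the search string are assumptions.
tmpdir=$(mktemp -d)
printf '%s\n' 'id1|ACTIVE|foo' 'id2|CLOSED|bar' 'id3|ACTIVE|baz' > "$tmpdir/records.txt"

# -F: treat the pattern as a fixed string, not a regular expression.
# LC_ALL=C: byte-wise matching, avoiding multibyte character handling.
LC_ALL=C grep -F '|ACTIVE|' "$tmpdir/records.txt" > "$tmpdir/extract.txt"

# Show the extracted records.
cat "$tmpdir/extract.txt"
```

Whether this helps depends on the pattern and the locale; on a file this large the run is likely to remain bound by how fast the disk can be read.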

With that much data, you might want to look at using a DBMS, e.g., PostgreSQL.
 
