Extract data from large file 80+ million records Post: 302321978

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Need to Extract Data From 94000 records

I have an input file which does not have a delimiter. All I need to do is identify a line, extract the data from it, run the loop again, and ensure that it was not extracted earlier. Input file ------------ abcd 12345 egfhijk ip 192.168.0.1 CNN.com abcd 12345 egfhijk ip...

2. Shell Programming and Scripting

sort a file which has 3.7 million records

Hi, I'm trying to sort a file which has 3.7 million records and am getting the following error. Any help is appreciated. sort: Write error while merging. Thanks

3. Shell Programming and Scripting

How to Pick Random records from a large file

Hi, I have a huge file with, say, 2000000 records. The file has 42 fields. I would like to randomly pick 1000 records from this huge file. Can anyone help me with how to do this?

4. Shell Programming and Scripting

Extract data from records that match pattern

Hi Guys, I have a file as follows: a b c 1 2 3 4 pp gg gh hh 1 2 fm 3 4 g h i j k l m 1 2 3 4 d e f g h j i k l 1 2 3 f 3 4 r t y u i o p d p re 1 2 3 f 4 t y w e q w r a s p a 1 2 3 4 I am trying to extract all the 2's from each row. 2 is just an example...

5. Shell Programming and Scripting

awk - splitting 1 large file into multiple based on same key records

Hello gurus, I am new to "awk" and am trying to break a large file of 4 million records into several output files of half a million records each. At the same time, I want to keep records with the same key in the same output file, not spread across files. E.g. my data is like: Row_Num,...

6. Programming

Suitable data structure large number of heterogeneous records

Hi All, I don't need any code for this, just some advice. I have a large collection of heterogeneous data (about 1.3 million), which simply means data of different types such as float, long double, string, and int. I have built a linked list for it and stored all the different data types in a structure,...

7. Shell Programming and Scripting

Matching 10 Million file records with 10 Million in other file

Dear All, I have two files, each containing 10 million records separated by commas (CSV format). One file is input.txt, the other is status.txt. input.txt contains fields including one unique ID field (a primary key, so to speak). status.txt contains only two fields: 1. unique ID and 2. status ...

8. Shell Programming and Scripting

Split a large file in n records and skip a particular record

Hello All, I have a large file, more than 50,000 lines, and I want to split it evenly into files of 5000 records each, which I can do using sed '1d;$d;' <filename> | awk 'NR%5000==1{x="F"++i;}{print > x}'. Now I need to add one more condition: do not break the file at the 5000th record if the 5000th record...

9. Shell Programming and Scripting

Quick way to select many records from a large file

I have a file, named records.txt, containing a large number of records, around 0.5 million, in the format below: 28433005 1 1 3 2 2 2 2 2 2 2 2 2 2 2 28433004 0 2 3 2 2 2 2 2 2 1 2 2 2 2 ... Another file is a key file, named key.txt, which is the list of some numbers in the first column of...

10. Shell Programming and Scripting

Need to extract 8 characters from a large file.

Hi All!! I have a large file containing millions of records. My purpose is to extract 8 characters immediately from the given file.
222222222|ZRF|2008.pdf|2008|01/29/2009|001|B|C|C
222222222|ZRF|2009.pdf|2009|01/29/2010|001|B|C|C
222222222|ZRF|2010.pdf|2010|01/29/2011|001|B|C|C...

LEARN ABOUT DEBIAN

dbswiss

DBSWISS(1)							   User Commands							DBSWISS(1)

NAME

       dbSwiss - create DBM version of Swiss-Prot data

SYNOPSIS

       /usr/share/librg-utils-perl/dbSwiss [OPTIONS]

       /usr/share/librg-utils-perl/dbSwiss --datadir /data/swissprot --infile /data/swissprot/uniprot_sprot.dat

       /usr/share/librg-utils-perl/dbSwiss [--help] [--man]

DESCRIPTION

       dbSwiss creates a DBM version of Swiss-Prot data. It is intended to replace splitSwiss.pl, which saves Swiss-Prot records in
       separate files, resulting in over 13 million relatively tiny files that take a very long time to create and rsync. dbSwiss instead
       saves each record in a DBM database that is optimized for fast retrieval.
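
       The idea of keeping every record in one keyed DBM database instead of one file per record can be illustrated with a
       minimal Python sketch. It is not dbSwiss's implementation (dbSwiss is a Perl script, and its key scheme is not documented
       here). The sketch assumes records end with a "//" line and are keyed by the second token of their "ID" line; the function
       names and the two sample records are hypothetical.

```python
import dbm
import os
import tempfile

# Two made-up records in a Swiss-Prot-like flat-file layout (illustrative only).
SAMPLE = """\
ID   TEST1_HUMAN   Reviewed;   10 AA.
SQ   SEQUENCE   10 AA;
     MKTAYIAKQR
//
ID   TEST2_MOUSE   Reviewed;   5 AA.
SQ   SEQUENCE   5 AA;
     MKTAY
//
"""


def iter_records(lines):
    """Yield (entry_name, record_text) for each '//'-terminated record."""
    buf = []
    for line in lines:
        buf.append(line)
        if line.startswith("//"):
            text = "".join(buf)
            buf = []
            # Assumption: the key is the second token of the leading "ID" line.
            first = text.splitlines()[0].split()
            if len(first) >= 2 and first[0] == "ID":
                yield first[1], text


def build_db(lines, db_path):
    """Store every record under its entry name; return the record count."""
    count = 0
    with dbm.open(db_path, "n") as db:  # "n": always create a new, empty database
        for key, text in iter_records(lines):
            db[key.encode("utf-8")] = text.encode("utf-8")
            count += 1
    return count


def fetch(db_path, key):
    """Look up a single record by entry name without scanning the flat file."""
    with dbm.open(db_path, "r") as db:
        return db[key.encode("utf-8")].decode("utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "demo")
        stored = build_db(SAMPLE.splitlines(keepends=True), path)
        print(stored, "records stored")
        print(fetch(path, "TEST2_MOUSE"))
```

       For a real flat file, the open file object could be passed to build_db in place of the sample lines, so the input is
       streamed record by record rather than loaded into memory at once.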

OPTIONS

       -d, --datadir=path
	   directory of database files, default: '/mnt/project/rost_db/data/swissprot'

       --debug
       --nodebug
       --first20
       --nofirst20
	   process only first 20 records, for debugging

       --help
       -i, --infile=path
	   Swiss-Prot data flatfile, default: '/mnt/project/rost_db/data/swissprot/uniprot_sprot.dat'.

       --man
       --quiet
       --noquiet
	   do not print progress status

       --readback
       --noreadback
	   read records back after storing and print them

       --table
	   name of database table and consequently the base name of database files, default: 'dbswiss'

       --version
       -w, --workdir=path
	   Optional working directory. Automatically created and removed if not defined.

AUTHOR

       Laszlo Kajan <lkajan@rostlab.org>

1.0.43								    2011-11-28								DBSWISS(1)

Discussion starters for the threads listed above

1. Need to Extract Data From 94000 records

Discussion started by: vasimm

2. sort a file which has 3.7 million records

Discussion started by: greenworld

3. How to Pick Random records from a large file

Discussion started by: ajithshankar@ho

4. Extract data from records that match pattern

Discussion started by: npatwardhan