06-02-2009
Quote:
Originally Posted by learner16s
I have one file with more than 120 million records (35 GB in size). I have to extract some relevant data from the file based on some parameters and generate another output file.
...
I tried to use grep, but it took a long time, nearly 45 minutes, to produce the output file.
With a file that size, anything is going to take a long time. There's not going to be anything faster than grep, with the possible exception of a filter written in C that does nothing but what you want.
With that much data, you might want to look at using a DBMS, e.g., PostgreSQL.
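Before reaching for C or a DBMS, two cheap grep speed-ups are often worth trying: force the byte-oriented C locale, and use fixed-string matching when the pattern is a literal. A minimal sketch (the file and pattern here are small stand-ins, not the poster's actual data):

```shell
# Stand-in sample file; substitute the real 35 GB file and pattern.
printf 'alpha|1\nbeta|2\nalpha|3\n' > sample.txt

# LC_ALL=C forces byte-wise matching (no locale-aware multi-byte
# handling), and -F treats the pattern as a fixed string, bypassing
# the regex engine; both commonly speed grep up on huge files.
LC_ALL=C grep -F 'alpha' sample.txt > matches.txt

cat matches.txt
```

With GNU grep, combining the two can cut runtime substantially on multi-gigabyte files, though the exact gain depends on the locale and pattern.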
10 More Discussions You Might Find Interesting
1. Shell Programming and Scripting
I have an input file which does not have a delimiter.
All I need to do is identify a line, extract the data from it, run the loop again, and ensure that it was not extracted earlier.
Input file
------------
abcd 12345 egfhijk ip 192.168.0.1 CNN.com
abcd 12345 egfhijk ip... (12 Replies)
Discussion started by: vasimm
2. Shell Programming and Scripting
hi,
I'm trying to sort a file which has 3.7 million records and I'm getting the following error... any help is appreciated...
sort: Write error while merging.
Thanks (6 Replies)
Discussion started by: greenworld
3. Shell Programming and Scripting
Hi,
I have a huge file, say with 2,000,000 records. The file has 42 fields. I would like to randomly pick 1000 records from this huge file. Can anyone help me how to do this? (1 Reply)
Discussion started by: ajithshankar@ho
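One way to pick a fixed-size uniform sample in a single pass (not from the thread itself, just a sketch) is reservoir sampling in awk; the demo below uses k=3 on a 10-record stand-in file instead of 1000 records from the real 2,000,000-record file:

```shell
# Demo input: 10 numbered records standing in for the 2,000,000-record file.
seq 1 10 > big.txt

# One-pass reservoir sampling: every record ends up in the sample of
# size k with equal probability; set k=1000 for the real use case.
awk -v k=3 'BEGIN { srand() }
    NR <= k { r[NR] = $0; next }
    { j = int(rand() * NR) + 1; if (j <= k) r[j] = $0 }
    END { for (i = 1; i <= k; i++) print r[i] }' big.txt > picked.txt

cat picked.txt
```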
4. Shell Programming and Scripting
Hi Guys,
I have a file as follows:
a b c 1 2 3 4
pp gg gh hh 1 2 fm 3 4
g h i j k l m 1 2 3 4
d e f g h j i k l 1 2 3 f 3 4
r t y u i o p d p re 1 2 3 f 4
t y w e q w r a s p a 1 2 3 4
I am trying to extract all the 2's from each row. 2 is just an example... (6 Replies)
Discussion started by: npatwardhan
5. Shell Programming and Scripting
Hello gurus,
I am new to "awk" and am trying to break a large file of 4 million records into several output files of half a million each, while keeping records with the same key in the same output file, not spread across files.
e.g. my data is like:
Row_Num,... (6 Replies)
Discussion started by: kam66
6. Programming
Hi All,
I don't need any code for this, just some advice. I have a large collection of heterogeneous data (about 1.3 million items), which simply means data of different types like float, long double, string, ints. I have built a linked list for it and stored all the different data types in a structure,... (5 Replies)
Discussion started by: shoaibjameel123
7. Shell Programming and Scripting
Dear All,
I have two files, each containing 10 million comma-separated records (CSV format).
One file is input.txt, the other is status.txt.
Input.txt-> contains fields with one unique id field (primary key we can say)
Status.txt -> contains two fields only:1. unique id and 2. status
... (8 Replies)
Discussion started by: vguleria
8. Shell Programming and Scripting
Hello All,
I have a large file, more than 50,000 lines, and I want to split it into even chunks of 5000 records, which I can do using
sed '1d;$d;' <filename> | awk 'NR%5000==1{x="F"++i;}{print > x}'
Now I need to add one more condition, namely not to break the file at the 5000th record if the 5000th record... (20 Replies)
Discussion started by: ibmtech
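The awk idiom quoted in that post writes each run of N lines to a new file F1, F2, ...; a small runnable demo (chunks of 3 instead of 5000, on a 7-line stand-in file):

```shell
# Demo input: 7 lines split into chunks of 3, written to F1, F2, F3.
seq 1 7 > input.txt

# Each time NR reaches the start of a chunk, switch to the next
# output file; 'print > x' keeps appending to the current file.
awk 'NR % 3 == 1 { x = "F" ++i } { print > x }' input.txt

head -n 1 F2
```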
9. Shell Programming and Scripting
I have a file, named records.txt, containing a large number of records, around 0.5 million, in the format below:
28433005 1 1 3 2 2 2 2 2 2 2 2 2 2 2
28433004 0 2 3 2 2 2 2 2 2 1 2 2 2 2
...
Another file is a key file, named key.txt, which is the list of some numbers in the first column of... (5 Replies)
Discussion started by: zenongz
10. Shell Programming and Scripting
Hi All!!
I have a large file containing millions of records. My purpose is to extract 8 characters immediately from the given file.
222222222|ZRF|2008.pdf|2008|01/29/2009|001|B|C|C
222222222|ZRF|2009.pdf|2009|01/29/2010|001|B|C|C
222222222|ZRF|2010.pdf|2010|01/29/2011|001|B|C|C... (5 Replies)
Discussion started by: pavand
LEARN ABOUT DEBIAN
ogmdemux
OGMDEMUX(1) User Commands OGMDEMUX(1)
NAME
ogmdemux - Extract streams from OGG/OGM files into separate files
SYNOPSIS
ogmdemux [options] inname
DESCRIPTION
This program extracts all or only some streams from an OGM and writes them to separate files.
inname Use 'inname' as the source.
-o, --output out
Use 'out' as the base for destination file names. '-v1', '-v2', '-a1', '-t1'... will be appended to this name. Default: use 'inname'.
-a, --astream n
Extract specified audio stream. Can be used more than once. Default: extract all streams.
-d, --vstream n
Extract specified video stream. Can be used more than once. Default: extract all streams.
-t, --tstream n
Extract specified text stream. Can be used more than once. Default: extract all streams.
-na, --noaudio
Don't extract any audio streams.
-nv, --novideo
Don't extract any video streams.
-nt, --notext
Don't extract any text streams. Default: extract all streams.
-r, --raw
Extract the raw streams only. Default: extract to useful formats (AVI, WAV, OGG, SRT...).
-v, --verbose
Increase verbosity.
-h, --help
Show this help.
-V, --version
Show version number.
NOTES
What works:
* Extraction of the following formats is fully supported including writing the stream contents to useful container formats:
video -> AVI
Vorbis -> OGG/Vorbis
PCM -> WAV
text -> text files (SRT subtitle format)
* All other audio streams (MP3, AC3) are just copied 1:1 into output files. MP3 and AC3 files should be usable; others might not be.
What doesn't work:
* Headers created by older OggDS (DirectShow) filter versions are not supported (and probably never will be).
AUTHOR
ogmdemux was written by Moritz Bunkus <moritz@bunkus.org>.
SEE ALSO
ogmmerge(1), ogmsplit(1), ogminfo(1), ogmcat(1), dvdxchap(1)
WWW
The newest version can always be found at <http://www.bunkus.org/videotools/ogmtools/>
ogmdemux v1.5 November 2004 OGMDEMUX(1)