06-02-2009
Quote, originally posted by learner16s:
I have one file with more than 120 million records (35 GB in size). I have to extract some relevant data from the file based on some parameter and generate another output file.
...
I tried to use grep, but it took a lot of time: nearly 45 minutes to produce the output file.
With a file that size, anything is going to take a long time. There's not going to be anything faster than grep, with the possible exception of a filter written in C that does nothing but what you want.
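Here is a minimal sketch of such a single-purpose filter. The original post does not say what "some parameter" is. This sketch therefore assumes, hypothetically, one record per line with `|`-separated fields, and keeps a record when one field exactly equals a given value. `filter_records` and `main` are illustrative names. It is written in Python for readability and will not outrun grep; the reply has a C program with the same structure in mind. What carries over is the shape: one sequential pass over the input, no regular-expression engine, and an exact comparison on a single field.

```python
import sys
from typing import Iterable, Iterator, TextIO


def filter_records(lines: Iterable[str], field: int, value: str,
                   delim: str = "|") -> Iterator[str]:
    """Yield the lines whose `field`-th (0-based) field equals `value`.

    Hypothetical record layout: one record per line, fields separated by
    `delim`. Lines with too few fields are skipped.
    """
    for line in lines:
        # maxsplit=field+1 stops splitting just after the wanted field,
        # so the rest of a long record is never tokenized.
        parts = line.rstrip("\n").split(delim, field + 1)
        if len(parts) > field and parts[field] == value:
            yield line


def main(argv: list[str], stdin: TextIO = sys.stdin,
         stdout: TextIO = sys.stdout) -> int:
    # Hypothetical usage: python filter.py FIELD VALUE [DELIM] < in > out
    if len(argv) not in (3, 4):
        print("usage: filter.py FIELD VALUE [DELIM]", file=sys.stderr)
        return 2
    field, value = int(argv[1]), argv[2]
    delim = argv[3] if len(argv) == 4 else "|"
    # Stream line by line; the 35 GB file is never held in memory.
    stdout.writelines(filter_records(stdin, field, value, delim))
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

For example, `python filter.py 3 2008 < big.txt > out.txt` would keep the records whose fourth field is `2008` (fields are counted from 0).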
With that much data, you might want to look at using a DBMS, e.g., PostgreSQL.
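The DBMS suggestion pays off when the same file is queried more than once. The file is parsed once, an index is built on the column you filter by, and later extractions use the index instead of scanning all 35 GB again. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so it runs without a server. The `records` table, its two-column layout (first field as key, whole line as payload), and the function names are all hypothetical. With PostgreSQL itself, the bulk load would normally go through the COPY command.

```python
import sqlite3
from typing import Iterable, Iterator


def load_records(conn: sqlite3.Connection, lines: Iterable[str],
                 delim: str = "|") -> None:
    """Bulk-load `key<delim>rest` lines into a hypothetical `records` table."""
    conn.execute("CREATE TABLE IF NOT EXISTS records (key TEXT, line TEXT)")
    rows = ((line.rstrip("\n").split(delim, 1)[0], line.rstrip("\n"))
            for line in lines)
    # executemany consumes the generator lazily; `with conn` commits once.
    with conn:
        conn.executemany("INSERT INTO records (key, line) VALUES (?, ?)", rows)
    # Build the index after loading rather than maintaining it per insert.
    conn.execute("CREATE INDEX IF NOT EXISTS records_key ON records (key)")


def extract(conn: sqlite3.Connection, key: str) -> Iterator[str]:
    """Yield matching records, looked up through the index on `key`."""
    cur = conn.execute(
        "SELECT line FROM records WHERE key = ? ORDER BY rowid", (key,))
    for (line,) in cur:
        yield line
```

Loading 120 million rows is itself a long job, so this only beats a grep pass if the data is queried repeatedly with different parameters.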
TRPT(8) BSD System Manager's Manual TRPT(8)
NAME
trpt -- transliterate protocol trace
SYNOPSIS
trpt [-a] [-f] [-j] [-p hex-address] [-s] [-t] [-N system] [-M core]
DESCRIPTION
trpt interrogates the buffer of TCP trace records created when a socket is marked for "debugging" (see setsockopt(2)), and prints a readable description of these records. When no options are supplied, trpt prints all the trace records found in the system grouped according to TCP connection protocol control block (PCB). The following options may be used to alter this behavior.
-a In addition to the normal output, print the values of the source and destination addresses for each packet recorded.
-f Follow the trace as it occurs, waiting a short time for additional records each time the end of the log is reached.
-j Just give a list of the protocol control block addresses for which there are trace records.
-p Show only trace records associated with the protocol control block at the given address hex-address.
-s In addition to the normal output, print a detailed description of the packet sequencing information.
-t In addition to the normal output, print the values for all timers at each point in the trace.
-M core Extract values associated with the name list from core.
-N system Extract the name list from system.
The recommended use of trpt is as follows. Isolate the problem and enable debugging on the socket(s) involved in the connection. Find the address of the protocol control blocks associated with the sockets using the -A option to netstat(1). Then run trpt with the -p option, supplying the associated protocol control block addresses. The -f option can be used to follow the trace log once the trace is located. If there are many sockets using the debugging option, the -j option may be useful in checking to see if any trace records are present for the socket in question.
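The first step of that procedure, marking a socket for debugging, is an ordinary setsockopt(2) call with the SO_DEBUG option on the socket in question. A minimal sketch in Python follows. `enable_socket_debug` and `trpt_command` are hypothetical helper names, and the PCB address is assumed to have been read by hand from `netstat -A` output. Some systems restrict SO_DEBUG to privileged processes (Linux, for example, requires CAP_NET_ADMIN or root), so the helper reports failure instead of raising. Trace records are only kept when the kernel was built with the TCP_DEBUG option.

```python
import socket


def enable_socket_debug(sock: socket.socket) -> bool:
    """Set SO_DEBUG on `sock`; return True if the option reads back as set.

    Returns False instead of raising when the platform refuses the option
    (e.g. an unprivileged process on Linux gets a PermissionError).
    """
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_DEBUG, 1)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_DEBUG) != 0
    except OSError:
        return False


def trpt_command(pcb_address: str, follow: bool = False) -> list[str]:
    """Build a trpt argv for one PCB address (hypothetical helper)."""
    int(pcb_address, 16)  # raises ValueError if the address is not hex
    argv = ["trpt", "-p", pcb_address]
    if follow:
        argv.append("-f")  # keep waiting for new trace records
    return argv


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("SO_DEBUG set:", enable_socket_debug(s))
```

The resulting list can be passed to `subprocess.run` on a BSD system that has trpt installed; only the -p and -f options documented above are used.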
SYSCTLS
The following sysctls are used by trpt. The TCP_DEBUG kernel option must be enabled.
net.inet.tcp.debug Structure containing TCP sockets information used by trpt.
net.inet.tcp.debx Number of TCP debug messages.
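A quick way to probe the TCP_DEBUG requirement is to try reading one of these sysctls. The sketch below shells out to `sysctl -n net.inet.tcp.debx`. The function name is hypothetical. Treating an unreadable value as "TCP_DEBUG is probably not compiled in" is an assumption, because the same failure also occurs on systems that are not BSD at all.

```python
import shutil
import subprocess
from typing import Optional


def tcp_debug_count(sysctl: str = "sysctl") -> Optional[int]:
    """Return the value of net.inet.tcp.debx, or None if it cannot be read.

    None means the sysctl binary is missing, the OID is unknown (assumed
    to indicate a kernel without TCP_DEBUG, or a non-BSD system), or the
    output is not an integer.
    """
    if shutil.which(sysctl) is None:
        return None
    result = subprocess.run([sysctl, "-n", "net.inet.tcp.debx"],
                            capture_output=True, text=True)
    if result.returncode != 0:
        return None
    try:
        return int(result.stdout.strip())
    except ValueError:
        return None
```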
DIAGNOSTICS
no namelist
The image does not contain the proper symbols to find the trace buffer. Other diagnostics should be self-explanatory.
SEE ALSO
netstat(1), setsockopt(2)
HISTORY
The trpt command appeared in 4.2BSD.
BUGS
Should also print the data for each input or output, but this is not saved in the trace record.
The output format is inscrutable and should be described here.
BSD
August 30, 2007 BSD