06-02-2009
Extract data from large file 80+ million records
Hello,
I have a file with more than 120 million records (35 GB in size). I need to extract the relevant records from it based on a key parameter and write them to a separate output file.
What is the best and fastest way to produce the new file?
Sample file format:
++++++7777jjjjjjj0000000000 (header record)
2098 POCG 0000 KKKK
2097 KOLL 0F00 KLLL
2095 LKJH 0L99 L0IU
.
.
.
.
********66666666666**** (trailer record)
Now suppose I enter 2098 as the key (the first field is the key): all records with 2098 in the first field should be written to a new file.
**********************************************
I tried grep, but it took a long time: nearly 45 minutes to produce the output file.
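For illustration, here is a minimal sketch of the extraction as a single sequential pass, written in Python. It assumes the key is the first field and is always followed by a single space, as in the sample above. The function name `extract_by_key` and the file names are hypothetical, and the sketch has not been benchmarked against grep on a file of this size.

```python
import os
import tempfile


def extract_by_key(src_path, dst_path, key):
    """Copy every line whose first space-delimited field equals key.

    Returns the number of lines written to dst_path.
    """
    prefix = key.encode("ascii") + b" "
    matched = 0
    # Binary mode skips text decoding; 1 MiB buffers reduce the number of
    # read/write system calls on a very large file.
    with open(src_path, "rb", buffering=1 << 20) as src, \
         open(dst_path, "wb", buffering=1 << 20) as dst:
        for line in src:
            # Header and trailer lines never start with "<key> ", so they
            # are skipped without any special handling.
            if line.startswith(prefix):
                dst.write(line)
                matched += 1
    return matched


# Small demonstration using the sample rows from the post.
sample = (
    b"++++++7777jjjjjjj0000000000\n"
    b"2098 POCG 0000 KKKK\n"
    b"2097 KOLL 0F00 KLLL\n"
    b"2095 LKJH 0L99 L0IU\n"
    b"********66666666666****\n"
)
with tempfile.TemporaryDirectory() as tmp:
    src_file = os.path.join(tmp, "input.dat")    # hypothetical file name
    dst_file = os.path.join(tmp, "output.dat")   # hypothetical file name
    with open(src_file, "wb") as f:
        f.write(sample)
    count = extract_by_key(src_file, dst_file, "2098")
    with open(dst_file, "rb") as f:
        extracted = f.read()

print(count, extracted)
```

The prefix test only inspects the first few bytes of each line, so the cost per record stays small and the run time should be dominated by reading 35 GB from disk.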
srec_stewie(5) File Formats Manual srec_stewie(5)
NAME
srec_stewie - Stewie's binary file format
DESCRIPTION
If you have a URL for documentation of this format, please let me know.
Any resemblance to the Motorola S-Record is superficial, and extends only to the data records. The header records and termination records
are completely different. None of the other Motorola S-Record record types are available.
The Records
All records start with an ASCII capital S character, value 0x53, followed by a type specifier byte. All records consist of binary bytes.
The Header Record
Each file starts with a fixed four byte header record.
+-----+------+------+------+
|0x53 | 0x30 | 0x30 | 0x33 |
+-----+------+------+------+
The Data Records
Each data record consists of 5 fields. These are the type field, length field, address field, data field, and the checksum. The lines
always start with a capital S character.
+-----+------+---------------+---------+------+----------+
|0x53 | Type | Record Length | Address | Data | Checksum |
+-----+------+---------------+---------+------+----------+
Type The type field is a one byte field that specifies whether the record has a two-byte address field (0x31), a three-byte address
field (0x32) or a four-byte address field (0x33). The address is big-endian.
Record Length
The record length field is a one byte field that specifies the number of bytes in the record following this byte.
Address This is a 2-, 3- or 4-byte address that specifies where the data in the record is to be loaded into memory.
Data The data field contains the executable code, memory-loadable data or descriptive information to be transferred.
Checksum
The checksum is a one byte field that represents the least significant byte of the one's complement of the sum of the values
represented by the bytes making up the record's length, address, and data fields.
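As a worked illustration of the checksum rule, the following Python sketch computes the length and checksum bytes for the "Hello, World" record used in the EXAMPLE section of this page. The function name `stewie_checksum` is hypothetical; it is not part of SRecord.

```python
def stewie_checksum(length, address_bytes, data):
    """Low byte of the one's complement of the summed length, address and data bytes."""
    total = length + sum(address_bytes) + sum(data)
    return ~total & 0xFF


data = b"Hello, World\n"
address = (0).to_bytes(2, "big")  # two-byte, big-endian address 0 (type 0x31)

# The length byte counts everything after it: address, data and checksum.
length = len(address) + len(data) + 1

checksum = stewie_checksum(length, address, data)
print(hex(length), hex(checksum))  # prints "0x10 0x9d"
```

Both values agree with the bytes `10` and `9D` in the hex dump of the example file.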
The Termination Record
Each file ends with a fixed two byte termination record.
+-----+------+
|0x53 | 0x38 |
+-----+------+
Size Multiplier
In general, binary data will expand in size by approximately 1.2 times when represented with this format.
EXAMPLE
Here is a hex-dump of an example file. It contains the data "Hello, World" to be loaded at address 0.
0000: 53 30 30 33 53 31 10 00 00 48 65 6C 6C 6F 2C 20 S003S1...Hello,
0010: 57 6F 72 6C 64 0A 9D 53 38 World..S8
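To make the record layout concrete, here is a minimal Python sketch that walks the example bytes above: the fixed header, one data record, and the termination record. The names `parse_stewie` and `ADDRESS_SIZES` are hypothetical, and the sketch simply stops at the first termination record rather than checking that nothing follows it.

```python
# Type byte -> number of address bytes, as described under "Type".
ADDRESS_SIZES = {0x31: 2, 0x32: 3, 0x33: 4}


def parse_stewie(blob):
    """Return a list of (address, data) tuples from a Stewie-format byte string."""
    if blob[:4] != b"S003":
        raise ValueError("missing header record")
    pos = 4
    records = []
    while True:
        marker = blob[pos:pos + 2]
        if marker == b"S8":  # termination record
            return records
        if len(marker) < 2 or marker[0] != 0x53 or marker[1] not in ADDRESS_SIZES:
            raise ValueError(f"unexpected bytes at offset {pos}")
        addr_len = ADDRESS_SIZES[marker[1]]
        if pos + 3 > len(blob):
            raise ValueError("truncated record")
        length = blob[pos + 2]
        body = blob[pos + 3:pos + 3 + length]  # address + data + checksum
        if len(body) != length or length < addr_len + 1:
            raise ValueError("truncated or malformed record")
        # Checksum covers the length byte plus the address and data bytes.
        if body[-1] != (~(length + sum(body[:-1])) & 0xFF):
            raise ValueError("checksum mismatch")
        address = int.from_bytes(body[:addr_len], "big")
        records.append((address, body[addr_len:-1]))
        pos += 3 + length


example = bytes.fromhex(
    "53 30 30 33 53 31 10 00 00 48 65 6C 6C 6F 2C 20"
    "57 6F 72 6C 64 0A 9D 53 38"
)
print(parse_stewie(example))  # prints [(0, b'Hello, World\n')]
```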
COPYRIGHT
srec_cat version 1.58
Copyright (C) 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011 Peter Miller
The srec_cat program comes with ABSOLUTELY NO WARRANTY; for details use the 'srec_cat -VERSion License' command. This is free software and
you are welcome to redistribute it under certain conditions; for details use the 'srec_cat -VERSion License' command.
AUTHOR
Peter Miller E-Mail: pmiller@opensource.org.au
WWW: http://miller.emu.id.au/pmiller/
Reference Manual SRecord srec_stewie(5)