awk - splitting one large file into multiple files while keeping same-key records together
Hello gurus,
I am new to "awk" and am trying to break a large file of 4 million records into several output files of half a million records each. At the same time, I want to keep records with the same key in the same output file rather than spread across files.
e.g. my data is like:
In the above example, once NR reaches half a million, I also want to check the key on each line against the key of the previous line. If they are the same, I would like to add the record to the same output file instead of sending it to a new file.
Any help will be highly appreciated.
Thanks
Last edited by Scott; 01-19-2011 at 11:07 AM..
Reason: Code tags
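One possible approach, as a minimal sketch only: count the records written to the current output file and switch to a new file only when the count has reached the limit and the key differs from the key of the previous line. The sample data is not shown above, so this assumes the key is the first whitespace-separated field (`$1`) and that records with the same key are already adjacent (for example, the file is sorted by key). The function name `split_by_key` and the output names `part_1.txt`, `part_2.txt`, ... are made up for illustration.

```shell
#!/bin/sh
# split_by_key INPUT [LIMIT]
# Hypothetical helper: writes INPUT to part_1.txt, part_2.txt, ...
# Assumptions: the key is the first whitespace-separated field, and
# records sharing a key are already adjacent in INPUT.
split_by_key() {
    awk -v limit="${2:-500000}" '
        BEGIN { part = 1; count = 0 }

        # Switch files only when the current file is full AND the key changes,
        # so a run of same-key records is never cut in two.
        count >= limit && $1 != prev {
            close("part_" part ".txt")
            part++
            count = 0
        }

        # Every record goes to the current part; remember its key.
        {
            print > ("part_" part ".txt")
            count++
            prev = $1
        }
    ' "$1"
}
```

It could be called as `split_by_key bigfile.txt` (hypothetical file name) for half-million-record parts, or `split_by_key bigfile.txt 1000` to try a smaller limit first. A part can end up larger than the limit when a run of same-key records crosses the boundary. If the key is a different field or a fixed column range, `$1` would need to change (for example to `substr($0, 1, 3)`), and if same-key records are not adjacent the file would need sorting by key beforehand.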