Sponsored Content
Top Forums Shell Programming and Scripting awk - splitting 1 large file into multiple based on same key records Post 302488880 by kam66 on Tuesday 18th of January 2011 04:38:22 PM
Old 01-18-2011
awk - splitting 1 large file into multiple based on same key records

Hello gurus,

I am new to "awk" and trying to break a large file having 4 million records into several output files each having half million but at the same time I want to keep the similar key records in the same output file, not to exist accross the files.

e.g. my data is like:
Code:
Row_Num, Field1 (key), Field 2, Field3,......
 
500000, 100 , ABC, 10A --> goes into "file1"
500001, 100, DEF, 20A --> should also go in "file1" instead of "file2"
500002, 200, GHI, 30A --> should be the 1st record written in "file2"

In the above example after checking the NR reaching half million I also want to check the key on each line and match it with the key value of the previous line, if found same I would like to add this record in same output file instead of sending it to a new file.

Any help will be highly appreciated.

Thanks

Last edited by Scott; 01-19-2011 at 11:07 AM.. Reason: Code tags
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Splitting a file based on the records in another file

All, We receive a file with a large no of records (records can vary) and we have to split it into two files based on another file. e.g. File1: UHDR 2008112 "25187","00000022","00",21-APR-1991,"" ,"D",-000000519,+0000000000,"C", ,+000000000,+000000000,000000000,"2","" ... (2 Replies)
Discussion started by: er_ashu
2 Replies

2. Shell Programming and Scripting

How to delete duplicate records based on key

For example suppose I have a file which contains data as: $cat data 800,2 100,9 700,3 100,9 200,8 100,3 Now I want the output as 200,8 700,3 800,2 Key is first three characters, I don't want any reords which are having duplicate keys. Like sort +0.0 -0.3 data can we use... (9 Replies)
Discussion started by: sumitc
9 Replies

3. Shell Programming and Scripting

Splitting large file into multiple files in unix based on pattern

I need to write a shell script for below scenario My input file has data in format: qwerty0101TWE 12345 01022005 01022005 datainala alanfernanded 26 qwerty0101mXZ 12349 01022005 06022008 datainalb johngalilo 28 qwerty0101TWE 12342 01022005 07022009 datainalc hitalbert 43 qwerty0101CFG 12345... (19 Replies)
Discussion started by: jimmy12
19 Replies

4. Shell Programming and Scripting

Problem with splitting large file based on pattern

Hi Experts, I have to split huge file based on the pattern to create smaller files. The pattern which is expected in the file is: Master..... First... second.... second... third.. third... Master... First.. second... third... Master... First... second.. second.. second..... (2 Replies)
Discussion started by: saisanthi
2 Replies

5. Shell Programming and Scripting

Splitting record into multiple records by appending values from an input field (AWK)

Hello, For the input file, I am trying to split those records which have multiple values seperated by '|' in the last input field, into multiple records and each record corresponds to the common input fields + one of the value from the last field. I was trying with an example on this forum... (4 Replies)
Discussion started by: imtiaz99
4 Replies

6. Shell Programming and Scripting

Splitting large file and renaming based on field

I am trying to update an older program on a small cluster. It uses individual files to send jobs to each node. However the newer database comes as one large file, containing over 10,000 records. I therefore need to split this file. It looks like this: HMMER3/b NAME 1-cysPrx_C ACC ... (2 Replies)
Discussion started by: fozrun
2 Replies

7. Shell Programming and Scripting

Splitting records in a text file based on delimiter

A text file has 2 fields (Data, Filename) delimited by # as below, Data,Filename Row1 -> abc#Test1.xml Row2 -> xyz#Test2.xml Row3 -> ghi#Test3.xml The content in first field has to be written into a file where filename should be considered from second field. So from... (4 Replies)
Discussion started by: jayakkannan
4 Replies

8. Shell Programming and Scripting

Sed: Splitting A large File into smaller files based on recursive Regular Expression match

I will simplify the explaination a bit, I need to parse through a 87m file - I have a single text file in the form of : <NAME>house........ SOMETEXT SOMETEXT SOMETEXT . . . . </script> MORETEXT MORETEXT . . . (6 Replies)
Discussion started by: sumguy
6 Replies

9. Shell Programming and Scripting

Help with Splitting a Large XML file based on size AND tags

Hi All, This is my first post here. Hoping to share and gain knowledge from this great forum !!!! I've scanned this forum before posting my problem here, but I'm afraid I couldn't find any thread that addresses this exact problem. I'm trying to split a large XML file (with multiple tag... (7 Replies)
Discussion started by: Aviktheory11
7 Replies

10. Shell Programming and Scripting

Script for splitting file of records into multiple files

Hello I have a file of following format HDR 1234 abc qwerty abc def ghi jkl HDR 4567 xyz qwerty abc def ghi jkl HDR 890 mno qwerty abc def ghi jkl HDR 1234 abc qwerty abc def ghi jkl HDR 1234 abc qwerty abc def ghi jkl -Need to split this into multiple files based on tag... (8 Replies)
Discussion started by: wincrazy
8 Replies
comm(1) 							   User Commands							   comm(1)

NAME
comm - select or reject lines common to two files SYNOPSIS
comm [-123] file1 file2 DESCRIPTION
The comm utility reads file1 and file2, which must be ordered in the current collating sequence, and produces three text columns as output: lines only in file1; lines only in file2; and lines in both files. If the input files were ordered according to the collating sequence of the current locale, the lines written will be in the collating sequence of the original lines. If not, the results are unspecified. OPTIONS
The following options are supported: -1 Suppresses the output column of lines unique to file1. -2 Suppresses the output column of lines unique to file2. -3 Suppresses the output column of lines duplicated in file1 and file2. OPERANDS
The following operands are supported: file1 A path name of the first file to be compared. If file1 is -, the standard input is used. file2 A path name of the second file to be compared. If file2 is -, the standard input is used. USAGE
See largefile(5) for the description of the behavior of comm when encountering files greater than or equal to 2 Gbyte ( 2^31 bytes). EXAMPLES
Example 1 Printing a list of utilities specified by files If file1, file2, and file3 each contain a sorted list of utilities, the command example% comm -23 file1 file2 | comm -23 - file3 prints a list of utilities in file1 not specified by either of the other files. The entry: example% comm -12 file1 file2 | comm -12 - file3 prints a list of utilities specified by all three files. And the entry: example% comm -12 file2 file3 | comm -23 -file1 prints a list of utilities specified by both file2 and file3, but not specified in file1. ENVIRONMENT VARIABLES
See environ(5) for descriptions of the following environment variables that affect the execution of comm: LANG, LC_ALL, LC_COLLATE, LC_CTYPE, LC_MESSAGES, and NLSPATH. EXIT STATUS
The following exit values are returned: 0 All input files were successfully output as specified. >0 An error occurred. ATTRIBUTES
See attributes(5) for descriptions of the following attributes: +-----------------------------+-----------------------------+ | ATTRIBUTE TYPE | ATTRIBUTE VALUE | +-----------------------------+-----------------------------+ |Availability |SUNWesu | +-----------------------------+-----------------------------+ |CSI |enabled | +-----------------------------+-----------------------------+ |Interface Stability |Standard | +-----------------------------+-----------------------------+ SEE ALSO
cmp(1), diff(1), sort(1), uniq(1), attributes(5), environ(5), largefile(5), standards(5) SunOS 5.11 3 Mar 2004 comm(1)
All times are GMT -4. The time now is 11:28 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy