awk - splitting 1 large file into multiple based on same key records

Post 302489037 by kam66 on Wednesday 19th of January 2011 09:40:47 AM

Hello Guru and rdcwayx,

Thanks for the solutions, but they don't fulfil my requirement. As I mentioned, my data file contains approximately 4 million records, and I want to create output files of 500,000 records each, named file1, file2, ... file10.
When the 500,000-record mark is reached while splitting, I want to make sure that records with the same key (e.g. 100) are not split across two output files. All records with the same key should stay in the same output file; whether that is file1 or file2 doesn't matter.

It's not very neat code, but I was able to split on every 500,000 records with the following. Keeping records with the same key together is the challenge for me.

Code:

awk -F"~" '
# The key is $2 (fields are separated by "~"), but it is not used yet,
# so records with the same key can still land in two different files.
{
    # Records 1-500000 go to file1, 500001-1000000 to file2, and so on.
    out = "file" (int((NR - 1) / 500000) + 1)
    print > out
}' CF_SEQ.srt
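
One possible way to keep equal keys together is to start a new output file only when the record limit has been reached and the key changes. The sketch below is not a confirmed solution from this thread. It assumes the key is field 2 with "~" as the separator (as in the code above) and that the input is already sorted so that records with the same key are adjacent. The function name split_by_key and its arguments are made up for illustration.

```shell
# Hypothetical helper: split_by_key FILE [LIMIT] [PREFIX]
# Assumes the key is field 2 ("~"-separated) and equal keys are adjacent.
split_by_key() {
    awk -F'~' -v limit="${2:-500000}" -v prefix="${3:-file}" '
    BEGIN { limit += 0; n = 1; out = prefix n }
    {
        key = $2 ""    # force string comparison of keys
        # Move on to the next file only once the limit is reached AND
        # the key differs from the previous record, so a run of equal
        # keys is never divided between two files.
        if (count >= limit && key != prev) {
            close(out)
            n++
            out = prefix n
            count = 0
        }
        print > out
        count++
        prev = key
    }' "$1"
}
```

It could be called as `split_by_key CF_SEQ.srt 500000 file`. An output file may end up with somewhat more than 500,000 records, because the run of equal keys that is in progress when the limit is reached is always finished first. If the input is not sorted by key, records with the same key can still end up in different files.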

Best regards,
K

Last edited by Scott; 01-19-2011 at 11:06 AM.. Reason: Please use code tags

