Split file using awk based on multiple conditions | Unix Linux Forums | Shell Programming and Scripting

#1 lurkerro (Registered User), 02-12-2013

Hi guys,
I've been struggling for some time, without any luck so far, to split a big file into smaller ones based on some conditions. I managed it with a small shell script using loops, but it takes around 1 hour for a file with 600k records. I'm sure there is a simpler solution with awk.

The file I'm trying to split (comma separated file):
field1, ..., ..., ...
field1, ..., ..., ...
.....
field1, ..., ..., ...
field2, ..., ..., ...
field2, ..., ..., ...
.....
field2, ..., ..., ...
.....
fieldn, ..., ..., ...

What I need is to split this file into multiple files (named F1, F2, etc.), normally with no more than 2,000 records in each file. However, records with the same first-field value must not be spread across separate files. For example, if record 2001 has the same first field as record 2000, it should also go into the first file; a new file should be started only when the value of the first field changes.

Appreciate all your help.

#2 elixir_sinari (Forum Advisor), 02-12-2013
Like this?

Code:
awk -F, '{print > "File_" $1}' file
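Depending on the awk implementation, keeping one output file per first-field value open until awk exits can run into the per-process open-files limit when there are many distinct values. A minimal sketch of the same one-file-per-value split that closes each file as soon as the key changes, assuming records sharing a first field are contiguous in the input (after a close(), `>` truncates the file when it is reopened). The sample data is made up, and this sketch does not apply the 2,000-record grouping from the question:

```shell
#!/bin/sh
# Assumption: records with the same first field are contiguous (input grouped by $1).
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1

# hypothetical sample input in the shape described in the question
printf '%s\n' a,1,x a,2,y b,3,z c,4,w c,5,v > input

awk -F, '
NR == 1 || $1 != prev {          # first record, or the key just changed
    if (out != "") close(out)    # release the previous output file
    out = "File_" $1
    prev = $1
}
{ print > out }                  # at most one output file is open at a time
' input

wc -l File_*
```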

#3 lurkerro (Registered User), 02-12-2013
Thanks for the reply.
This script splits the file into one file per distinct value of the first field. What I need is different: as long as an output file has fewer than 2000 lines, records with the next first-field values should keep being added to that same file.

#4 Don Cragun (Moderator), 02-12-2013
Try something like:

Code:
awk 'BEGIN {
        FS = OFS = ","
        fn = "F" ++fc
}
c >= 2000 && $1 != last {
        close fn
        fn = "F" ++fc
        c = 0
}
{       print > fn
        last = $1
        c++
}' input

If you are using a Solaris/SunOS system, use /usr/xpg4/bin/awk or nawk instead of awk.

#5 lurkerro (Registered User), 02-12-2013
I'm getting an error, I think at close fn:
awk: cmd. line:5: close fn
awk: cmd. line:5: ^ syntax error

#6 Don Cragun (Moderator), 02-12-2013
Quote:
Originally Posted by lurkerro
I'm getting an error I think at close fn:
awk: cmd. line:5: close fn
awk: cmd. line:5: ^ syntax error
Sorry about that. Try close(fn) instead of close fn. The awk I'm using on OS X accepts either form, but the standards only require it to work when the parentheses are provided.
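Putting this fix into the script from post #4 gives the version below. It is a sketch for trying the logic on a toy input, with a few assumptions that are not in the original: the record limit is passed with `-v max=...` (2000 in the question, 3 here so the grouping is visible), the sample data is invented, and `++fc` is parenthesized as a small portability precaution:

```shell
#!/bin/sh
# Sketch of the post #4 script with close(fn); limit taken from -v max (assumption).
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1

# hypothetical sample input, grouped by the first field
printf '%s\n' a,1 a,2 b,3 b,4 c,5 c,6 c,7 c,8 d,9 > input

awk -v max=3 '
BEGIN {
    FS = OFS = ","
    fn = "F" (++fc)               # first output file: F1
}
c >= max && $1 != last {          # current file is full and the key just changed
    close(fn)                     # parenthesized form, as POSIX requires
    fn = "F" (++fc)               # move on to F2, F3, ...
    c = 0
}
{
    print > fn
    last = $1
    c++
}' input

wc -l F*
```

With this input the limit of 3 is reached at b,3, but b,4 shares its key and so stays in F1 (a,1 through b,4); the four c records go to F2 and d,9 to F3. For the real file, run it with -v max=2000.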

#7 elixir_sinari (Forum Advisor), 02-12-2013

Code:
awk -F, 'c[i]>=2000 && p1!=$1 {i++}   # file i is full and the key changed: move on
{print > ("File_" i); c[i]++}          # write to File_<i>, not File_<first field>
{p1=$1}' i=1 file

Don's solution is better because it closes each output file before opening the next, so only one file is open at any time. That way, you will never exceed the maximum number of open files.

Last edited by elixir_sinari; 02-12-2013 at 05:35 AM..
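Whichever script is used, the result is cheap to verify. A hedged sketch, assuming the pieces are named F1, F2, ... as in post #4 and are listed in creation order; the file contents below are invented for illustration. It checks that no first-field value spans two pieces and that the pieces together reproduce the input exactly:

```shell
#!/bin/sh
# Sanity checks for a split; file names F1, F2 and "input" are assumptions.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1

# hypothetical input and the two pieces a split might produce from it
printf '%s\n' a,1 a,2 b,3 c,4 c,5 > input
printf '%s\n' a,1 a,2 b,3 > F1
printf '%s\n' c,4 c,5 > F2

# Check 1: no first-field value may occur in more than one piece.
awk -F, '
($1 in where) && where[$1] != FILENAME {
    printf "key %s is in both %s and %s\n", $1, where[$1], FILENAME
    bad = 1
}
{ where[$1] = FILENAME }
END { exit bad }
' F1 F2 && echo "OK: no key spans two pieces"

# Check 2: the pieces, concatenated in order, must equal the input exactly,
# so no record was lost, duplicated or reordered.
cat F1 F2 | cmp -s - input && echo "OK: pieces match the input"
```

With real output, list the pieces in numeric order for the second check, since a plain F* glob sorts F10 before F2.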

All times are GMT -4.