#1
Hi guys,
I've been struggling for some time to split a big file into smaller ones based on some conditions, without any luck so far. I managed it with a small shell script using loops, but it takes around 1 hour for a file with 600k records. I'm sure there is a simpler solution with awk.

The file I'm trying to split (comma-separated) looks like this:

field1, ..., ..., ...
field1, ..., ..., ...
.....
field1, ..., ..., ...
field2, ..., ..., ...
field2, ..., ..., ...
.....
field2, ..., ..., ...
.....
fieldn, ..., ..., ...

I need to split this file into multiple files (named F1, F2, etc.), normally with no more than 2k records each. However, records that share a first-field value must not be split across files. For example, if record 2001 has the same first field as record 2000, it should also go into the first file; a new file should be started only when the value of that field changes.

I appreciate all your help.
#2
Like this? Code:
awk -F, '{print > ("File_" $1)}' file
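To see what this one-liner produces, here is a small run against a hypothetical sample file (the name sample.csv and its contents are made up for illustration). It writes one file per distinct first-field value, so each file's size follows its key group and is not capped at 2000 lines; awk also keeps every output file open until it exits.

```shell
#!/bin/sh
# Work in a scratch directory so the generated File_* files do not clutter anything.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample input: key K1 appears 3 times, K2 once, K3 twice.
printf '%s\n' K1,a,b K1,c,d K1,e,f K2,g,h K3,i,j K3,k,l > sample.csv

# One output file per distinct value of field 1. The parentheses keep the
# file-name concatenation unambiguous across awk implementations.
awk -F, '{ print > ("File_" $1) }' sample.csv

# Show how many records landed in each file.
wc -l File_*
```

Each key gets its own file (File_K1 with 3 lines, File_K2 with 1, File_K3 with 2), which is why this does not meet the "fill up to 2000 lines" requirement.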
#3
Thanks for the reply.
This script splits the file into one file per distinct value of the first field. What I'd like is for each output file to keep receiving records with other first-field values until it holds 2000 lines.
#4
Try something like: Code:
awk 'BEGIN {
FS = OFS = ","
fn = "F" ++fc
}
c >= 2000 && $1 != last {
close fn
fn = "F" ++fc
c = 0
}
{ print > fn
last = $1
c++
}' input
If you are using a Solaris/SunOS system, use /usr/xpg4/bin/awk or nawk instead of awk.
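One way to confirm that the output of a script like this meets both requirements (no first-field value split across files, and a new file started only after the current one holds at least the limit) is a second awk pass over the generated files. This checker is not from the thread; it is a sketch that assumes the outputs are named F1, F2, ... in the current directory, and the function name check_split and the demo data are hypothetical.

```shell
#!/bin/sh
# Hypothetical checker for split output files named F1, F2, ...
# Usage: check_split LIMIT   (run it in the directory holding the outputs)
check_split() {
    limit=$1
    files=""
    n=1
    while [ -f "F$n" ]; do          # collect F1..Fn in numeric order
        files="$files F$n"          # (ls would sort F10 before F2)
        n=$((n + 1))
    done
    [ -n "$files" ] || { echo "no F1 found" >&2; return 1; }

    # $files is intentionally unquoted so it splits into separate file names.
    awk -F, -v limit="$limit" '
    FNR == 1 && NR > 1 && cnt < limit {
        # a new file was started before the previous one reached the limit
        printf "%s: only %d records\n", prev, cnt; bad = 1
    }
    FNR == 1 { cnt = 0 }
    {
        if (($1 in owner) && owner[$1] != FILENAME) {
            # the same first-field value occurs in two different files
            printf "key %s in %s and %s\n", $1, owner[$1], FILENAME; bad = 1
        }
        owner[$1] = FILENAME; cnt++; prev = FILENAME
    }
    END { if (!bad) print "OK"; exit bad }' $files
}

# Demo with hand-made files and a limit of 2 (hypothetical data).
cd "$(mktemp -d)" || exit 1
printf '%s\n' A,1 A,2 A,3 > F1
printf '%s\n' B,1 C,1 > F2
check_split 2
```

The last file is allowed to be short, since it simply holds whatever records remain.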
#5
I'm getting an error, I think at close fn:
awk: cmd. line:5: close fn
awk: cmd. line:5: ^ syntax error
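The error points at `close fn`: in POSIX awk, close is a function whose argument goes in parentheses, and gawk rejects the bare form (some older awks accept it). Below is a sketch of the same approach with `close(fn)`. The limit is passed in as a hypothetical `max` variable so it can be tried on a tiny made-up file; for the real data it would be `-v max=2000`.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical input: keys A (3 records), B (1), C (2), D (1).
printf '%s\n' A,1 A,2 A,3 B,1 C,1 C,2 D,1 > input

# max is an assumption for this demo; use -v max=2000 for the real file.
awk -v max=2 'BEGIN {
    FS = OFS = ","
    fn = "F" (++fc)              # first output file: F1
}
c >= max && $1 != last {         # current file is full and the key just changed
    close(fn)                    # parentheses are required by POSIX awk and gawk
    fn = "F" (++fc)              # start F2, F3, ...
    c = 0
}
{
    print > fn                   # a key group is never split across files
    last = $1
    c++
}' input

wc -l F*
```

With max=2, F1 gets all three A records (the file is full after two, but the key has not changed yet), F2 gets B and both C records, and F3 gets D. Only one output file is open at a time.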
#6
Quote:
#7
Code:
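awk -F, 'c[i]>=2000 && p1!=$1{i++}     # group i is full and the key changed: move to the next file
{print > ("File_" i);c[i]++}           # write the record to File_<i> and count it
{p1=$1}' i=1 file                      # remember the key; i=1 names the first file File_1
That said, the solution provided by Don is better, because it keeps only one output file open at any time. That way, you will never exceed the maximum number of open files.
Last edited by elixir_sinari; 02-12-2013 at 04:35 AM.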
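The counter i in this approach only groups records if it also names the output file, so the sketch below writes to File_1, File_2, ... instead of File_ followed by the key. It also closes each file once the counter moves on, which removes the open-files concern; that close call is an addition, not part of the post. The limit variable `max` and the sample data are hypothetical.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Hypothetical input: keys A (3 records), B (1), C (2), D (1).
printf '%s\n' A,1 A,2 A,3 B,1 C,1 C,2 D,1 > file

# max=2 keeps the demo small; the real limit would be 2000.
awk -F, -v max=2 '
c[i] >= max && p1 != $1 {       # group i is full and the key changed
    close("File_" i)            # finished file: keep only one open at a time
    i++
}
{ print > ("File_" i); c[i]++ } # write to the current group file and count it
{ p1 = $1 }' i=1 file
```

The `i=1` operand is processed before awk reads `file`, so the first records go to File_1. The resulting grouping is the same as with Don's script: File_1 holds the A records, File_2 holds B and C, and File_3 holds D.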