Full Discussion: Split a large file
Post 302444426 by alister on Wednesday 11th of August 2010 11:17:34 PM
Old 08-12-2010
commasplit.awk:
Code:
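BEGIN {
    RS = ","                # records are comma-separated fields
    max_size = 100*2^20     # soft limit on data per output file: 100 MiB
}

# Start the next output file (file1, file2, ...) and write the header.
function open_file() {
    len = 0                 # data characters written to the current file
    n = 0                   # records written to the current file
    fn = "file" (++i)
    printf("v=[") > fn
}

# Write the footer and close the current output file.
function close_file() {
    printf("];") > fn
    close(fn)
}

NR == 1 {
    open_file()
}

# Roll over once the current file has reached the limit.
len >= max_size {
    close_file()
    open_file()
}

{
    # Separate by record count, not by len, so an empty first field
    # does not suppress the comma that follows it.
    s = (n++ ? "," : "") $0
    printf("%s", s) > fn
    len += length(s)
}

END {
    if (NR) close_file()    # no output file exists if the input was empty
}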

Invocation:
Code:
awk -f commasplit.awk  datafile
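
One thing worth knowing before running it: with RS set to a comma, awk treats a newline as ordinary data, so the last record keeps the input's trailing newline. A quick check, as a minimal sketch (the sample input `1,2,3` is made up for illustration):

```shell
# Print each record in brackets with its length; RS="," as in commasplit.awk.
printf '1,2,3\n' |
awk 'BEGIN { RS = "," } { printf("%d:[%s] len=%d\n", NR, $0, length($0)) }'
```

The third record is reported with length 2 because it is "3" followed by a newline. In commasplit.awk this means the last output file will have a newline just before the closing `];` if the data file ends with one. If that matters, one option is to strip it with `sub(/\n$/, "")` at the top of the main block.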


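max_size is not a hard limit. A record is appended whenever len is still below max_size, so in the worst case len is max_size - 1 and the record then adds a separator comma (1) plus one field length. The file size may therefore reach max_size + length of header (3) and footer (2) + one field length. If the field's value were "125000" (6), we'd be talking about a file size of 100 MiB + 11.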
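
To see the overshoot on a small scale, here is a rough sketch that runs a condensed variant of commasplit.awk with max_size lowered to 10. The tiny limit, the scratch directory and the sample data are assumptions made only for this demo.

```shell
# Work in a scratch directory so file1, file2, ... are easy to inspect.
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# Two 4-character fields, then 6-character fields; no trailing newline.
printf '1234,1234,125000,125000,125000' > datafile

# Condensed variant of commasplit.awk; max_size = 10 is a demo value.
awk '
BEGIN { RS = ","; max_size = 10 }
function open_file()  { len = 0; n = 0; fn = "file" (++i); printf("v=[") > fn }
function close_file() { printf("];") > fn; close(fn) }
NR == 1         { open_file() }
len >= max_size { close_file(); open_file() }
{ s = (n++ ? "," : "") $0; printf("%s", s) > fn; len += length(s) }
END { if (NR) close_file() }
' datafile

# Report the size and content of each output file.
for f in file1 file2; do
    printf '%s: %s bytes: %s\n' "$f" "$(wc -c < "$f" | tr -d ' ')" "$(cat "$f")"
done
```

Here file1 comes out at 21 bytes against a max_size of 10. len was 9 when the third field arrived, so the field was still written, giving 16 bytes of data plus 5 bytes of header and footer. file2 holds the remaining two fields in 18 bytes.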

Regards,
Alister
 
