Post 302634421 by jl487 on Thursday 3rd of May 2012 10:21:45 AM
Split a file based on pattern and size

Hello, I have a large file (2GB) that I would like to split based on pattern and size.

I've used the following command to split the file (the token is "HELLO"):
Code:
awk '/HELLO/{close("file" i); i++} {print > ("file" i)}' input.txt

and the output is similar to the following (I included the file size in KB):
Code:
10  file1
10  file2
20  file3
18  file4
1   file5
1   file6
5   file7

I'd like to merge/cat the files so that consecutive files get merged as long as their combined size stays within a limit. So my desired output with a 20 KB limit would be:
Code:
20  file1
20  file2
20  file3
5   file4

In my desired output, files 1-2 got merged, file 3 stayed the same, files 4-6 got merged, and file 7 stayed the same because it's the remainder.
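One way to get this result in a single pass, without creating the small files first, is to buffer each HELLO-delimited chunk in awk and only start a new output file when the chunk would push the current one over the limit. This is only a sketch under some assumptions: the function name split_pack and the prefix argument are made up for illustration, sizes are counted in bytes (hence LC_ALL=C), a single chunk is small enough to hold in memory, and any lines before the first HELLO are treated as a chunk of their own.

```shell
# split_pack INPUT LIMIT_BYTES [PREFIX]   (hypothetical helper name)
# Splits INPUT at every line matching HELLO and packs consecutive chunks
# into PREFIX1, PREFIX2, ... so that each output stays within LIMIT_BYTES.
# A single chunk larger than the limit still gets a file of its own.
split_pack() {
    LC_ALL=C awk -v limit="$2" -v prefix="${3:-file}" '
    function emit_chunk() {
        if (buf == "") return
        # Open a new output file when the chunk no longer fits in the current one.
        if (n == 0 || size + bufsize > limit) {
            if (n > 0) close(out)
            out = prefix (++n)
            size = 0
        }
        printf "%s", buf > out
        size += bufsize
        buf = ""; bufsize = 0
    }
    /HELLO/ { emit_chunk() }                           # a token line starts a new chunk
    { buf = buf $0 "\n"; bufsize += length($0) + 1 }   # +1 for the newline
    END { emit_chunk() }
    ' "$1"
}
```

For the 20 KB limit in the example it would be called as: split_pack input.txt 20480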

I was thinking of using my awk command first and then a for loop to merge the files. My only issue is that since there are so many files, if I did a sort based on file name, it would go file1, file10, file100, file2, file20, etc., and I don't want to merge file1 and file101 together.
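If the two-step approach is preferred, the sort problem can be sidestepped by not sorting at all: the awk command numbers the files consecutively, so a counter can walk them in numeric order. A rough sketch, assuming the files are named file1, file2, ... with no gaps; the function name merge_small and the merged1, merged2, ... output names are made up for illustration.

```shell
# merge_small LIMIT_BYTES [PREFIX]   (hypothetical helper name)
# Walks PREFIX1, PREFIX2, ... in numeric order and concatenates consecutive
# files into merged1, merged2, ... while the total stays within LIMIT_BYTES.
# The body runs in a subshell so its variables do not leak into the caller.
merge_small() (
    limit=$1
    prefix=${2:-file}
    i=1 n=0 size=0
    while [ -f "$prefix$i" ]; do
        bytes=$(wc -c < "$prefix$i")
        # Start a new merged file when this one would exceed the limit.
        if [ "$n" -eq 0 ] || [ $((size + bytes)) -gt "$limit" ]; then
            n=$((n + 1))
            size=0
            : > "merged$n"
        fi
        cat "$prefix$i" >> "merged$n"
        size=$((size + bytes))
        i=$((i + 1))
    done
)
```

It would be run in the directory holding the split files, e.g. merge_small 20480. Another option is to have awk write zero-padded names such as sprintf("file%05d", i), so that a plain name sort matches numeric order.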
 
