Sponsored Content
Top Forums Shell Programming and Scripting Split a file based on pattern and size Post 302634421 by jl487 on Thursday 3rd of May 2012 10:21:45 AM
Old 05-03-2012
Split a file based on pattern and size

Hello, I have a large file (2GB) that I would like to split based on pattern and size.

I've used the following command to split the file (token is "HELLO")
Code:
awk '/HELLO/{i++}{print > "file"i}' input.txt

and the output is similar to the following (i included filesize in KB):
Code:
10  file1
10  file2
20  file3
18  file4
1   file5
1   file6
5   file7

I'd like to make it so that I can merge/cat the files so that if two or more files are below a limit, they get merged. So my desired output with a 20kb restriction would be:
Code:
20  file1
20  file2
20  file3
5   file4

From my desired output, files 1-2 got merged, file 3 stayed the same, file 4-6 got merged, and file 7 stayed the same because it's the remainder.

I was thinking of using my awk command first and then for a for loop to merge the files. My only issue is that since there are so many files, if i did a sort based on file name, it would go file1, file10, file100, file2, file20, etc. and i don't want to merge file1 and file101 together.
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Split File Based on Line Number Pattern

Hello all. Sorry, I know this question is similar to many others, but I just can seem to put together exactly what I need. My file is tab delimitted and contains approximately 1 million rows. I would like to send lines 1,4,& 7 to a file. Lines 2, 5, & 8 to a second file. Lines 3, 6, & 9 to... (11 Replies)
Discussion started by: shankster
11 Replies

2. Shell Programming and Scripting

Split a file based on a pattern

Dear all, I have a large file which is composed of 8000 frames, what i would like to do is split the file into 8000 single files names file.pdb.1, file.pdb.2 etc etc each frame in the large file is seperated by a "ENDMDL" flag so my thinking is to use this flag a a point to split the files... (4 Replies)
Discussion started by: Mish_99
4 Replies

3. Shell Programming and Scripting

Split file based on size

Hi Friends, Below is my requirement. I have a file with the below structure. 0001A1.... 0001B1.. .... 0001L1 0002A1 0002B1 ...... 0002L1 .. the first 4 characters are the sequence numbers for a record, A record will start with A1 and end with L1 with same sequence number. Now the... (2 Replies)
Discussion started by: diva_thilak
2 Replies

4. Shell Programming and Scripting

Split file based on file size in Korn script

I need to split a file if it is over 2GB in size (or any size), preferably split on the lines. I have figured out how to get the file size using awk, and I can split the file based on the number of lines (which I got with wc -l) but I can't figure out how to connect them together in the script. ... (6 Replies)
Discussion started by: ssemple2000
6 Replies

5. Shell Programming and Scripting

Split the file based on pattern

Hi , I have huge files around 400 mb, which has clob data and have diffeent scenarios: I am trying to pass scenario number as parameter and and get required modified file based on the scenario number and criteria. Scenario 1: file name : scenario_1.txt ... (2 Replies)
Discussion started by: sol_nov
2 Replies

6. UNIX for Dummies Questions & Answers

Split a huge 7 GB File Based on Pattern into 4 files

Hi, I have a Huge 7 GB file which has around 1 million records, i want to split this file into 4 files to contain around 250k messages each. Please help me as Split command cannot work here as it might miss tags.. Format of the file is as below <!--###### ###### START-->... (6 Replies)
Discussion started by: KishM
6 Replies

7. Shell Programming and Scripting

How to split a file based on pattern line number?

Hi i have requirement like below M <form_name> sdasadasdMklkM D ...... D ..... M form_name> sdasadasdMklkM D ...... D ..... D ...... D ..... M form_name> sdasadasdMklkM D ...... M form_name> sdasadasdMklkM i want split file based on line number by finding... (10 Replies)
Discussion started by: bhaskar v
10 Replies

8. UNIX for Advanced & Expert Users

Split one file to many based on pattern

Hello All, I have records in a file in a pattern A,B,B,B,B,K,A,B,B,K Is there any command or simple logic I can pull out records into multiple files based on A record? I want output as File1: A,B,B,B,B,K File2: A,B,B,K (9 Replies)
Discussion started by: deal1dealer
9 Replies

9. Shell Programming and Scripting

Split the File based on Size

I have a file that is about 7 GB in size. The requirement is I should split the file equally in such a way that the size of the split files is less than 2Gb. If the file is less than 2gb, than nothing needs to be done. ( need to done using shell script) Thanks, (4 Replies)
Discussion started by: rudoraj
4 Replies

10. UNIX for Beginners Questions & Answers

File Size Split up based on Month

Hi, I have a directory in Unix and there are folders available in the directory. Files are created on different month and now i have a requirement to calculate size of the folder on month basis. Is there any Unix command to check this please?? Thanks (6 Replies)
Discussion started by: Nivas
6 Replies
CSPLIT(1)						    BSD General Commands Manual 						 CSPLIT(1)

NAME
csplit -- split files based on context SYNOPSIS
csplit [-ks] [-f prefix] [-n number] file args ... DESCRIPTION
The csplit utility splits file into pieces using the patterns args. If file is a dash ('-'), csplit reads from standard input. The options are as follows: -f prefix Give created files names beginning with prefix. The default is ``xx''. -k Do not remove output files if an error occurs or a HUP, INT or TERM signal is received. -n number Use number of decimal digits after the prefix to form the file name. The default is 2. -s Do not write the size of each output file to standard output as it is created. The args operands may be a combination of the following patterns: /regexp/[[+|-]offset] Create a file containing the input from the current line to (but not including) the next line matching the given basic regular expression. An optional offset from the line that matched may be specified. %regexp%[[+|-]offset] Same as above but a file is not created for the output. line_no Create containing the input from the current line to (but not including) the specified line number. {num} Repeat the previous pattern the specified number of times. If it follows a line number pattern, a new file will be created for each line_no lines, num times. The first line of the file is line number 1 for historic reasons. After all the patterns have been processed, the remaining input data (if there is any) will be written to a new file. Requesting to split at a line before the current line number or past the end of the file will result in an error. ENVIRONMENT
The LANG, LC_ALL, LC_COLLATE and LC_CTYPE environment variables affect the execution of csplit as described in environ(7). EXIT STATUS
The csplit utility exits 0 on success, and >0 if an error occurs. EXAMPLES
Split the mdoc(7) file foo.1 into one file for each section (up to 20): csplit -k foo.1 '%^.Sh%' '/^.Sh/' '{20}' Split standard input after the first 99 lines and every 100 lines thereafter: csplit -k - 100 '{19}' SEE ALSO
sed(1), split(1), re_format(7) STANDARDS
The csplit utility conforms to IEEE Std 1003.1-2001 (``POSIX.1''). HISTORY
A csplit command appeared in PWB UNIX. BUGS
Input lines are limited to LINE_MAX (2048) bytes in length. BSD
January 26, 2005 BSD
All times are GMT -4. The time now is 10:10 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy