Sponsored Content
Top Forums Shell Programming and Scripting Split a file based on pattern and size Post 302634421 by jl487 on Thursday 3rd of May 2012 10:21:45 AM
Old 05-03-2012
Split a file based on pattern and size

Hello, I have a large file (2GB) that I would like to split based on pattern and size.

I've used the following command to split the file (token is "HELLO")
Code:
awk '/HELLO/{i++}{print > "file"i}' input.txt

and the output is similar to the following (i included filesize in KB):
Code:
10  file1
10  file2
20  file3
18  file4
1   file5
1   file6
5   file7

I'd like to make it so that I can merge/cat the files so that if two or more files are below a limit, they get merged. So my desired output with a 20kb restriction would be:
Code:
20  file1
20  file2
20  file3
5   file4

From my desired output, files 1-2 got merged, file 3 stayed the same, file 4-6 got merged, and file 7 stayed the same because it's the remainder.

I was thinking of using my awk command first and then for a for loop to merge the files. My only issue is that since there are so many files, if i did a sort based on file name, it would go file1, file10, file100, file2, file20, etc. and i don't want to merge file1 and file101 together.
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Split File Based on Line Number Pattern

Hello all. Sorry, I know this question is similar to many others, but I just can seem to put together exactly what I need. My file is tab delimitted and contains approximately 1 million rows. I would like to send lines 1,4,& 7 to a file. Lines 2, 5, & 8 to a second file. Lines 3, 6, & 9 to... (11 Replies)
Discussion started by: shankster
11 Replies

2. Shell Programming and Scripting

Split a file based on a pattern

Dear all, I have a large file which is composed of 8000 frames, what i would like to do is split the file into 8000 single files names file.pdb.1, file.pdb.2 etc etc each frame in the large file is seperated by a "ENDMDL" flag so my thinking is to use this flag a a point to split the files... (4 Replies)
Discussion started by: Mish_99
4 Replies

3. Shell Programming and Scripting

Split file based on size

Hi Friends, Below is my requirement. I have a file with the below structure. 0001A1.... 0001B1.. .... 0001L1 0002A1 0002B1 ...... 0002L1 .. the first 4 characters are the sequence numbers for a record, A record will start with A1 and end with L1 with same sequence number. Now the... (2 Replies)
Discussion started by: diva_thilak
2 Replies

4. Shell Programming and Scripting

Split file based on file size in Korn script

I need to split a file if it is over 2GB in size (or any size), preferably split on the lines. I have figured out how to get the file size using awk, and I can split the file based on the number of lines (which I got with wc -l) but I can't figure out how to connect them together in the script. ... (6 Replies)
Discussion started by: ssemple2000
6 Replies

5. Shell Programming and Scripting

Split the file based on pattern

Hi , I have huge files around 400 mb, which has clob data and have diffeent scenarios: I am trying to pass scenario number as parameter and and get required modified file based on the scenario number and criteria. Scenario 1: file name : scenario_1.txt ... (2 Replies)
Discussion started by: sol_nov
2 Replies

6. UNIX for Dummies Questions & Answers

Split a huge 7 GB File Based on Pattern into 4 files

Hi, I have a Huge 7 GB file which has around 1 million records, i want to split this file into 4 files to contain around 250k messages each. Please help me as Split command cannot work here as it might miss tags.. Format of the file is as below <!--###### ###### START-->... (6 Replies)
Discussion started by: KishM
6 Replies

7. Shell Programming and Scripting

How to split a file based on pattern line number?

Hi i have requirement like below M <form_name> sdasadasdMklkM D ...... D ..... M form_name> sdasadasdMklkM D ...... D ..... D ...... D ..... M form_name> sdasadasdMklkM D ...... M form_name> sdasadasdMklkM i want split file based on line number by finding... (10 Replies)
Discussion started by: bhaskar v
10 Replies

8. UNIX for Advanced & Expert Users

Split one file to many based on pattern

Hello All, I have records in a file in a pattern A,B,B,B,B,K,A,B,B,K Is there any command or simple logic I can pull out records into multiple files based on A record? I want output as File1: A,B,B,B,B,K File2: A,B,B,K (9 Replies)
Discussion started by: deal1dealer
9 Replies

9. Shell Programming and Scripting

Split the File based on Size

I have a file that is about 7 GB in size. The requirement is I should split the file equally in such a way that the size of the split files is less than 2Gb. If the file is less than 2gb, than nothing needs to be done. ( need to done using shell script) Thanks, (4 Replies)
Discussion started by: rudoraj
4 Replies

10. UNIX for Beginners Questions & Answers

File Size Split up based on Month

Hi, I have a directory in Unix and there are folders available in the directory. Files are created on different month and now i have a requirement to calculate size of the folder on month basis. Is there any Unix command to check this please?? Thanks (6 Replies)
Discussion started by: Nivas
6 Replies
MERGE(1)						      General Commands Manual							  MERGE(1)

NAME
merge - three-way file merge SYNOPSIS
merge [ options ] file1 file2 file3 DESCRIPTION
merge incorporates all changes that lead from file2 to file3 into file1. The result ordinarily goes into file1. merge is useful for com- bining separate changes to an original. Suppose file2 is the original, and both file1 and file3 are modifications of file2. Then merge combines both changes. A conflict occurs if both file1 and file3 have changes in a common segment of lines. If a conflict is found, merge normally outputs a warning and brackets the conflict with <<<<<<< and >>>>>>> lines. A typical conflict will look like this: <<<<<<< file A lines in file A ======= lines in file B >>>>>>> file B If there are conflicts, the user should edit the result and delete one of the alternatives. OPTIONS
-A Output conflicts using the -A style of diff3(1), if supported by diff3. This merges all changes leading from file2 to file3 into file1, and generates the most verbose output. -E, -e These options specify conflict styles that generate less information than -A. See diff3(1) for details. The default is -E. With -e, merge does not warn about conflicts. -L label This option may be given up to three times, and specifies labels to be used in place of the corresponding file names in conflict reports. That is, merge -L x -L y -L z a b c generates output that looks like it came from files x, y and z instead of from files a, b and c. -p Send results to standard output instead of overwriting file1. -q Quiet; do not warn about conflicts. -V Print 's version number. DIAGNOSTICS
Exit status is 0 for no conflicts, 1 for some conflicts, 2 for trouble. IDENTIFICATION
Author: Walter F. Tichy. Manual Page Revision: 1.1.1.1; Release Date: 2002/04/30. Copyright (C) 1982, 1988, 1989 Walter F. Tichy. Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995 Paul Eggert. SEE ALSO
diff3(1), diff(1), rcsmerge(1), co(1). BUGS
It normally does not make sense to merge binary files as if they were text, but merge tries to do it anyway. GNU
2002/04/30 MERGE(1)
All times are GMT -4. The time now is 03:45 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy