Help needed - Split large file into smaller files based on pattern match

# 1  
Old 01-18-2013

Help needed urgently please.

I have a large file - a few hundred thousand lines.

Sample
Code:
CP START ACCOUNT
1234556
name 1
CP END ACCOUNT
CP START ACCOUNT
2224444
name 1
CP END ACCOUNT
CP START ACCOUNT
333344444
name 1
CP END ACCOUNT


I need to split this file each time "CP START ACCOUNT" is matched.

Preferably, I would start a new output file after every 20 matches, so that each smaller file holds 20 accounts.


I was trying something like the below, but it fails and I could do with some help:


Code:
awk '/START/{x="F"++i;}{print > x;}' inputfile
awk: too many output files 10
 record number 610


I also need to match on CP START ACCOUNT rather than just START.


Can anyone help urgently?
Moderator's Comments:
Please use code tags when posting data and code samples!

Last edited by vgersh99; 01-18-2013 at 12:25 PM.. Reason: code tags, please!
# 2  
Old 01-18-2013
Code:
awk '/START/{close(x);x="F"++i;}{print > x;}' inputfile
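
The "too many output files" error most likely means this awk keeps every redirected file open and runs into its limit; closing the previous file before switching names keeps only one open at a time. A minimal, self-contained sketch of that idea, using the sample data from the first post (the scratch directory and the `x != ""` guard are additions for the demo, not part of the one-liner above):

```shell
# Work in a scratch directory so the demo leaves no stray files behind.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three sample records in the layout from the first post.
cat > inputfile <<'EOF'
CP START ACCOUNT
1234556
name 1
CP END ACCOUNT
CP START ACCOUNT
2224444
name 1
CP END ACCOUNT
CP START ACCOUNT
333344444
name 1
CP END ACCOUNT
EOF

# On each header line, close the previous output file (if there is one)
# before switching to the next name, so only one file is open at a time.
awk '/^CP START ACCOUNT/ { if (x != "") close(x); x = "F" (++i) }
     { print > x }' inputfile

ls F*          # lists F1 F2 F3
```

Each of F1, F2 and F3 should then hold one four-line account record.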

# 3  
Old 01-18-2013
Quote:
Originally Posted by vgersh99
Code:
awk '/START/{close(x);x="F"++i;}{print > x;}' inputfile


Code:
awk '/START/{close(x);x="F"++i;}{print > x;}' inputfile 
awk: too many output files 10
 record number 610

Hi - thanks for responding quickly

Tried that but same error. It creates output files from 0 to 9 with some output but fails with error above.

Any ideas?

---------- Post updated at 04:40 PM ---------- Previous update was at 04:30 PM ----------

Hi - found the cause: I was using the default Solaris awk, so I am now using the POSIX awk (/usr/xpg4/bin/awk) instead.


This works
Code:
$ /usr/xpg4/bin/awk '/CP START/{close(x);x="F"++i;}{print > x;}' inputfile

however this fails:
Code:
$ /usr/xpg4/bin/awk '/CP START ACCOUNT/{close(x);x="F"++i;}{print > x;}' inputfile 
/usr/xpg4/bin/awk: line 0 (NR=1): output file "": No such file or directory


How can I get the above to succeed?
Also, can I get the split to happen only after, say, 20 matches of "CP START ACCOUNT"?

Last edited by vgersh99; 01-18-2013 at 12:38 PM.. Reason: once again - please start using code tags!
# 4  
Old 01-18-2013
Code:
awk '/START/{close(x);x=("F" ++i)}{print > x;}' inputfile

If on Solaris, use nawk instead of awk.
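
On the `output file ""` error at NR=1 quoted earlier: that message appears when `print > x` runs while `x` is still empty, which suggests the first line of the real file matched `CP START` but not `CP START ACCOUNT` (for instance, different spacing than in the posted sample). This is only a guess without seeing the real data. A hedged sketch that makes the spacing visible and skips anything before the first match; the two-space header and the `skipped` counter are made up for the demo:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: the first header has two spaces before ACCOUNT,
# so it matches /CP START/ but not /CP START ACCOUNT/.
printf '%s\n' 'CP START  ACCOUNT' '111' 'CP END ACCOUNT' \
              'CP START ACCOUNT'  '222' 'CP END ACCOUNT' > inputfile

# Show line 1 with non-printing characters made visible, to check the spacing.
sed -n '1l' inputfile

# Only print once an output name has been set; count the lines skipped before that.
skipped=$(awk '/CP START ACCOUNT/ { if (x != "") close(x); x = "F" (++i) }
               x != ""            { print > x; next }
                                  { skipped++ }
               END                { print skipped + 0 }' inputfile)
echo "lines before the first match: $skipped"
```

If the count is not zero, the pattern probably needs adjusting to the real header text rather than the awk logic.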
# 5  
Old 01-18-2013
Thanks, can you help with the previous comment as well?

Last edited by vgersh99; 01-18-2013 at 01:00 PM.. Reason: third warning - start using code tags!
# 6  
Old 01-18-2013
Another approach is using a BASH script, but this is gonna run slower than awk:
Code:
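#!/bin/bash

c=0
# IFS= and -r keep leading whitespace and backslashes intact
while IFS= read -r line
do
        # Glob match: a quoted pattern after =~ is taken literally in bash 3.2+,
        # so "^CP START ACCOUNT" in quotes would never match
        if [[ "$line" == "CP START ACCOUNT"* ]]
        then
                c=$(( c + 1 ))
        fi
        # Every line goes to the file for the current account
        printf '%s\n' "$line" >> "F${c}.txt"
done < inputfile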

# 7  
Old 01-18-2013
Give this a try, extending vgersh99's proposal:
Code:
awk     '/^CP START ACCOUNT/ {if (!(n%20)) {close (fn); fn=("F" ++i)}; n++}
         {print > fn;}
        ' file
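
To check the grouping, the sketch below generates 45 made-up accounts and splits them with the same logic. The group size is passed in as a variable; the names `per` and `workdir`, the generated data, and the `fn != ""` guard are assumptions for the demo, not part of the command above.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Generate 45 made-up accounts, four lines each, in the thread's layout.
awk 'BEGIN { for (a = 1; a <= 45; a++)
                 printf "CP START ACCOUNT\n%07d\nname %d\nCP END ACCOUNT\n", a, a }' > inputfile

# Start a new output file on the 1st, 21st, 41st ... header line.
awk -v per=20 '
    /^CP START ACCOUNT/ { if (n % per == 0) { if (fn != "") close(fn); fn = "F" (++i) }
                          n++ }
    { print > fn }
' inputfile

# Count the accounts that landed in each file.
for f in F1 F2 F3; do
    echo "$f: $(grep -c '^CP START ACCOUNT' "$f")"
done
```

With 45 accounts and `per=20` this reports 20, 20 and 5 accounts for F1, F2 and F3.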


Last edited by vgersh99; 01-18-2013 at 06:02 PM..