Help needed - Split large file into smaller files based on pattern match


# 1  
Old 01-18-2013

Help needed urgently please.

I have a large file - a few hundred thousand lines.

Sample
Code:
CP START ACCOUNT
1234556
name 1
CP END ACCOUNT
CP START ACCOUNT
2224444
name 1
CP END ACCOUNT
CP START ACCOUNT
333344444
name 1
CP END ACCOUNT


I need to split this file each time "CP START ACCOUNT" is matched.

Preferably, I would start a new output file after every 20 matches, so that each smaller file holds 20 accounts.


I was trying something like the below, but could obviously do with some help:


Code:
awk '/START/{x="F"++i;}{print > x;}' inputfile
awk: too many output files 10
 record number 610


I also need to replace START above with CP START ACCOUNT.


Can anyone help urgently?
# 2  
Old 01-18-2013
Code:
awk '/START/{close(x);x="F"++i;}{print > x;}' inputfile

# 3  
Old 01-18-2013
Quote:
Originally Posted by vgersh99
Code:
awk '/START/{close(x);x="F"++i;}{print > x;}' inputfile


Code:
awk '/START/{close(x);x="F"++i;}{print > x;}' inputfile 
awk: too many output files 10
 record number 610

Hi - thanks for responding quickly

Tried that but same error. It creates output files from 0 to 9 with some output but fails with error above.

Any ideas?

---------- Post updated at 04:40 PM ---------- Previous update was at 04:30 PM ----------

Hi - I found the problem: I was using the default Solaris version of awk, so I am now using the POSIX awk (/usr/xpg4/bin/awk).


This works
Code:
$ /usr/xpg4/bin/awk '/CP START/{close(x);x="F"++i;}{print > x;}' inputfile

however this fails:
Code:
$ /usr/xpg4/bin/awk '/CP START ACCOUNT/{close(x);x="F"++i;}{print > x;}' inputfile 
/usr/xpg4/bin/awk: line 0 (NR=1): output file "": No such file or directory


How can I get the above to succeed?
Also, can I make it start a new file only after, say, every 20 matches of "CP START ACCOUNT"?

# 4  
Old 01-18-2013
Code:
awk '/START/{close(x);x=("F" ++i)}{print > x;}' inputfile

If you are on Solaris, use nawk instead of awk.
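
On the `output file ""` error: it is reported at NR=1 with an empty file name, which means x had not been set when the first line was printed, so line 1 of the real file did not match the longer pattern. Why it did not match is only a guess from here (different spacing, or a header line before the first account). A minimal sketch that skips anything seen before the first match; the file name sample.txt and the HEADER line are made up for the demo:

```shell
# Demo input: one stray header line before the first account (an assumption)
printf '%s\n' 'HEADER' \
  'CP START ACCOUNT' '1234556' 'name 1' 'CP END ACCOUNT' \
  'CP START ACCOUNT' '2224444' 'name 1' 'CP END ACCOUNT' > sample.txt

# Start a new file on each match; print only once a file name has been set
awk '/CP START ACCOUNT/ { if (x != "") close(x); x = ("F" ++i) }
     x != ""            { print > x }' sample.txt
```

If lines before the first match should be kept rather than dropped, set x to a starting file name in a BEGIN block instead.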
# 5  
Old 01-18-2013
Thanks. Can you help with my previous comment as well?


This works
Code:
$ /usr/xpg4/bin/awk '/CP START/{close(x);x="F"++i;}{print > x;}' inputfile

however this fails:
Code:
$ /usr/xpg4/bin/awk '/CP START ACCOUNT/{close(x);x="F"++i;}{print > x;}' inputfile 
/usr/xpg4/bin/awk: line 0 (NR=1): output file "": No such file or directory


How can I get the above to succeed?
Also, can I make it start a new file only after, say, every 20 matches of "CP START ACCOUNT"?

# 6  
Old 01-18-2013
Another approach is a bash script, though it will run slower than awk:
Code:
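#!/bin/bash

c=0
# IFS= and -r keep leading whitespace and backslashes intact
while IFS= read -r line
do
        # Glob match on the line prefix; a quoted regex after =~ is taken
        # literally in bash 3.2+, so "^CP START ACCOUNT" would never match
        if [[ $line == "CP START ACCOUNT"* ]]
        then
                c=$(( c + 1 ))
        fi
        # Lines before the first match (if any) land in F0.txt
        printf '%s\n' "$line" >> "F${c}.txt"
done < inputfile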

# 7  
Old 01-18-2013
Give this a try, extending vgersh99's proposal
Code:
awk     '/^CP START ACCOUNT/ {if (!(n%20)) {close (fn); fn=("F" ++i)}; n++}
         {print > fn;}
        ' file
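
To check the grouping on a small sample before running it on the real file, here is a sketch of the same idea with the group size passed in as a variable. The names per and sample.txt are made up for the demo; per would be 20 for the case in this thread:

```shell
# Demo input: five accounts of four lines each
for id in 1 2 3 4 5; do
  printf '%s\n' 'CP START ACCOUNT' "$id" 'name 1' 'CP END ACCOUNT'
done > sample.txt

# per = accounts per output file (2 here to keep the demo small)
awk -v per=2 '
  /^CP START ACCOUNT/ {
      # open the next file on every per-th account
      if (n % per == 0) { if (fn != "") close(fn); fn = ("F" ++i) }
      n++
  }
  fn != "" { print > fn }   # skip anything before the first account
' sample.txt
```

With five accounts and per=2, the first two accounts should end up in F1, the next two in F2 and the last one in F3.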

