Help needed - Split large file into smaller files based on pattern match | Unix Linux Forums | Shell Programming and Scripting

    #1  
01-18-2013
frustrated1
Help needed - Split large file into smaller files based on pattern match

Help needed urgently please.

I have a large file - a few hundred thousand lines.

Sample

Code:
CP START ACCOUNT
1234556
name 1
CP END ACCOUNT
CP START ACCOUNT
2224444
name 1
CP END ACCOUNT
CP START ACCOUNT
333344444
name 1
CP END ACCOUNT


I need to split this file each time "CP START ACCOUNT" is matched.

Preferably, I would start a new output file only after every 20 matches, so that each smaller file holds 20 account records.


I was trying something like the below, but it fails with an error and I could do with some help:



Code:
awk '/START/{x="F"++i;}{print > x;}' inputfile
awk: too many output files 10
 record number 610


I also need to match on the full string "CP START ACCOUNT" rather than just START.


Can anyone help urgently?
    #2  
01-18-2013
vgersh99

Code:
awk '/START/{close(x);x="F"++i;}{print > x;}' inputfile
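For readers following along: `print > x` keeps each output file open until awk exits or the file is explicitly closed, and older awk implementations allow only a small number of open files at once, which is what the "too many output files 10" message suggests. Closing the previous file before switching to the next keeps a single file open at any time. A minimal sketch on generated data (the scratch directory, the 15-record sample and the `x != ""` guards are illustrative assumptions, not part of the original one-liner):

```shell
#!/bin/sh
# Sketch only: work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Build a sample input with 15 records (more than the 10-file limit seen above).
i=1
while [ "$i" -le 15 ]; do
    printf 'CP START ACCOUNT\n%s\nname 1\nCP END ACCOUNT\n' "$i"
    i=$((i + 1))
done > inputfile

# Close the previous output file before opening the next one,
# so only one output file is open at any time.
awk '/^CP START ACCOUNT/ { if (x != "") close(x); x = ("F" ++n) }
     x != ""             { print > x }' inputfile

# Count the output files (tr strips the padding some wc versions add).
nfiles=$(ls F* | wc -l | tr -d ' ')
echo "files created: $nfiles"
```

The guard on `x` also means any lines that appear before the first match are skipped rather than written to an empty file name.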

    #3  
01-18-2013
frustrated1
Quote:
Originally posted by vgersh99
Code:
awk '/START/{close(x);x="F"++i;}{print > x;}' inputfile



Code:
awk '/START/{close(x);x="F"++i;}{print > x;}' inputfile 
awk: too many output files 10
 record number 610

Hi, thanks for responding quickly.

I tried that but got the same error. It creates output files from 0 to 9 with some output, then fails with the error above.

Any ideas?

---------- Post updated at 04:40 PM ---------- Previous update was at 04:30 PM ----------

Hi, I found part of the problem: I was using the default Solaris version of awk, so I am now using the POSIX awk instead.


This works

Code:
$ /usr/xpg4/bin/awk '/CP START/{close(x);x="F"++i;}{print > x;}' inputfile

however this fails:

Code:
$ /usr/xpg4/bin/awk '/CP START ACCOUNT/{close(x);x="F"++i;}{print > x;}' inputfile 
/usr/xpg4/bin/awk: line 0 (NR=1): output file "": No such file or directory


How can I get the above to succeed?
Also, can I get it to start a new file only after, say, 20 matches of "CP START ACCOUNT"?
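A note on that error: `output file ""` at NR=1 suggests the very first line did not match the longer pattern, so `x` was still empty when `print > x` ran. One possible cause (an assumption, since the real file is not shown) is a tab or repeated spaces between the words, or a leading line that is not a record start. A hedged sketch that tolerates any run of blanks and skips lines seen before the first match, using made-up sample data:

```shell
#!/bin/sh
# Sketch only: work in a scratch directory.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical sample: a stray header line, then a tab and a double space between words.
printf 'header line\nCP START\tACCOUNT\n111\nCP END ACCOUNT\nCP  START ACCOUNT\n222\nCP END ACCOUNT\n' > inputfile

# [ \t]+ accepts tabs and repeated spaces between the words.
# The guard on x skips anything before the first match instead of printing to "".
awk '/^CP[ \t]+START[ \t]+ACCOUNT/ { if (x != "") close(x); x = ("F" ++i) }
     x != ""                      { print > x }' inputfile

ls F*
```

To see what the first line of the real file actually contains, `sed -n '1l' inputfile` prints it with tabs and other non-printing characters made visible.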

    #4  
01-18-2013
vgersh99

Code:
awk '/START/{close(x);x=("F" ++i)}{print > x;}' inputfile

If you are on Solaris, use nawk instead of awk.
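On Solaris the default `awk` is the old implementation, while `nawk` and `/usr/xpg4/bin/awk` are the newer ones. A small sketch of picking whichever is available before running the one-liner (the `AWK` variable name is only an illustration):

```shell
#!/bin/sh
# Prefer the POSIX awk on Solaris, then nawk, then whatever "awk" is on PATH.
if [ -x /usr/xpg4/bin/awk ]; then
    AWK=/usr/xpg4/bin/awk
elif command -v nawk >/dev/null 2>&1; then
    AWK=nawk
else
    AWK=awk
fi

# Quick self-check of the parenthesised concatenation used above.
"$AWK" 'BEGIN { x = ("F" ++i); print x }'
```

The self-check should print F1. The parentheses make it explicit that the file name is the string "F" followed by the incremented counter.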
    #5  
01-18-2013
frustrated1
Thanks. Can you help with my previous comment as well?


This works

Code:
$ /usr/xpg4/bin/awk '/CP START/{close(x);x="F"++i;}{print > x;}' inputfile

however this fails:

Code:
$ /usr/xpg4/bin/awk '/CP START ACCOUNT/{close(x);x="F"++i;}{print > x;}' inputfile 
/usr/xpg4/bin/awk: line 0 (NR=1): output file "": No such file or directory


How can I get the above to succeed?
Also, can I get it to start a new file only after, say, 20 matches of "CP START ACCOUNT"?

    #6  
01-18-2013
Yoda
Another approach is a bash script, but this is going to run slower than awk:

Code:
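#!/bin/bash

c=0
# Keep the regex in a variable and expand it unquoted: bash 3.2+ treats a
# quoted right-hand side of =~ as a literal string, so "^CP ..." never matches.
re='^CP START ACCOUNT'
# IFS= and -r keep leading whitespace and backslashes in each line intact
while IFS= read -r line
do
        if [[ $line =~ $re ]]
        then
                c=$(( c + 1 ))
        fi
        # Lines before the first match (c=0) end up in F0.txt
        printf '%s\n' "$line" >> "F${c}.txt"
done < inputfile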

    #7  
01-18-2013
RudiC
Give this a try, extending vgersh99's proposal:
Code:
awk     '/^CP START ACCOUNT/ {if (!(n%20)) {close (fn); fn=("F" ++i)}; n++}
         {print > fn;}
        ' file
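To check the grouping, a quick run on generated data may help. This sketch assumes 45 made-up records and uses the same counter logic, with an added guard so nothing is printed before the first match. With 20 records per file it should give three files holding 20, 20 and 5 records:

```shell
#!/bin/sh
# Sketch only: work in a scratch directory.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Generate 45 hypothetical account records in the format from post #1.
awk 'BEGIN { for (k = 1; k <= 45; k++)
                 printf "CP START ACCOUNT\n%d\nname 1\nCP END ACCOUNT\n", k }' > file

# Start a new output file on the 1st, 21st, 41st ... match.
awk '/^CP START ACCOUNT/ { if (!(n % 20)) { if (fn != "") close(fn); fn = ("F" ++i) }; n++ }
     fn != ""            { print > fn }' file

# Count the records that ended up in each output file.
for f in F1 F2 F3; do
    echo "$f: $(grep -c '^CP START ACCOUNT' "$f") records"
done
```

Changing the 20 in `n % 20` changes how many records go into each output file.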

