Help needed - Split large file into smaller files based on pattern match


# 1  
Help needed - Split large file into smaller files based on pattern match

Help needed urgently please.

I have a large file - a few hundred thousand lines.

Sample
Code:
CP START ACCOUNT
1234556
name 1
CP END ACCOUNT
CP START ACCOUNT
2224444
name 1
CP END ACCOUNT
CP START ACCOUNT
333344444
name 1
CP END ACCOUNT


I need to split this file each time "CP START ACCOUNT" is matched.

Preferably I would like to split it after every 20 matches, outputting to smaller files.


I was trying something like the below, but I could obviously use some help.


Code:
awk '/START/{x="F"++i;}{print > x;}' inputfile
awk: too many output files 10
 record number 610


I also need to replace START above with CP START ACCOUNT.


Can anyone help urgently?
Moderator's Comments:
Mod Comment
Please use code tags when posting data and code samples!

Last edited by vgersh99; 01-18-2013 at 12:25 PM.. Reason: code tags, please!
# 2  
Code:
awk '/START/{close(x);x="F"++i;}{print > x;}' inputfile
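The close(x) here is the key difference from the first attempt: it releases each finished output file instead of keeping them all open, which is what the "too many output files" limit is about. The same one-liner spelled out with comments (a sketch; output files are named F1, F2, ...):
Code:
awk '
/START/ {          # a new account block starts here
    close(x)       # release the previous output file so its descriptor can be reused
    x = "F" ++i    # name of the next output file: F1, F2, ...
}
{ print > x }      # every input line goes to the current output file
' inputfile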

# 3  
Quote:
Originally Posted by vgersh99
Code:
awk '/START/{close(x);x="F"++i;}{print > x;}' inputfile


Code:
awk '/START/{close(x);x="F"++i;}{print > x;}' inputfile 
awk: too many output files 10
 record number 610

Hi - thanks for responding quickly

I tried that but got the same error. It creates output files from 0 to 9 with some output, but then fails with the error above.

Any ideas?

---------- Post updated at 04:40 PM ---------- Previous update was at 04:30 PM ----------

Hi - found the cause of the error: I was using the Solaris version of awk, so I am now using the POSIX awk.


This works
Code:
$ /usr/xpg4/bin/awk '/CP START/{close(x);x="F"++i;}{print > x;}' inputfile

however this fails:
Code:
$ /usr/xpg4/bin/awk '/CP START ACCOUNT/{close(x);x="F"++i;}{print > x;}' inputfile 
/usr/xpg4/bin/awk: line 0 (NR=1): output file "": No such file or directory


How can I get the above to succeed?
Also, can I get it to split the file only after every 20 matches of "CP START ACCOUNT"?

Last edited by vgersh99; 01-18-2013 at 12:38 PM.. Reason: once again - please start using code tags!
# 4  
Code:
awk '/START/{close(x);x=("F" ++i)}{print > x;}' inputfile

If on Solaris, use nawk instead of awk.
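Regarding the "No such file or directory" error with the longer pattern: it means print ran before any line had matched /CP START ACCOUNT/, so the output file name was still an empty string. Since /CP START/ worked on the same file, the first line evidently contains "CP START" but not "CP START ACCOUNT" exactly (extra spacing or trailing characters, perhaps). One possible guard - a sketch assuming a default file F0 is acceptable for any leading lines:
Code:
nawk 'BEGIN{x="F0"} /CP START ACCOUNT/{close(x); x="F" ++i} {print > x}' inputfile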
# 5  
Thanks - can you help with the previous comment as well?


This works
Code:
$ /usr/xpg4/bin/awk '/CP START/{close(x);x="F"++i;}{print > x;}' inputfile

however this fails:
Code:
$ /usr/xpg4/bin/awk '/CP START ACCOUNT/{close(x);x="F"++i;}{print > x;}' inputfile 
/usr/xpg4/bin/awk: line 0 (NR=1): output file "": No such file or directory


How can I get the above to succeed?
Also, can I get it to split the file only after every 20 matches of "CP START ACCOUNT"?

Last edited by vgersh99; 01-18-2013 at 01:00 PM.. Reason: third warning - start using code tags!
# 6  
Another approach is to use a Bash script, although this will run slower than awk:
Code:
#!/bin/bash

c=0
while IFS= read -r line
do
        # Note: quoting the whole =~ pattern makes bash match it literally,
        # so the ^ anchor must stay outside the quotes.
        if [[ "$line" =~ ^"CP START ACCOUNT" ]]
        then
                c=$(( c + 1 ))          # start a new output file at each account header
        fi
        echo "$line" >> "F${c}.txt"
done < inputfile

# 7  
Give this a try, extending vgersh99's proposal:
Code:
awk     '/^CP START ACCOUNT/ {if (!(n%20)) {close (fn); fn=("F" ++i)}; n++}
         {print > fn;}
        ' file
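A quick way to check the result (a sketch, assuming the F1, F2, ... files are in the current directory; each chunk should hold 20 accounts, except possibly the last):
Code:
for f in F[0-9]*; do
    printf '%s: %s accounts\n' "$f" "$(grep -c '^CP START ACCOUNT' "$f")"
done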


Last edited by vgersh99; 01-18-2013 at 06:02 PM..