Sed: Splitting A large File into smaller files based on recursive Regular Expression match


 
# 1  
Old 03-29-2013

I will simplify the explanation a bit: I need to parse an 87m file.

I have a single text file of the form:


Code:
<NAME>house........
SOMETEXT
SOMETEXT 
SOMETEXT
.
.
.
.
</script>
MORETEXT
MORETEXT
.
.
.
.

<NAME>car........
SOMETEXT
SOMETEXT 
SOMETEXT
.
.
.
.
</script>
MORETEXT
MORETEXT
.
.
.
.
<NAME>boat........
SOMETEXT
SOMETEXT 
SOMETEXT
.
.
.
.
</script>
MORETEXT
MORETEXT
.
.
.
.

I want to extract each <NAME> line, the </script> line that follows it, and all lines between the two, and place each block into its own file,

ending up with


file1.txt
Code:
<NAME>house........
SOMETEXT
SOMETEXT 
SOMETEXT
.
.
.
.
</script>

file2.txt
Code:
<NAME>car........
SOMETEXT
SOMETEXT 
SOMETEXT
.
.
.
.
</script>

file3.txt
Code:
<NAME>boat........
SOMETEXT
SOMETEXT 
SOMETEXT
.
.
.
.
</script>

I have searched sed one-liners, used the search feature here, and looked in my O'Reilly sed/awk pocket guide, but nothing really provides a solution.

Thanks in advance. Sorry for the re-edit!

Last edited by Scrutinizer; 03-29-2013 at 06:07 PM.. Reason: code tags
# 2  
Old 03-29-2013
Your written specification does not match your sample output files: you want the tokens "and all lines between the two", but your File1.txt etc. also contain MORETEXT etc., which lies outside the two.
The sample output would be easy to produce, if not in sed, then certainly in awk:
Code:
$ awk '/<NAME>/ {FN="File"++i".txt"}; {print >FN}' file
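
For readers new to awk, here is a minimal sketch of how that one-liner works, run against a tiny made-up input. The names sample.txt and the scratch directory are assumptions for illustration only. The parentheses around ++i and the FN guard (which skips any lines before the first <NAME>) are small additions, not part of the command as posted.

```shell
# Work in a scratch directory so the demo leaves nothing behind.
# (sample.txt and the directory are illustrative, not from the thread.)
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A small stand-in for the real input file.
cat > sample.txt <<'EOF'
<NAME>house
SOMETEXT
</script>
MORETEXT
<NAME>car
SOMETEXT
</script>
MORETEXT
EOF

# Rule 1: on every <NAME> line, pick the next output file name.
# Rule 2: print every line to the current file; the FN guard skips
#         any lines that appear before the first <NAME>.
awk '
    /<NAME>/ { FN = "File" (++i) ".txt" }
    FN       { print > FN }
' sample.txt

ls File*.txt
```

Each output file runs from one <NAME> line up to the line before the next <NAME>, so the MORETEXT lines are still included at this stage.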

# 3  
Old 03-29-2013
You are absolutely correct. Dang cut and paste makes ya lazy .. lemme edit and fix.
# 4  
Old 03-29-2013
Re-edited, fine! Do you think you can find the right answer from the starting point given above?
# 5  
Old 03-29-2013
Quote:
Originally Posted by RudiC
Your written specification does not match your sample output files: you want the tokens "and all lines between the two", but your File1.txt etc. also contain MORETEXT etc., which lies outside the two.
The sample output would be easy to produce, if not in sed, then certainly in awk:
Code:
$ awk '/<NAME>/ {FN="File"++i".txt"}; {print >FN}' file

Depending on how many <NAME> lines there are in the input file, you might have to close the output files when you're done writing to them:
Code:
{if(FN)close(FN);FN="File"++i".txt"}
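
As a rough illustration of why this matters: some awk implementations, and the operating system itself, limit how many files can be open at once, so a large input with many <NAME> lines may fail partway through without close(). The sketch below generates a synthetic input (many.txt with 2000 blocks, both made up for this example) and splits it with the close() call in place.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Generate a synthetic input with 2000 <NAME> blocks (illustrative data only).
awk 'BEGIN {
    for (n = 1; n <= 2000; n++)
        printf "<NAME>item%d\nSOMETEXT\n</script>\n", n
}' > many.txt

# Close the previous output file before switching to the next one,
# so only one output file is open at any time.
awk '
    /<NAME>/ { if (FN) close(FN); FN = "File" (++i) ".txt" }
    FN       { print > FN }
' many.txt

# Count the output files that were produced.
ls | grep -c '^File[0-9]*\.txt$'
```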

# 6  
Old 04-02-2013
Hey RudiC ... I haven't tried this yet. They just reimaged my laptop with Win 7 and my access to everything is hosed. Been working on that ... as soon as I'm back up, I'll give this a try ...

Meanwhile, thanks!!
# 7  
Old 04-02-2013
A modified version that writes only the lines from <NAME> through </script>:
Code:
awk '/<NAME>/{if(FN)close(FN);FN="File"++i".txt";p=1}p{print >FN}/script/{p=0}' file

--ahamed
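
One possible refinement, offered as a sketch rather than a tested fix for the real 87m file: the pattern /script/ matches any line containing "script", including an opening <script> tag, which would stop the output early. Matching the closing tag literally avoids that. The sample input below, with its opening <script> line, is an assumption for illustration. The lowercase file prefix simply follows the file1.txt naming in the original request.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up input; the opening <script> line is an assumption, not from the thread.
cat > sample.txt <<'EOF'
<NAME>house
<script>
SOMETEXT
</script>
MORETEXT
<NAME>car
SOMETEXT
</script>
MORETEXT
EOF

# p is a flag: set on <NAME>, cleared after the closing </script> line.
# Lines are written only while p is set, so MORETEXT is dropped.
awk '
    /<NAME>/     { if (FN) close(FN); FN = "file" (++i) ".txt"; p = 1 }
    p            { print > FN }
    /<\/script>/ { p = 0 }
' sample.txt

cat file1.txt
```

Because the flag is cleared only after the print rule has run, the </script> line itself still lands in the output file.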