Post 302836989 by KishM on Thursday 25th of July 2013 02:35:15 AM
Split a huge 7 GB File Based on Pattern into 4 files

Hi,

I have a huge 7 GB file containing around 1 million records. I want to split it into 4 files of around 250k records (messages) each.

Please help. The split command will not work here, because it could cut the file in the middle of a record and separate a START tag from its END tag.

The format of the file is as below:
Code:
<!--######[ABC] ###### START-->
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<!--######[ABC] ###### END-->
<!--######[ABC] ###### START-->
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<!--######[ABC] ###### END-->
<!--######[ABC] ###### START-->
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<!--######[ABC] ###### END-->
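
One way to split a file like the sample above without breaking a record is to stream it line by line and only switch output files right after a line that closes a record. The following is a minimal sketch in Python, not a tested production tool. It assumes that every record ends with a line whose last characters are END--> (as in the sample), and the function name split_by_record, the records_per_file parameter and the output naming scheme (prefix.1, prefix.2, ...) are made up for illustration.

```python
import os

# Assumption: every record closes with a line ending in this byte string,
# as in the sample "<!--######[ABC] ###### END-->".
END_SUFFIX = b"END-->"


def split_by_record(src_path, out_prefix, records_per_file=250_000):
    """Stream src_path and write whole START..END records into numbered files.

    Returns the list of output paths that were created.
    """
    out_paths = []
    out = None      # currently open output file, if any
    count = 0       # records written to the current output file
    try:
        with open(src_path, "rb") as src:
            for line in src:            # reads one line at a time, not the whole file
                if out is None:
                    if not line.strip():
                        # Skip blank lines that sit between two output files.
                        continue
                    path = f"{out_prefix}.{len(out_paths) + 1}"
                    out = open(path, "wb")
                    out_paths.append(path)
                    count = 0
                out.write(line)
                if line.rstrip().endswith(END_SUFFIX):
                    count += 1
                    if count >= records_per_file:
                        # Record boundary reached: close this part.
                        out.close()
                        out = None
    finally:
        if out is not None:
            out.close()
    return out_paths
```

Calling split_by_record("huge.xml", "part") (both names hypothetical) would create part.1, part.2 and so on, each holding at most 250,000 complete records. If the input holds slightly more than 1,000,000 records, a small fifth file appears; raising records_per_file avoids that. Because the input is read one line at a time, memory use stays small regardless of the 7 GB size.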


Moderator's Comments:
Mod Comment Use code tags please, see PM.

Last edited by zaxxon; 07-25-2013 at 04:55 AM.. Reason: code tags
 
