Post 302836989 by KishM on Thursday 25th of July 2013 02:35:15 AM
Split a huge 7 GB File Based on Pattern into 4 files

Hi,

I have a huge 7 GB file containing around 1 million records (messages). I want to split it into 4 files of around 250k records each.

Please help. The split command will not work here because it cuts on line or byte counts, so it might break a record between its START and END tags.

The format of the file is as below:
Code:
<!--######[ABC] ###### START-->
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<!--######[ABC] ###### END-->
<!--######[ABC] ###### START-->
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<!--######[ABC] ###### END-->
<!--######[ABC] ###### START-->
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<XMLTag>DATA</XMLTag>
<!--######[ABC] ###### END-->


Moderator's Comments:
Use code tags please, see PM.

Last edited by zaxxon; 07-25-2013 at 04:55 AM. Reason: code tags
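
One possible approach, offered as a sketch rather than a tested solution for the real file: read the input as a stream and switch to a new output file only when a START comment line is reached, so no record is ever cut in half. The example below assumes that every record begins with a line ending in `START-->` and that the START and END comments sit on their own lines, as in the sample above. The function names, the `part_` prefix, and the `.xml` extension are made up for illustration. Any lines that appear before the first START marker are skipped.

```python
START_MARKER = b"START-->"  # assumption: every record opens with a line ending like this


def count_records(src_path, start_marker=START_MARKER):
    """Count records by counting START marker lines (one streaming pass)."""
    with open(src_path, "rb") as src:
        return sum(1 for line in src if line.rstrip().endswith(start_marker))


def split_by_records(src_path, out_prefix, records_per_file=250_000,
                     start_marker=START_MARKER):
    """Copy src_path into numbered files, starting a new file only on a
    START marker line, after every records_per_file records."""
    out = None
    out_paths = []
    seen = 0  # number of records started so far
    with open(src_path, "rb") as src:
        for line in src:  # reads one line at a time, so memory use stays small
            if line.rstrip().endswith(start_marker):
                if seen % records_per_file == 0:
                    if out is not None:
                        out.close()
                    path = f"{out_prefix}{len(out_paths) + 1}.xml"
                    out = open(path, "wb")
                    out_paths.append(path)
                seen += 1
            if out is not None:  # lines before the first START are dropped
                out.write(line)
    if out is not None:
        out.close()
    return out_paths
```

With exactly 1 million records, `split_by_records("huge.xml", "part_")` would write four files (`huge.xml` is a placeholder name). Since the count is only "around" 1 million, a fifth small file is possible. To force four files, one option is to call `count_records` first and pass the total divided by 4, rounded up, as `records_per_file`, at the cost of a second pass over the 7 GB file.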
 
