Sed: Splitting a large file into smaller files based on a repeated regular expression match
I will simplify the explanation a bit. I need to parse through an 87M file.
I have a single text file in the following form:
I want to extract <NAME>, </script>, and all lines between the two, and place each block into its own file,
ending up with:
file1.txt
file2.txt
file3.txt
I have searched sed one-liners, used the search feature here, and looked in my O'Reilly sed/awk pocket guide, but nothing really provides a solution.
Thanks in advance. Sorry for the re-edit!
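Here is a minimal Python sketch of the extraction. It is an alternative to a sed one-liner, not a sed solution. It assumes each block starts on a line containing the literal text <NAME> and ends at the next line containing </script>. The function name split_blocks and the file1.txt, file2.txt, ... naming are illustrative only. The input is read one line at a time, so the 87M file never has to fit in memory.

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumed markers: adjust these to the real start and end lines in your data.
START = re.compile(r"<NAME>")
END = re.compile(r"</script>")


def split_blocks(src: str, out_dir: str = ".", prefix: str = "file") -> list[Path]:
    """Copy each START..END block (both marker lines included) to its own file."""
    written: list[Path] = []
    out = None  # open handle while inside a block, None otherwise
    with open(src, encoding="utf-8") as fh:
        for line in fh:  # streamed line by line, never loaded whole
            if out is None and START.search(line):
                path = Path(out_dir) / f"{prefix}{len(written) + 1}.txt"
                out = open(path, "w", encoding="utf-8")
                written.append(path)
            if out is not None:
                out.write(line)
                if END.search(line):
                    out.close()
                    out = None
    if out is not None:  # last block had no end marker: keep what was collected
        out.close()
    return written
```

For example, split_blocks("input.txt") writes file1.txt, file2.txt, ... into the current directory. Lines outside any block are skipped.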
Last edited by Scrutinizer; 03-29-2013 at 06:07 PM. Reason: code tags
Hi all,
I'm new to this forum, so excuse me if anything is wrong.
I have a file containing 600 MB of data. When I parse the data in a Perl program, I get an out-of-memory error.
So I am planning to split the file into smaller files and process them one by one.
Can anyone tell me what the code would be...
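The following is a minimal Python sketch of the splitting step, not Perl. It cuts a large file into pieces of at most max_bytes each, always at a line boundary, and reads only one line at a time. The function name split_by_size and the part001, part002, ... names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def split_by_size(src: str, max_bytes: int, out_dir: str = ".", prefix: str = "part") -> list[Path]:
    """Split src into files of at most max_bytes each without cutting a line in half.

    A single line longer than max_bytes still ends up in a file of its own.
    """
    parts: list[Path] = []
    out = None
    size = 0
    with open(src, "rb") as fh:  # binary mode so byte counts are exact
        for line in fh:
            if out is None or size + len(line) > max_bytes:
                if out is not None:
                    out.close()
                path = Path(out_dir) / f"{prefix}{len(parts) + 1:03d}"
                out = open(path, "wb")
                parts.append(path)
                size = 0
            out.write(line)
            size += len(line)
    if out is not None:
        out.close()
    return parts
```

For the 600 MB case, split_by_size("data.txt", 50 * 1024 * 1024) would give roughly a dozen 50 MB pieces. If the Perl program itself reads its input line by line instead of all at once, splitting may not be needed at all.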
Hi everyone,
I am using a CentOS 5.2 server as an sFlow log collector on my network. Currently I am using InMon's free sflowtool to collect the packets sent by my switches. I have a bash script running in an infinite loop to stop and start the log collection at set intervals - currently one...
I have a file with a simple list of IDs, 750,000 rows. I have to break it down into multiple 50,000-row files to submit in a batch process. Is there an easy script I could write to accomplish this task?
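A fixed line count per file is the simplest split. Below is a Python sketch that uses itertools.islice to take the next N lines at a time. The function name split_by_lines and the ids_001.txt naming are illustrative. With 750,000 rows and 50,000 lines per file it produces 15 files. On most Unix systems the standard command `split -l 50000 ids.txt ids_` does the same job.

```python
from __future__ import annotations

from itertools import islice
from pathlib import Path


def split_by_lines(src: str, lines_per_file: int = 50_000, out_dir: str = ".", prefix: str = "ids_") -> list[Path]:
    """Write consecutive runs of lines_per_file lines to ids_001.txt, ids_002.txt, ..."""
    parts: list[Path] = []
    with open(src, encoding="utf-8") as fh:
        while True:
            chunk = list(islice(fh, lines_per_file))  # next block of at most N lines
            if not chunk:
                break
            path = Path(out_dir) / f"{prefix}{len(parts) + 1:03d}.txt"
            path.write_text("".join(chunk), encoding="utf-8")
            parts.append(path)
    return parts
```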
I need to write a shell script for the scenario below.
My input file has data in this format:
qwerty0101TWE 12345 01022005 01022005 datainala alanfernanded 26
qwerty0101mXZ 12349 01022005 06022008 datainalb johngalilo 28
qwerty0101TWE 12342 01022005 07022009 datainalc hitalbert 43
qwerty0101CFG 12345...
Dear all,
I have a specific problem that I don't quite understand how to solve. I have two files, both of the same format:
XXXXXX_FIND1 bla bla bla
bla
bla
bla
bla
bla
bla
bla
bla
bla
========
(return)
XXXXXX_FIND2 bla bla bla
bla
bla
bla
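The rest of this question is cut off, so the goal is unknown. A likely first step for either file is to read it into records. This Python sketch assumes each record starts with a header line whose first token is the key (for example XXXXXX_FIND1) and ends with a line made only of "=" characters. Blank lines between records, such as the "(return)" line, are skipped. The function name read_records is hypothetical. What to do with the two resulting dictionaries, such as comparing them by key, is left open.

```python
from __future__ import annotations

import re

# Assumed record terminator: a line consisting only of "=" characters.
SEPARATOR = re.compile(r"^=+\s*$")


def read_records(path: str) -> dict[str, list[str]]:
    """Return {key: [header line, body lines...]} for every record in path."""
    records: dict[str, list[str]] = {}
    current: list[str] | None = None
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.rstrip("\n")
            if current is None:
                if line.strip():  # first non-blank line after a separator is a header
                    current = [line]
                    records[line.split()[0]] = current
            elif SEPARATOR.match(line):
                current = None  # record finished
            else:
                current.append(line)
    return records
```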
Hi Experts,
I have to split a huge file based on a pattern to create smaller files. The pattern expected in the file is:
Master.....
First...
second....
second...
third..
third...
Master...
First..
second...
third...
Master...
First...
second..
second..
second.....
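Since every group begins with a "Master" line, one way to split is to start a new output file at each such line. The following is a Python sketch under that assumption. The function name split_on_master and the master_1.txt, master_2.txt, ... names are illustrative. Lines before the first "Master" line are dropped.

```python
from __future__ import annotations

from pathlib import Path


def split_on_master(src: str, out_dir: str = ".", prefix: str = "master_") -> list[Path]:
    """Start a new output file at every line that begins with "Master"."""
    parts: list[Path] = []
    out = None
    with open(src, encoding="utf-8") as fh:
        for line in fh:
            if line.startswith("Master"):
                if out is not None:
                    out.close()
                path = Path(out_dir) / f"{prefix}{len(parts) + 1}.txt"
                out = open(path, "w", encoding="utf-8")
                parts.append(path)
            if out is not None:  # skip anything before the first Master line
                out.write(line)
    if out is not None:
        out.close()
    return parts
```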
Hi,
I'm trying to split a large file into several smaller files.
The script will take two input arguments: argument1 = file name and argument2 = number of files to split into.
In my large input file I have a header followed by 100009 records.
The first line is a header; I want this header in all my...
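Below is a Python sketch of the two-argument split, with the header copied into every piece. It assumes the header is exactly the first line and every other line is one record. It reads the file twice: once to count records and once to write them, so the whole file never sits in memory. Records are spread as evenly as possible, so 100009 records in 4 files gives 25003, 25002, 25002 and 25002. The function name split_with_header and the FILENAME.1, FILENAME.2, ... output names are hypothetical. A script would call it as split_with_header(filename, number_of_files).

```python
from __future__ import annotations

from pathlib import Path


def split_with_header(src: str, n_files: int, out_dir: str = ".") -> list[Path]:
    """Split src into n_files pieces, copying the first (header) line into each."""
    # Pass 1: remember the header and count the records after it.
    with open(src, encoding="utf-8") as fh:
        header = fh.readline()
        n_records = sum(1 for _ in fh)
    # The first (n_records % n_files) pieces get one extra record.
    base, extra = divmod(n_records, n_files)
    sizes = [base + (1 if i < extra else 0) for i in range(n_files)]

    # Pass 2: write header + the next `size` records to each piece.
    parts: list[Path] = []
    with open(src, encoding="utf-8") as fh:
        fh.readline()  # skip the header
        for i, size in enumerate(sizes, start=1):
            path = Path(out_dir) / f"{Path(src).name}.{i}"
            with open(path, "w", encoding="utf-8") as out:
                out.write(header)
                for _ in range(size):
                    out.write(fh.readline())
            parts.append(path)
    return parts
```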
I am trying to update an older program on a small cluster. It uses individual files to send jobs to each node. However, the newer database comes as one large file containing over 10,000 records, so I need to split this file. It looks like this:
HMMER3/b
NAME 1-cysPrx_C
ACC ...
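HMMER3 profile files end each model with a line containing only "//". That gives a reliable split point. This Python sketch writes each model to its own file named after its NAME field, for example 1-cysPrx_C.hmm. The function name split_hmm_db and the naming scheme are illustrative. It assumes NAME values are safe to use as filenames. Anything after the last "//" is ignored.

```python
from __future__ import annotations

from pathlib import Path


def split_hmm_db(src: str, out_dir: str = ".") -> list[Path]:
    """Write each model (up to and including its "//" line) to <NAME>.hmm."""
    parts: list[Path] = []
    buf: list[str] = []
    name = None
    with open(src, encoding="utf-8") as fh:
        for line in fh:
            buf.append(line)
            fields = line.split()
            if name is None and len(fields) >= 2 and fields[0] == "NAME":
                name = fields[1]
            if line.rstrip() == "//":  # end of the current model
                stem = name if name else f"model_{len(parts) + 1}"
                path = Path(out_dir) / f"{stem}.hmm"
                path.write_text("".join(buf), encoding="utf-8")
                parts.append(path)
                buf, name = [], None
    return parts
```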
Hi Everybody!
I need some help with a Perl regular expression that matches files named messages, and also files named message.1, message.2, and so on. In other words, I need one that finds messages, optionally followed by a period and a digit, without matching other files like...
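The key is to make the ".digits" suffix optional and anchor the whole name. Below is a Python sketch of that idea. The same pattern works in Perl as /^messages(?:\.[0-9]+)?\z/. It accepts one or more digits so that messages.10 also matches. If message.1 (without the s) should match too, change messages to messages?. The helper name is_messages_log is hypothetical.

```python
import re

# "messages" alone, or "messages" + "." + one or more digits.
# fullmatch() requires the whole name to match, so "messages.old",
# "old_messages" and "messages.1.gz" are rejected.
LOG_NAME = re.compile(r"messages(?:\.[0-9]+)?")


def is_messages_log(filename: str) -> bool:
    return LOG_NAME.fullmatch(filename) is not None
```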
Help needed urgently please.
I have a large file of a few hundred thousand lines.
Sample:
CP START ACCOUNT
1234556
name 1
CP END ACCOUNT
CP START ACCOUNT
2224444
name 1
CP END ACCOUNT
CP START ACCOUNT
333344444
name 1
CP END ACCOUNT
I need to split this file each time "CP START...
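The sample has explicit start and end markers, so each CP START ACCOUNT ... CP END ACCOUNT block can be written out whole. This Python sketch does that. As a hypothetical naming scheme, it names each file after the line just after the start marker, treated as the account number, for example account_1234556.txt. It falls back to a running number if a block has no such line. The function name split_accounts is also illustrative. An unterminated block at the end of the file is dropped.

```python
from __future__ import annotations

from pathlib import Path

START = "CP START ACCOUNT"
END = "CP END ACCOUNT"


def split_accounts(src: str, out_dir: str = ".") -> list[Path]:
    """Write each START..END block (inclusive) to account_<number>.txt."""
    parts: list[Path] = []
    block: list[str] | None = None
    with open(src, encoding="utf-8") as fh:
        for line in fh:
            if line.strip() == START:
                block = [line]  # a new START also discards any unfinished block
            elif block is not None:
                block.append(line)
                if line.strip() == END:
                    # Assumed: block[1] is the account number line.
                    account = block[1].strip() if len(block) > 2 else str(len(parts) + 1)
                    path = Path(out_dir) / f"account_{account}.txt"
                    path.write_text("".join(block), encoding="utf-8")
                    parts.append(path)
                    block = None
    return parts
```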
Discussion started by: frustrated1
Locale::Codes::LangExt(3pm)   Perl Programmers Reference Guide   Locale::Codes::LangExt(3pm)
NAME
Locale::Codes::LangExt - standard codes for language extension identification
SYNOPSIS
use Locale::Codes::LangExt;
$lext = code2langext('acm'); # $lext gets 'Mesopotamian Arabic'
$code = langext2code('Mesopotamian Arabic'); # $code gets 'acm'
@codes = all_langext_codes();
@names = all_langext_names();
DESCRIPTION
The "Locale::Codes::LangExt" module provides access to standard codes used for identifying language extensions, such as those defined in
the IANA language registry.
Most of the routines take an optional additional argument which specifies the code set to use. If not specified, the default IANA language
registry codes will be used.
SUPPORTED CODE SETS
There are several different code sets you can use for identifying language extensions. A code set may be specified using either a name, or
a constant that is automatically exported by this module.
For example, the following two are equivalent:
$lext = code2langext('acm','alpha');
$lext = code2langext('acm',LOCALE_LANGEXT_ALPHA);
The code sets currently supported are:
alpha
This is the set of three-letter (lowercase) codes from the IANA language registry, such as 'acm' for Mesopotamian Arabic.
This is the default code set.
ROUTINES
code2langext ( CODE [,CODESET] )
langext2code ( NAME [,CODESET] )
langext_code2code ( CODE ,CODESET ,CODESET2 )
all_langext_codes ( [CODESET] )
all_langext_names ( [CODESET] )
Locale::Codes::LangExt::rename_langext ( CODE ,NEW_NAME [,CODESET] )
Locale::Codes::LangExt::add_langext ( CODE ,NAME [,CODESET] )
Locale::Codes::LangExt::delete_langext ( CODE [,CODESET] )
Locale::Codes::LangExt::add_langext_alias ( NAME ,NEW_NAME )
Locale::Codes::LangExt::delete_langext_alias ( NAME )
Locale::Codes::LangExt::rename_langext_code ( CODE ,NEW_CODE [,CODESET] )
Locale::Codes::LangExt::add_langext_code_alias ( CODE ,NEW_CODE [,CODESET] )
Locale::Codes::LangExt::delete_langext_code_alias ( CODE [,CODESET] )
These routines are all documented in the Locale::Codes::API man page.
SEE ALSO
Locale::Codes
The Locale-Codes distribution.
Locale::Codes::API
The list of functions supported by this module.
http://www.iana.org/assignments/language-subtag-registry
The IANA language subtag registry.
AUTHOR
See Locale::Codes for full author history.
Currently maintained by Sullivan Beck (sbeck@cpan.org).
COPYRIGHT
Copyright (c) 2011-2013 Sullivan Beck
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
perl v5.18.2 2013-11-04 Locale::Codes::LangExt(3pm)