Hi there, I'm camor and I'm trying to process huge files with bash scripting and awk.
I've got a dataset folder with 10 files (16 million rows each, 600 MB), and I've got a sorted file with all the keys inside.
For example:
I would like to obtain an output like this:
I wrote a bash script like this. It produces a sorted file with all of my keys; the file is huge, about 62 million records, so I slice it into pieces and pass each piece to my awk script.
Here is my AWK script:
I've figured out that my bottleneck comes from awk iterating over the dataset folder as its input (10 files with 16,000,000 lines each). Everything works on a small set of data, but with the real data the RAM (30 GB) fills up. I think the problem is "dataset/*" as the awk input.
Does anyone have any suggestions or advice? Thank you.
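One common way to keep memory bounded in this kind of job is to hold only the current slice of keys in an awk array and stream the dataset files line by line, instead of storing dataset rows. The sketch below is only an illustration under assumptions that are not in the post: the key is taken to be the first whitespace-separated field in both the key slice and the data files, and the file names (keys_chunk.txt, file1.txt, file2.txt, matched.txt) are hypothetical stand-ins.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample data; the real layout is not shown in the post.
work=$(mktemp -d)
mkdir "$work/dataset"
printf 'k1\nk3\n'     > "$work/keys_chunk.txt"
printf 'k1 a\nk2 b\n' > "$work/dataset/file1.txt"
printf 'k3 c\nk4 d\n' > "$work/dataset/file2.txt"

# First file (NR == FNR): remember only the keys of this slice.
# Remaining files: print a line when its first field is a wanted key.
# Nothing from the dataset is stored, so memory grows with the slice only.
# Note: NR == FNR misfires if the key slice is empty, so check that first.
awk 'NR == FNR { want[$1]; next } $1 in want' \
    "$work/keys_chunk.txt" "$work/dataset/"*.txt > "$work/matched.txt"

cat "$work/matched.txt"
```

With this shape, memory use depends on the size of one key slice rather than on the 10 x 16,000,000 dataset lines, so a smaller slice should lower the peak.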
Can you please help me write a script for the following purpose?
I have to divide a single large web access log file into multiple log files based on the dates inside the log file.
For example:
if data is logged in the access file for jan-10-08 , jan-11-08 , Jan-12-08
then make small log file... (1 Reply)
Hi,
I need some help creating a tidy shell program with awk or other language that will split large length files efficiently.
Here is an example dump:
<A001_MAIL.DAT>
0001 Ronald McDonald 01 H81
0002 Elmo St. Elmo 02 H82
0003 Cookie Monster 01 H81
0004 Oscar ... (16 Replies)
I have a file with a simple list of ids. 750,000 rows. I have to break it down into multiple 50,000 row files to submit in a batch process.. Is there an easy script I could write to accomplish this task? (2 Replies)
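For a fixed number of rows per file, the standard split utility is usually enough. This is a minimal sketch, assuming one id per line; the file name ids.txt and the prefix batch_ are hypothetical, and seq only generates stand-in data.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)
seq 1 750000 > "$work/ids.txt"   # stand-in for the real id list

# -l 50000: at most 50,000 lines per piece; pieces are named batch_aa, batch_ab, ...
split -l 50000 "$work/ids.txt" "$work/batch_"

# Count the pieces that were produced.
ls "$work"/batch_* | wc -l
```

Each piece can then be submitted to the batch process on its own.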
I have a 500 MB XML file from a FileMaker database export, it's formatted horribly (no line breaks at all). The node structure is basically
<FMPXMLRESULT>
<METADATA>
<FIELD att="............." id="..."/>
</METADATA>
<RESULTSET FOUND="1763457">
<ROW att="....." etc="....">
... (16 Replies)
Hi,
I'd like to process multiple files. For example:
file1.txt
file2.txt
file3.txt
Each file contains several lines of data. I want to extract a piece of data and output it to a new file.
file1.txt ----> newfile1.txt
file2.txt ----> newfile2.txt
file3.txt ----> newfile3.txt
Here is... (3 Replies)
Hello gurus,
I am new to "awk" and am trying to break a large file of 4 million records into several output files of half a million records each, while keeping records with the same key in the same output file rather than spread across files.
e.g. my data is like:
Row_Num,... (6 Replies)
Hello,
Error
awk: Internal software error in the tostring function on TS1101?05044400?.0085498227?0?.0011041461?.0034752266?.00397045?0?0?0?0?0?0?11/02/10?09/23/10???10?no??0??no?sct_det3_10_20110516_143936.txt
What it is
It is a unix shell script that contains an awk program as well as... (4 Replies)
I need to write a shell script for the scenario below.
My input file has data in format:
qwerty0101TWE 12345 01022005 01022005 datainala alanfernanded 26
qwerty0101mXZ 12349 01022005 06022008 datainalb johngalilo 28
qwerty0101TWE 12342 01022005 07022009 datainalc hitalbert 43
qwerty0101CFG 12345... (19 Replies)
I have a large zone file dump that consists of
; DNS record for the adomain.com domain
data1
data2
data3
data4
data5
CRLF
CRLF
CRLF
; DNS record for the anotherdomain.com domain
data1
data2
data3
data4
data5
data6
CRLF (7 Replies)
I've got two files that each contain a 16-digit number in positions 1-16. The first file has 63,120 entries all sorted numerically. The second file has 142,479 entries, also sorted numerically.
I want to read through each file and output the entries that appear in both. So far I've had no... (13 Replies)