04-26-2012
Some assumptions:
If the 100 MB file contains 1 to 2 million records, and the control file is 10,000 lines at 13 characters per line, a simple sed routine will process about 130 GB of data. Whether that is all disk I/O or is served from memory depends on how well the shell uses memory.
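One way to arrive at a figure of that size is the rough arithmetic below. It is only a sketch: it assumes every record is checked against every control line at about 13 bytes per check, and it takes the low end of the record count. The variable names are illustrative.

```python
# Rough estimate of the work done by a naive "one sed expression per control line" pass.
# Assumption: each record is compared against each control line (~13 bytes each).
records = 1_000_000            # low end of the 1 to 2 million range
control_lines = 10_000
bytes_per_control_line = 13

naive_bytes = records * control_lines * bytes_per_control_line
naive_gb = naive_bytes / 10**9

print(f"{naive_gb:.0f} GB")    # prints "130 GB"
```

With 2 million records the same arithmetic doubles the estimate.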
If there is only one data field per line, create two temporary data files, one containing the tags, and the other the data. Add line numbers to the files.
Sort the data file by data value, and sort the control file by original field value. Then write a merge program that reads the sorted data file, replaces each field that has a match in the control file, and writes a new temporary file with the new data (including the original line number).
Sort the new temporary data file back to line number sequence, and merge with the temporary tag file to produce a new xml file.
The total data processed this way should be less than 1 GB.
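The steps above can be sketched in Python as follows. This is a minimal in-memory illustration, not a finished tool: it assumes each data line has the shape `<tag>value</tag>` and that the control file has already been parsed into (old, new) pairs. The function names (`split_lines`, `merge_replace`, `rewrite`) are hypothetical. For a real 100 MB file the sorts would more likely be done on disk, for example with the sort command.

```python
import re

# Assumed line shape: <tag>value</tag>, one data field per line.
LINE_RE = re.compile(r"^(\s*<[^>]+>)([^<]*)(</[^>]+>\s*)$")

def split_lines(xml_lines):
    """Split into a tag list and a data list, both carrying line numbers."""
    tags, data = [], []
    for line_no, line in enumerate(xml_lines):
        m = LINE_RE.match(line)
        if m:
            tags.append((line_no, m.group(1), m.group(3)))
            data.append((m.group(2), line_no))
        else:
            # Lines without a data field pass through unchanged.
            tags.append((line_no, line, ""))
    return tags, data

def merge_replace(sorted_data, sorted_control):
    """One pass over two sorted sequences, replacing values that match."""
    out = []
    i = 0
    for value, line_no in sorted_data:
        # Advance the control pointer until it reaches or passes this value.
        while i < len(sorted_control) and sorted_control[i][0] < value:
            i += 1
        if i < len(sorted_control) and sorted_control[i][0] == value:
            value = sorted_control[i][1]
        out.append((line_no, value))
    return out

def rewrite(xml_lines, control_pairs):
    tags, data = split_lines(xml_lines)
    data.sort()                        # data value sequence
    control = sorted(control_pairs)    # original field sequence
    replaced = sorted(merge_replace(data, control))  # back to line number sequence
    result = []
    j = 0
    for line_no, head, tail in tags:   # merge with the tag list
        if j < len(replaced) and replaced[j][0] == line_no:
            result.append(head + replaced[j][1] + tail)
            j += 1
        else:
            result.append(head + tail)
    return result

if __name__ == "__main__":
    xml = ["<root>", "<id>A1</id>", "<id>B2</id>", "<id>A1</id>", "</root>"]
    for line in rewrite(xml, [("B2", "Y9"), ("A1", "X7")]):
        print(line)
```

Because both inputs are sorted, the merge reads each file once instead of scanning the data once per control line, which is where the saving over the sed approach comes from.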
Sort::Key::Multi(3pm) User Contributed Perl Documentation Sort::Key::Multi(3pm)
NAME
Sort::Key::Multi - simple multikey sorts
SYNOPSIS
use Sort::Key::Multi qw(sikeysort);
my @data = qw(foo0 foo1 bar34 bar0 bar34 bar33 doz4);
my @sisorted = sikeysort { /(\w+)(\d+)/ } @data;
DESCRIPTION
Sort::Key::Multi creates multikey sorting subroutines and exports them to the caller package.
The names of the sorters are of the form "xxxkeysort" or "xxxkeysort_inplace", where "xxx" determines the number and types of the keys as
follows:
+ "i" indicates an integer key, "u" indicates an unsigned integer key, "n" indicates a numeric key, "s" indicates a string key and "l"
indicates a string key that obeys locale order configuration.
+ Type characters can be prefixed by "r" to indicate reverse order.
+ A number following a type character indicates that the key type has to be repeated as many times (for instance "i3" is equivalent to
"iii" and "rs2" is equivalent to "rsrs").
+ Underscores ("_") can be freely used between type indicators.
For instance:
use Sort::Key::Multi qw(iirskeysort
i2rskeysort
i_i_rs__keysort
i2rs_keysort);
exports to the caller package four identical sorting functions that take two integer keys that are sorted in ascending order and one
string key that is sorted in descending order.
The generated sorters take as first argument a subroutine that is used to extract the keys from the values which are passed inside $_, for
example:
my @data = qw(1.3.foo 1.3.bar 2.3.bar 1.4.bar 1.7.foo);
my @s = i2rs_keysort { split /\./, $_ } @data;
SEE ALSO
For a more general multikey sorter generator see Sort::Key::Maker.
COPYRIGHT AND LICENSE
Copyright (C) 2006 by Salvador Fandiño <sfandino@yahoo.com>
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or,
at your option, any later version of Perl 5 you may have available.
perl v5.14.2 2010-04-16 Sort::Key::Multi(3pm)