I have a requirement to process large files (200MB+). The format of the files is like:
Final output required is
My existing AWK script does the job, but it's very slow.
val1, val2, val3 are repeating values, and I have "recordstart" before each set of values.
I would like to process these files using Perl. The idea is to:
1. Split the file into hashes, using "recordstart" as the delimiter.
2. Process each hash separately.
3. Apply some regular-expression work to the values in the 2nd column.
I know how to read a normal file into a hash. My problem here is splitting the file into chunks and then handling those hashes through a common variable.
Issue is with values like
4E598102'H
255'D
I need to remove 'H and 'D from these values, and sometimes convert from hex to decimal.
Processing these values takes a lot of time.
I am under the impression that Perl will make it a bit faster. Also, using hashes, I can choose fields easily; I don't need to print everything.
When strictly processing flat files, awk is very often faster than perl, depending on many factors.
One might suggest that you try a different version of awk on the platform you are using.
With an example input file:
Your awk script with some adjustments:
The output would look like this:
If you are running on a real POSIX compliant platform, you will have multiple versions of awk. The "nawk" tool has more functionality in regard to some built-in variables than "awk" and "/usr/xpg4/bin/awk" has the ability to hold open more simultaneous file handles.
I took the sample data set provided in the thread, repeated it until the input file was 364 lines long, and processed the file 260 times on the "*awk" command line. The aggregate size of all data sets was 1.7MB.
On a slow box, I get the following performance results:
You can get a 2x-3x boost in speed, by merely adjusting which "*awk" you use.
Removing the 'H and 'D is a simple step in awk. Let's adjust your sample datafile, and repeat it the same number of times in the previous timing example:
Now, we will adjust the nawk script to remove them, and time the results against over 200 files where each file has thousands of lines:
The results were just slightly slower adding the stripping.
You can regularly shave off another 0.10s by avoiding the compares and just forcing the "gsub" for the 'H and 'D just before the "printf", every time.
In total, "nawk" is faster than "/usr/xpg4/bin/awk", which is faster than "awk".
Global substitutions can be easily managed in awk.
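As a small illustration of that (the sample lines here are invented, not the thread's real data), a single unconditional gsub per line is all the stripping takes:

```shell
# Remove the 'H and 'D markers from every line with one gsub call.
printf "recordstart\nval1 4E598102'H\nval2 255'D\n" |
awk "{ gsub(/'[HD]/, \"\"); print }"
```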
Last edited by DavidHalko; 02-03-2011 at 06:27 PM..
Reason: removed some unneeded white space
Hello,
I extracted a list of files in a directory with the ls command. However, this is not my computer, so ls has been revamped so that it gives the file sizes in front, like this:
This is the output of the ls command; I stored the output in a file, filelist:
1.1M... (5 Replies)
Hi,
I have a hash of hashes, where the hash is dynamic; there can be n number of hashes. I need to compare the data_count values of all of them.
my %result = (
    $abc => {
        'data_count' => '10',
        'ID' => 'ABC122',
    },
    $def => {
        'data_count' => '20',
        'ID' => 'defASe',
... (1 Reply)
Hi to everybody.
I have a script in AWK with a recursive function; when I make the recursive call I'm losing values in the local variables. So I'm trying to do the script in Perl, but I don't know Perl.
I want to load several hashes with the data in a pipe-separated file. In AWK it looks... (0 Replies)
Hi,
In Perl, is it possible to use a range of numbers with '..' as a key in a hash?
Something like:
%hash = (
'768..1536' => '1G',
'1537..2560' => '2G'
);
That is, the range operation is evaluated, and all members of the range are... (3 Replies)
Can someone explain to me why, even using Tie::IxHash, I cannot get the output data in the same order it was inserted? See the code below.
#!/usr/bin/perl
use warnings;
use Tie::IxHash;
use strict;
tie (my %programs, "Tie::IxHash");
while (my $line = <DATA>) {
chomp $line;
my(... (1 Reply)
I am trying to read a 2-column data file into a Perl hash. Here is my code.
#!/usr/bin/perl -w
use strict;
use warnings;
my $file = "file_a";
my @line = ();
my $index = 0;
my %ind_file = ();
open(FILE, $file) or die($!);
while(<FILE>) {
chomp($_);
if ($_ eq '')
{
... (1 Reply)
Qspace ABC
Queue doCol: true
Queue order: fifo
Queue setCol: red
Queue order: fifo
Qspace XYZ
Queue getCol: true
Queue order: fifo
I need to prefix every line in this file with its Qspace and Queue, so that the final output looks like this:
Qspace: ABC Queue: doCol
Qspace: ABC Queue: doCol... (2 Replies)
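One way to sketch this in awk, assuming the "Queue order:" lines belong to the most recently named queue rather than naming one themselves (only a fragment of the data is reproduced here):

```shell
printf "Qspace ABC\nQueue doCol: true\nQueue order: fifo\n" |
awk '
    $1 == "Qspace" { qs = $2; next }       # remember the current Qspace
    $1 == "Queue" && $2 != "order:" {      # a line that names a queue
        q = $2; sub(/:$/, "", q)           # drop the trailing colon
    }
    { print "Qspace: " qs " Queue: " q }
'
```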
I need a script to process a huge single-line text file:
The sample of the text is:
"forward_inline_item": "Inline", "options_region_Australia": "Australia", "server_event_err_msg": "There was an error attempting to save", "Token": "Yes", "family": "Family","pwd_login_tab": "Enter Your... (1 Reply)
Hey everyone ...
I wanted to process the contents of a file, as in modify its contents. What's the best way to do it in Perl? In more detail, I have to go through the contents of the file, match patterns, and then modify the contents of the same file depending on the matching results. Any help is... (2 Replies)