To repeat: this is only sample code and needs a lot of tweaking before real use. That does not mean it won't work. It will, but the aim is never just working code, always better code.
I have designed it as a master-slave pair of scripts.
Master code
The master splits the big file into chunks and starts one slave per chunk to process it. Finally, the master merges the slaves' part files into the final output and deletes the part files and the other intermediate files.
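The chunk size does not have to be a fixed line count. As a minimal sketch, the size could instead be derived from a desired number of slave instances. Both helpers here (`lines_per_chunk` and `count_lines`) are hypothetical and not part of the scripts in this post:

```perl
use strict;
use warnings;

# Hypothetical helper: derive the number of lines per chunk from the total
# line count and the desired number of slave instances.
sub lines_per_chunk {
    my ($total_lines, $instances) = @_;
    die "instances must be at least 1\n" if $instances < 1;
    # Ceiling division, so that at most $instances chunks are produced.
    my $size = int( ($total_lines + $instances - 1) / $instances );
    # Never return 0, because the master uses this value as a modulus.
    return $size > 0 ? $size : 1;
}

# Hypothetical helper: count the lines of a file in one sequential pass.
sub count_lines {
    my ($path) = @_;
    open(my $fh, "<", $path) or die "Unable to open file : $path <$!>\n";
    my $count = 0;
    $count++ while <$fh>;
    close($fh);
    return $count;
}

print lines_per_chunk(10, 3), "\n";   # prints 4 (chunks of 4, 4 and 2 lines)
```

The trade-off is one extra pass over the big file to count its lines before the split starts.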
Here is the master code
Code:
#! /opt/third-party/bin/perl
use strict;
# Either the number of instances or the number of lines per chunk can be used for configuration.
# This example uses the number of lines per chunk.
use constant NUM_OF_LINES => 1000000;
use constant SLAVE_NAME => 'slave.pl';
use constant END_MARKER => '_END_PROCESSED_';
use constant SLAVE_FILE_PART_NAME => 'part';
use constant FINAL_OUTPUT_FILE => 'final.output';
my $line_counter = 0;        # lines read from the input so far
my $split_file_counter = 0;  # number (and name) of the current split file
my %splitFileHash;           # split file name => "N" (pending) or "Y" (processed)
my $file_handle = undef;     # handle of the split file currently being written
my $command = "./" . +SLAVE_NAME;
die "[MASTER] Please provide filename as input\n" if ( ! defined $ARGV[0] );
sub mergeOutput {
    # Constants are not interpolated inside strings, so concatenate the file name
    open(FOFILE, ">", +FINAL_OUTPUT_FILE)
        or die "[MASTER] Unable to open final output file : " . FINAL_OUTPUT_FILE . " <$!>\n";
    foreach my $file ( keys %splitFileHash ) {
        my $modified_file = ($file . "." . +SLAVE_FILE_PART_NAME);
        open(PFILE, "<", $modified_file) or die "[MASTER] Unable to open part file : $modified_file <$!>\n";
        # Test with defined() so that a final line without a newline is not dropped
        while ( defined( my $data = <PFILE> ) ) {
            chomp($data);
            next if ( $data eq +END_MARKER );
            print FOFILE "$data\n";
        }
        close(PFILE);
        unlink($modified_file) or die "[MASTER] Unable to delete part file : $modified_file <$!>\n";
        unlink($file) or die "[MASTER] Unable to delete split file : $file <$!>\n";
    }
    close(FOFILE);
}
sub checkFileHashStatus {
    foreach my $file ( keys %splitFileHash ) {
        return 0 if ( $splitFileHash{$file} eq "N" );
    }
    return 1; # All the files have been processed
}
sub checkForJobsCompletion {
    foreach my $file ( keys %splitFileHash ) {
        next if ( $splitFileHash{$file} eq "Y" );
        my $modified_file = ($file . "." . +SLAVE_FILE_PART_NAME);
        # The slave may not have created its part file yet; check again on the next pass
        open(LFILE, "<", $modified_file) or next;
        while ( defined( my $data = <LFILE> ) ) {
            chomp($data);
            if ( $data eq +END_MARKER ) {
                # File processing is completed, mark it
                $splitFileHash{$file} = "Y";
                print "[MASTER] File:$file processing completed\n";
                last;
            }
        }
        close(LFILE);
    }
}
sub closeLastFile {
    # Empty input: no split file was created, so there is nothing to spawn
    return if ( ! defined $file_handle );
    close($file_handle);
    my $local_command = $command . " " . $split_file_counter . " " . $split_file_counter . " &";
    print "[MASTER] Spawning instance $split_file_counter : $local_command\n";
    system($local_command);
}
sub getNewFile {
    close($file_handle) if defined ( $file_handle );
    # Spawn a slave for the split file that has just been completed
    if ( $split_file_counter != 0 ) {
        my $local_command = $command . " " . $split_file_counter . " " . $split_file_counter . " &";
        print "[MASTER] Spawning instance $split_file_counter : $local_command\n";
        system($local_command);
    }
    $split_file_counter++;
    my $file_name = $split_file_counter;
    $splitFileHash{$file_name} = "N";
    open($file_handle, ">", $file_name) or die "[MASTER] Unable to open file for writing : $file_name <$!>\n";
}
open(FILE, "<", $ARGV[0]) or die "[MASTER] Unable to open file : $ARGV[0] <$!>\n";
while(<FILE>) {
    # Start a new split file every NUM_OF_LINES lines, including the very first line
    getNewFile() if ( $line_counter % +NUM_OF_LINES == 0 );
    print $file_handle $_;
    $line_counter++;
}
close(FILE);
closeLastFile();
my $iteration_counter = 1;
while ( 1 ) {
    print "[MASTER] FileCheck Iteration Counter:$iteration_counter\n";
    checkForJobsCompletion();
    last if ( checkFileHashStatus() == 1 );
    $iteration_counter++;
    sleep(1); # Avoid a busy loop while the slaves are still running
}
print "[MASTER] Merging output\n";
mergeOutput();
exit (0);
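The master detects completion by polling each part file for the end marker. One alternative, which is not what the master code does, is to start each slave with fork and exec and then wait for the children with waitpid. The following is only a sketch under that assumption. The helper name `run_slaves` is hypothetical, and it passes each chunk name twice (input file and instance id) the same way the master does:

```perl
use strict;
use warnings;

# Hypothetical helper: start one slave per chunk in parallel and block until
# all of them have exited. Returns a hash of chunk name => exit status.
sub run_slaves {
    my ($slave, @chunks) = @_;
    my %chunk_for_pid;
    foreach my $chunk (@chunks) {
        my $pid = fork();
        die "[MASTER] fork failed <$!>\n" unless defined $pid;
        if ( $pid == 0 ) {
            # Child: replace this process with the slave script.
            exec($slave, $chunk, $chunk);
            warn "[MASTER] exec of $slave failed <$!>\n";
            exit(127);
        }
        $chunk_for_pid{$pid} = $chunk;
    }
    my %status;
    while (%chunk_for_pid) {
        my $pid = waitpid(-1, 0);   # block until any child exits
        last if $pid == -1;         # no child processes left
        next unless exists $chunk_for_pid{$pid};
        $status{ delete $chunk_for_pid{$pid} } = $? >> 8;
    }
    return %status;
}
```

A call such as `my %status = run_slaves('./slave.pl', 1, 2, 3);` would return once all three slaves have exited. A non-zero status also shows that a slave failed, which polling for the end marker cannot detect.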
Slave code
For demonstration purposes, the slave uses simple logic: it splits data of the form abcd;efgh
and forms output like abcd-efgh#efgh-abcd
Only the logic in the slave code needs to be changed; the master code is generic. It can be used for computations over huge data where the order of the output is not important.
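To make clear which part changes from one use case to another, the per-line logic can be isolated in a single function. This is a minimal sketch that mirrors the print statement in the slave code, which joins the two halves with `#`. The function name `transform_line` is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper holding the only logic that is specific to this demo:
# turn "first;second" into "first-second#second-first".
sub transform_line {
    my ($line) = @_;
    chomp $line;
    my ($first, $second) = split(/;/, $line);
    # Treat a missing field as empty instead of undefined.
    $first  = '' unless defined $first;
    $second = '' unless defined $second;
    return "$first-$second#$second-$first";
}

print transform_line("abcd;efgh\n"), "\n";   # prints abcd-efgh#efgh-abcd
```

Swapping in different processing then means replacing only the body of this function.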
Here is the slave code
Code:
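#! /opt/third-party/bin/perl
use strict;
use warnings;
# Usage: slave.pl <input split file> <instance id>
die "[SLAVE] Usage: $0 <input file> <instance id>\n" if ( @ARGV != 2 );
my $outputfilename = $ARGV[1] . ".part";
open(OFILE, ">", $outputfilename) or die "[SLAVE-$ARGV[1]] Unable to open file : $outputfilename <$!>\n";
open(FILE, "<", $ARGV[0]) or die "[SLAVE-$ARGV[1]] Unable to open file : $ARGV[0] <$!>\n";
while(<FILE>) {
    chomp;
    my($first, $second) = split(';');
    # Treat a missing field as empty so that malformed lines do not trigger warnings
    $first = '' if ( ! defined $first );
    $second = '' if ( ! defined $second );
    print OFILE "$first-$second#$second-$first\n";
}
close(FILE);
# The end marker tells the master that this part file is complete
print OFILE "_END_PROCESSED_\n";
close(OFILE);
exit (0);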