Perl sum really inefficient!!


 
# 1  
05-05-2009

Hi all,

I have a file like the following:


ID,
2,Andrew,0,1,2,3,4,2,5,6,7,7,9,3,4,5,34,3,2,1,5,6,78,89,8,7,6......................
4,James,0,6,7,0,5,6,4,7,8,9,6,46,6,3,2,5,6,87,0,341,0,5,2,5,6....................
END,

(There are more entries on each line, but to keep it simple I've left them off.)




What I want to do is sum every other value after the name on each line, e.g. for Andrew I want to sum 0,2,4,5 etc., then sum the others, e.g. 1,3,2,6 etc., and then print out the ID value, the name and the two totals.

e.g. 2,Andrew,164,133

I currently have the following:

$input_file3="$results_path/count_file.csv";
open(DAT3, $input_file3) || print "Could not open count file!";
@raw_data3=<DAT3>;
close(DAT3);

foreach $line (@raw_data3)
{
chop($line);
($VAR,$Name,$S1,$F1,$S2,$F2,$S3,$F3,$S4,$F4,$S5,$F5,$S6,$F6,$S7,$F7,$S8,$F8,$S9,$F9,$S10,$F10,$S11,$F11,$S12,$F12,$S13,$F13,$S14,
$F14,$S15,$F15,$S16,$F16,$S17,$F17,$S18,$F18,$S19,$F19,$S20,$F20,$S21,$F21,$S22,$F22,$S23,$F23,$S24,$F24)=split(/,/,$line);

if ($VAR eq "ID" || $VAR eq "END")
{
`echo "ignoring this line"`
}
else
{
$suc = $S1 + $S2 + $S3 + $S4 + $S5 + $S6 + $S7 + $S8 + $S9 + $S10 +$S11 + $S12 + $S13 + $S14 + $S15 + $S16 + $S17 + $S18 + $S19
+ $S20 + $S21 + $S22 + $S23 + $S24;

$fail = $F1 + $F2 + $F3 + $F4 + $F5 +$F6 + $F7 + $F8 + $F9 + $F10 + $F11 + $F12 + $F13 + $F14 + $F15 + $F16 + $F17 + $F18 +$F19 +
$F20 + $F21 + $F22 +$F23 +$F24;

`echo "$CC,$Name,$suc,$fail" >> $tmp_path/suc_and_fail`;
}
}


The above works, but it consumes a huge amount of memory and about 25% of my CPU for about 20 mins! The input files are quite big (approx. 30,000 lines). Is there a more efficient way to do this?

Thanks!
# 2  
05-05-2009
First, for future reference, please put your formatted code inside [code][/code] tags.

Second, there are quite a few things wrong with your code.
Code:
open(DAT3, $input_file3) || print "Could not open count file!";
@raw_data3=<DAT3>;
close(DAT3);

Instead of reading the whole file at once, process it line by line. This will save you a huge amount of memory and time (since the OS won't have to allocate that memory).
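The same idea as a minimal sketch in Python (the language used further down the thread): iterating over the file object reads one line at a time, so only the current line is held in memory. The function name `process_lines` and the temporary sample file are hypothetical, and the sketch assumes every field after the name is an integer.

```python
import os
import tempfile

def process_lines(path):
    """Yield (id, name, first_total, second_total) for each data line.

    The file object is iterated directly, so lines are read one at a
    time instead of being loaded into a list first.
    """
    with open(path) as fh:
        for line in fh:
            fields = line.rstrip("\n").split(",")
            if fields[0] in ("ID", "END"):
                continue  # skip the header and trailer lines
            values = [int(v) for v in fields[2:]]
            # values[0::2] = 1st, 3rd, 5th... values; values[1::2] = 2nd, 4th, 6th...
            yield fields[0], fields[1], sum(values[0::2]), sum(values[1::2])

# Usage: write a small sample file, then stream it line by line.
sample = "ID,\n2,Andrew,0,1,2,3\n4,James,0,6,7,0\nEND,\n"
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(sample)
for row in process_lines(tmp.name):
    print(",".join(map(str, row)))
os.remove(tmp.name)
```

Because `process_lines` is a generator, results can be written out as they are produced, so memory use stays flat however long the input file gets.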

Code:
($VAR,$Name,$S1,$F1,$S2,$F2,$S3,$F3,$S4,$F4,$S5,$F5,$S6,$F6,$S7,$F7,$S8,$F8,$S9,$F9,$S10,$F10,$S11,$F11,$S12,$F12,$S13,$F13,$S14,
$F14,$S15,$F15,$S16,$F16,$S17,$F17,$S18,$F18,$S19,$F19,$S20,$F20,$S21,$F21,$S22,$F22,$S23,$F23,$S24,$F24)=split(/,/,$line);

Why don't you just split into an array? That way your code would still work if you ever need more fields, without needing a rewrite.

Code:
$suc = $S1 + $S2 + $S3 + $S4 + $S5 + $S6 + $S7 + $S8 + $S9 + $S10 +$S11 + $S12 + $S13 + $S14 + $S15 + $S16 + $S17 + $S18 + $S19
+ $S20 + $S21 + $S22 + $S23 + $S24;

$fail = $F1 + $F2 + $F3 + $F4 + $F5 +$F6 + $F7 + $F8 + $F9 + $F10 + $F11 + $F12 + $F13 + $F14 + $F15 + $F16 + $F17 + $F18 +$F19 +
$F20 + $F21 + $F22 +$F23 +$F24;

See above: with an array, those could be reduced to two for loops (for maintainability).
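A minimal sketch of that suggestion, written in Python (which the thread also uses further down); in Perl it would be the same two loops over the split array. The function name `two_totals` is hypothetical, and the sketch assumes the ID is field 0, the name is field 1, and every later field is an integer.

```python
def two_totals(line):
    """Split a CSV line into a list and sum alternating values after the name.

    Index 0 is the ID and index 1 the name, so the first total covers
    indices 2, 4, 6, ... and the second covers indices 3, 5, 7, ...
    """
    t = line.strip().split(",")
    suc = 0
    for i in range(2, len(t), 2):   # first value of each pair
        suc += int(t[i])
    fail = 0
    for j in range(3, len(t), 2):   # second value of each pair
        fail += int(t[j])
    return t[0], t[1], suc, fail

print(two_totals("2,Andrew,0,1,2,3,4,2,5,6,7,7,9,3,4,5,34,3,2,1,5,6,78,89,8,7,6"))
# prints ('2', 'Andrew', 164, 133)
```

Since the loop bounds come from the length of the list, the same code handles 24 pairs, 25 pairs, or an odd trailing value without any change.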

Code:
`echo "$CC,$Name,$suc,$fail" >> $tmp_path/suc_and_fail`;

Written this way, Perl has to start a new shell process for every output line; that shell runs echo, opens the file for appending, and closes it again. With about 30,000 lines, that is about 30,000 extra processes. If you open the file inside Perl before you start processing, print to it directly, and close it afterwards, you'll probably shave off even more time.
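For comparison, a minimal Python sketch of the suggested pattern: open the output file once, write each result line through the same handle, and let the `with` block close it at the end, so no shell or `echo` process is started per line. The `write_results` helper and the output location in the temp directory are hypothetical; note that mode `"w"` truncates the file, unlike the original `>>` append.

```python
import os
import tempfile

def write_results(rows, out_path):
    """Write one 'id,name,total1,total2' line per row through a single handle.

    The file is opened once before the loop and closed once afterwards,
    instead of spawning a shell that appends to it for every row.
    """
    with open(out_path, "w") as out:
        for ident, name, suc, fail in rows:
            out.write(f"{ident},{name},{suc},{fail}\n")

out_path = os.path.join(tempfile.gettempdir(), "suc_and_fail")
write_results([("2", "Andrew", 164, 133), ("4", "James", 52, 520)], out_path)
with open(out_path) as fh:
    print(fh.read(), end="")
```

The Perl equivalent is to `open` the output filehandle once before the loop and `print` to it inside the loop.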
# 3  
05-05-2009
Or you can give awk a try:

Code:
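awk -F, '
$1 == "ID" || $1 == "END" { next }    # skip the header and trailer lines
{
  s1 = s2 = 0
  for (i = 3; i <= NF; i++) {
    if (i % 2) { s1 += $i } else { s2 += $i }    # odd fields -> s1, even fields -> s2
  }
  print $1 "," $2 "," s1 "," s2
}' file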

# 4  
05-05-2009
Many thanks for your response. Point noted on the code tags; your post is much more readable than mine!!

How would I go about processing that file one line at a time rather than reading it all in at once?

Thanks Again
# 5  
05-05-2009
Simply put:
Code:
open my $fh, '<', 'file' or die "Couldn't open file: $!";
while (my $line = <$fh>) {
    chomp $line;
    # Do whatever you have to with $line here
}
close $fh;

# 6  
05-05-2009
Code:
use strict;
use warnings;
my $tmp_path = 'path/to/file';
my $results_path = 'path/to/file';
my ($suc,$fail) = (0,0);
my $CC = 'whatever';
my $input_file3 = "$results_path/count_file.csv";
open(my $IN, "<", $input_file3) or die "Could not open count file: $!";
open(my $OUT, ">", "$tmp_path/suc_and_fail") or die "Could not open suc_and_fail file: $!"; 
while (my $line = <$IN>){
   chomp($line);
   my @t = split(/,/,$line);
   next if ($t[0] eq "ID" || $t[0] eq "END");
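   # even indices 2,4,... hold the first value of each pair
   for (my $i = 2; $i <= $#t; $i+=2){
      $suc += $t[$i];
   }
   # odd indices 3,5,... hold the second value of each pair
   for (my $j = 3; $j <= $#t; $j+=2){
      $fail += $t[$j];
   }
   print $OUT "$CC,$t[1],$suc,$fail\n";
}
close($IN);
close($OUT);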

# 7  
05-05-2009
@kevin, your sum for fail seems different from franklin's awk result. Pls confirm.
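It looks like the difference comes from where the totals are initialised: in the script above, `$suc` and `$fail` are declared once, before the `while` loop, and never reset, so each printed line also carries the totals of every earlier line. A minimal Python sketch (hypothetical function names) contrasting running totals with totals reset per line:

```python
def totals_without_reset(lines):
    """Mimic declaring suc/fail once outside the loop: totals carry over."""
    results = []
    suc = fail = 0
    for line in lines:
        t = line.split(",")
        suc += sum(int(v) for v in t[2::2])
        fail += sum(int(v) for v in t[3::2])
        results.append((t[1], suc, fail))
    return results

def totals_with_reset(lines):
    """Start suc/fail from zero on every line, which is what the task asks for."""
    results = []
    for line in lines:
        t = line.split(",")
        suc = sum(int(v) for v in t[2::2])
        fail = sum(int(v) for v in t[3::2])
        results.append((t[1], suc, fail))
    return results

rows = [
    "2,Andrew,0,1,2,3,4,2,5,6,7,7,9,3,4,5,34,3,2,1,5,6,78,89,8,7,6",
    "4,James,0,6,7,0,5,6,4,7,8,9,6,46,6,3,2,5,6,87,0,341,0,5,2,5,6",
]
print(totals_without_reset(rows))  # [('Andrew', 164, 133), ('James', 216, 653)]
print(totals_with_reset(rows))     # [('Andrew', 164, 133), ('James', 52, 520)]
```

In the Perl version, moving `my ($suc,$fail) = (0,0);` inside the `while` loop gives the per-line behaviour.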

@OP, if Perl is not a must, here's an alternative with Python:
Code:
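#!/usr/bin/env python3
cc = "whatever"
with open("file") as f:
    for line in f:
        if line.startswith("ID") or line.startswith("END"):
            continue  # skip the header and trailer lines
        fields = line.strip().split(",")
        tag, rest = fields[:2], fields[2:]
        # rest[0::2] holds the 1st, 3rd, 5th... values; rest[1::2] the 2nd, 4th, 6th...
        print("%s,%s,%s,%s" % (cc, ",".join(tag), sum(map(int, rest[0::2])), sum(map(int, rest[1::2]))))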

output:
Code:
# ./test.py
whatever,2,Andrew,164,133
whatever,4,James,52,520


Last edited by ghostdog74; 05-05-2009 at 10:49 PM.