summing numbers in files


 
# 15  
Old 07-31-2008
Unfortunately awk's precision (even in GNU awk) is unreliable at 13 decimal places; perl seems to do a better job:

Code:
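perl -ne '
        # $nr: line number within the current file; $max: longest file seen
        BEGIN { $nr=1; $max=1; };
        chomp;
        $tot[$nr]+=$_;          # running total for this line position
        if ($nr > $max) { $max=$nr };
        # restart the counter at the end of each input file
        if (eof(ARGV)) { $nr=1; } else { $nr++; }
        END { for ($i=1;$i<=$max;$i++) { print "$tot[$i]\n" } }
' file1 file2 ...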

I am not strong in perl-fu, so I'm sure this could be done much more neatly. There must be a better way to count $nr, but the built-in $. facility for this breaks in this type of perl -n script, and the documented $NR and $INPUT_LINE_NUMBER variables don't seem to work at all.
# 16  
Old 08-01-2008
Quote:
Originally Posted by aigles
The following script pastes and sums the files recursively, in groups of 5 (GroupingFactor).
Work files are created in /tmp (WorkFilePrefix).
Code:
GroupingFactor=5
WorkFilePrefix=/tmp/work.$$

sum() {
   local    _inputFileList=$1
   local -i _level=$((${2:-0} + 1))

   local _workFilePrefix=${WorkFilePrefix}.${_level}
   local _outFileCount=0
   local _outFileList=${_workFilePrefix}.filelist
   local _fileList _outFile

   > ${_outFileList}
   xargs -a $_inputFileList -n $GroupingFactor |
   while read _fileList
   do
      _outFile=${_workFilePrefix}.$((++_outFileCount))
      paste $_fileList |
      awk '{ sum=0 ; for(i=1; i<=NF; i++) sum += $i ; print sum}' > $_outFile
      echo $_outFile >> $_outFileList
      (( _level > 1 )) && rm -f $_fileList
   done

   if (( $(wc -l < ${_outFileList}) > 1 ))
   then
      sum $_outFileList $_level
   else
      cat $(<${_outFileList})
   fi
   rm -f ${_workFilePrefix}.*
}

if [ -z "$1" ]
then
   in=${WorkFilePrefix}.0.filelist
   cat - >$in
   sum $in
   rm -f $in
else
   sum $1
fi

The input files (50 files with the same content):
Code:
> ls pattywac_datas/
file.1   file.13  file.17  file.20  file.24  file.28  file.31  file.35  file.39  file.42  file.46  file.5   file.8
file.10  file.14  file.18  file.21  file.25  file.29  file.32  file.36  file.4   file.43  file.47  file.50  file.9
file.11  file.15  file.19  file.22  file.26  file.3   file.33  file.37  file.40  file.44  file.48  file.6
file.12  file.16  file.2   file.23  file.27  file.30  file.34  file.38  file.41  file.45  file.49  file.7
> cat pattywac_datas/file.1
1
2
3
4
5
6
7
8
9
10
> ls -l /tmp
total 0
>

Summing files:
Code:
> ls pattywac_datas/* | pattywac.sh
50
100
150
200
250
300
350
400
450
500
> ls -l /tmp
total 0
>

or
Code:
> pattywac.sh ./filelist
50
100
150
200
250
300
350
400
450
500
>

Jean-Pierre.
After a good night's sleep, a simpler version:
Code:
#! /usr/bin/bash
GroupingFactor=5
WorkFilePrefix=/tmp/work.$$

sum() {
   local    _inputFileList=$1

   local _outFileIndex=0
   local _fileList _inFile _outFile

   xargs -a $_inputFileList -n $GroupingFactor |
   while read _fileList
   do
      (( _outFileIndex = 1 - _outFileIndex ))
      _outFile=${WorkFilePrefix}.${_outFileIndex}
      paste ${_inFile} ${_fileList} |
      awk '{ sum=0 ; for(i=1; i<=NF; i++) sum += $i ; print sum}' > ${_outFile}
      rm -f $_inFile
      _inFile=${_outFile}
   done

   cat ${WorkFilePrefix}.[01]
   rm -f ${WorkFilePrefix}.[01]
}

if [ -z "$1" ]
then
   in=${WorkFilePrefix}.0.filelist
   cat - >$in
   sum $in
   rm -f $in
else
   sum $1
fi

Jean-Pierre.
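
Both versions lean on xargs -n to cut the list of file names into groups, and the while read loop then sees one group per line. A small sketch of just that step (group_names is a made-up helper, and file names containing spaces would not survive it):

```shell
#!/bin/sh
# group_names N: read names on stdin and print them N per line.
# xargs runs echo by default when no command is given.
group_names() {
    xargs -n "$1"
}

printf '%s\n' file.1 file.2 file.3 file.4 file.5 file.6 file.7 | group_names 3
# prints:
# file.1 file.2 file.3
# file.4 file.5 file.6
# file.7
```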
# 17  
Old 08-01-2008
Here's a Perl rehash of my earlier one-liner.

Code:
paste all those files | perl -lane '$sum = 0; map { $sum += $_ } @F; print $sum'

I'd be interested in hearing how it performs in comparison with other suggestions here.
# 18  
Old 08-01-2008
@era paste will choke on 40401 files
a similar solution will be:
Code:
paste -d '+' $( ls -m file* | tr -d ',' ) | bc >> output

# 19  
Old 08-01-2008
Depends on your version of paste I suppose, and of course, you might run into that dreaded old ARG_MAX limit. Your ls is also prone to hitting the latter. Otherwise, pretty cool idea (though I wonder about the use of ls -m -- shouldn't just paste -d '+' file* work just as well?)
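
To make the paste/bc trick concrete, here is a small sketch of what bc actually receives. The plus_lines helper and the three sample files are invented for the demonstration, and it assumes bc is installed:

```shell
#!/bin/sh
# plus_lines FILE...: join corresponding lines of the files with "+".
plus_lines() {
    paste -d '+' "$@"
}

# Sample inputs, created only for this demonstration.
dir=$(mktemp -d)
for n in 1 2 3; do
    printf '1.5\n2\n' > "$dir/file.$n"
done

plus_lines "$dir"/file.*        # prints 1.5+1.5+1.5 and 2+2+2
plus_lines "$dir"/file.* | bc   # prints 4.5 and 6
rm -rf "$dir"
```

The plain glob is enough here; the shell expands it to the same list of names that ls -m and tr would produce.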

Either way, you could divide it up into, say, batches of 10,000, and then recursively add the output files from that.
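
A rough sketch of that batching idea, keeping a running total in a temporary file instead of recursing. The function name sum_in_batches is hypothetical, file names are assumed to contain no whitespace, and the %.15g output format is my own choice to avoid awk's default 6 significant digits:

```shell
#!/bin/sh
# sum_in_batches N: read file names on stdin and add the files line by
# line, N files at a time, carrying the running total in a temp file.
sum_in_batches() {
    acc=$(mktemp)
    new=$(mktemp)
    xargs -n "$1" | while read -r batch; do
        # $batch is left unquoted on purpose so it splits into file names
        if [ -s "$acc" ]; then
            paste "$acc" $batch     # previous total plus this batch
        else
            paste $batch            # first batch: nothing to carry yet
        fi |
        awk '{ s = 0; for (i = 1; i <= NF; i++) s += $i; printf "%.15g\n", s }' > "$new"
        cp "$new" "$acc"
    done
    cat "$acc"
    rm -f "$acc" "$new"
}
```

Something like ls | sum_in_batches 10000, run from inside the data directory, would then never hand paste more than 10,001 names at once.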
# 20  
Old 08-01-2008
Quote:
Originally Posted by danmero
@era paste will choke on 40401 files
a similar solution will be:
Code:
paste -d '+' $( ls -m file* | tr -d ',' ) | bc >> output

With 40401 files, doesn't ls -m hit "argument list too long" if ARG_MAX is not large enough to handle that many?
# 21  
Old 08-01-2008
ARG_MAX is calculated in bytes, so it also depends on how long the file names are! Anyway, on many modern systems, ARG_MAX is bigger than half a meg (off the top of my head, suppose file names are twelve characters on average; 12 x 40,401 works out to be 484,812) but if you have an explicit directory path for each file, that will quickly make the command line too long. This is one of the few situations where it really does make sense to cd to the directory where you have your files before you run the actual command (^:
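
If you want to check rather than guess, something along these lines should do (names_size is a made-up helper; it assumes printf is a shell builtin, as it is in bash and dash, so the count itself never goes through exec):

```shell
#!/bin/sh
# Limit on the combined size of arguments plus environment, in bytes.
getconf ARG_MAX

# The back-of-the-envelope figure from above.
echo $((12 * 40401))    # prints 484812

# names_size NAME...: bytes the names would occupy as arguments,
# counting each name plus its terminating NUL byte.
names_size() {
    printf '%s\0' "$@" | wc -c
}
```

From inside the data directory, names_size file* gives a number to compare against the getconf output; bear in mind the environment and per-argument overhead eat into the same limit.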

As noted above, I still believe the ls -m is a Useless Use of ls.