Small changes below. I've assumed that regardless of the number of columns, the data to normalise is always in the next to last (NF-1) column. This handles the odd case of the "all" file without the need for a specific test.
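For what it's worth, a quick one-liner shows the idea; the first record below is made up, the second is the sample row from your data, and $(NF-1) picks the next to last field in both:
Code:
printf "a1 5 x\na1 10 100 nameX 0 2 +\n" | awk '{ print "next to last =", $(NF-1) }'
# next to last = 5
# next to last = 2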
I'm a bit confused by your new computation for "average." Your words say the sum of all values divided by the number of input files, but your example shows the sum divided by 4. The code below computes the output based on your description rather than the example, so the output for the first record in the first sample file you gave is
Code:
a1 10 100 nameX 0 2 + 5.500
because 44 is divided by 2 input files, not 4. If that is wrong, where is the 4 coming from? It might be that in your testing you have two other input files whose NF-1 column is all zeros, in which case your example, and the code, are correct.
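To make the arithmetic concrete (taking your 44 as the combined sum across both files; the rest is just illustration):
Code:
awk 'BEGIN {
    tsum = 44;                                               # combined sum from your sample
    printf( "divided by 2 input files: %.1f\n", tsum/2 );   # 22.0 -- what the code computes
    printf( "divided by 4: %.1f\n", tsum/4 );                # 11.0 -- what the example implied
}'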
Small revisions....
Code:
#!/usr/bin/env ksh
awk '
    {   # first pass to compute sums and total number of lines from all files
        # we assume that data to snarf is always next to last column regardless
        # of the number of columns in the input file.
        if( !seen[FILENAME]++ )             # must now count input files here
            nin++;                          # number of input files

        sum[FILENAME] += $(NF-1);           # sum across current file
        tsum += $(NF-1);                    # sum across all files
        tnv++;                              # total number of values
    }

    END {
        statsf = "stats.out";               # stats output file name
        #tmean = tsum/tnv;                  # mean of values across all files (unused)
        tmean = tsum/nin;                   # not the mean anymore though we keep the original name
        nin = 0;                            # number of input files

        for( fn in seen )                   # make second pass across the input files
        {
            printf( "%s sum = %.0f\n", fn, sum[fn] ) >statsf;       # collect stats
            ofn = sprintf( "%s.out", fn );
            while( (getline < fn) > 0 )
            {
                nv = ($(NF-1)/sum[fn]) * tmean;
                gsub( "\t+", " " );
                gsub( " +", " " );
                gsub( " ", "\t" );
                printf( "%s\t%.3f\n", $0, nv ) >ofn;                # write to output files
            }
            close( fn );
            close( ofn );
        }

        printf( "mean across %d input files %.0f/%.0f = %.03f\n", nin, tsum, tnv, tsum/tnv ) >statsf;
    }
' "$@"
exit
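In case it helps, this is roughly how I run it; the script and sample file names are just placeholders for whatever you use:
Code:
./normalise.ksh sample1.txt sample2.txt
# writes sample1.txt.out and sample2.txt.out (each record plus its normalised value)
# and stats.out with the per-file sums and the overall figure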
Yes, it should be 2, not 4. My mistake. The reason I ended up writing 4 is that my real data set has 4 files.
---------- Post updated at 08:45 AM ---------- Previous update was at 08:40 AM ----------
And one personal question regarding awk: how come you write so well in awk? How did you get into awk, and how did you practice it? I started with a free online awk book; the first few chapters were easy to follow, but after that the content became very difficult to grasp. Your suggestions would be really helpful to me. And thank you for the modifications!
---------- Post updated at 09:12 AM ---------- Previous update was at 08:45 AM ----------
I think something is wrong with the mean in the stats file. It should be
Thanks.
I started using awk at some point in 1990 or '91. I bought the O'Reilly Sed & Awk book and went from there. Awk takes a while to wrap your head around, so don't give up. A great way to improve your skills is to look at the posted solutions on this forum: try to solve the problem yourself, and use the posted solution(s) as a way to "check your answer." Also, having the answer can help if you just don't see how to solve the problem. Do remember that there may be lots of different approaches, so your solution might not look like what was posted but may still work.
---------- Post updated at 10:30 ---------- Previous update was at 10:24 ----------
Quote:
Originally Posted by quincyjones
I think something is wrong with the mean in the stats file. It should be
Oops. Yep, I missed that one, and another small mistake earlier.
Code:
#!/usr/bin/env ksh
awk '
    {   # first pass to compute sums and total number of lines from all files
        # we assume that data to snarf is always next to last column regardless
        # of the number of columns in the input file.
        if( !seen[FILENAME]++ )
            nin++;                          # number of input files

        sum[FILENAME] += $(NF-1);           # sum across current file
        tsum += $(NF-1);                    # sum across all files
        tnv++;                              # total number of values
    }

    END {
        statsf = "stats.out";               # stats output file name
        #tmean = tsum/tnv;                  # mean of values across all files (unused)
        tmean = tsum/nin;                   # not the mean anymore, we keep the original name

        for( fn in seen )                   # make second pass across the input files
        {
            printf( "%s sum = %.0f\n", fn, sum[fn] ) >statsf;       # collect stats
            ofn = sprintf( "%s.out", fn );
            while( (getline < fn) > 0 )
            {
                nv = ($(NF-1)/sum[fn]) * tmean;
                gsub( "\t+", " " );
                gsub( " +", " " );
                gsub( " ", "\t" );
                printf( "%s\t%.3f\n", $0, nv ) >ofn;                # write to output files
            }
            close( fn );
            close( ofn );
        }

        printf( "mean across %d input files %.0f/%.0f = %.03f\n", nin, tsum, nin, tmean ) >statsf;
    }
' "$@"
exit
Better now I hope!
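If you want a quick sanity check, the appended column of each .out file should sum to tmean (tsum divided by the number of input files), since every value is scaled by tmean/sum[fn]. Something along these lines would confirm it (the file name is just an example):
Code:
awk '{ s += $NF } END { printf( "%s: normalised column sums to %.3f\n", FILENAME, s ) }' sample1.txt.out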