print average of duplicates


 
Thread Tools Search this Thread
Top Forums UNIX for Dummies Questions & Answers print average of duplicates
# 1  
Old 11-09-2010
print average of duplicates

Hi
my inFile has 3 fields with duplicates in field1.
I would like to print the average field2 and field3 for the duplicated field1.

Code:
$cat inFile
f1    f2    f3
A    7    2
B    4    2
B    2    3
C    6    5
D    15    2
D    5    3
D    10    4

Code:
$cat outFile
f1    f2    f3
A    7    2
B    3    2.5
C    6    5
D    10    3

thank you for your help.
# 2  
Old 11-09-2010
nawk -f jd.awk inFile > outFile

jd.awk:
Code:
FNR==1 {header=$0;next}
{
  cntA[$1]++
  f2[$1]+=$2
  f3[$1]+=$3
}
END {
  print header
  for(i in cntA)
    print i, f2[i]/cntA[i], f3[i]/cntA[i]
}

# 3  
Old 11-09-2010
Well, first, the average of unique is unique / 1 .... You need an associative array or sorted data. Or, get a flat file db tool (Google) and run SQL
Code:
select f1, avg(f2), avg(f3) from $infile group by f1

ksh93 flavor:
Code:
(
 line # first line bypasses sort
 (
  sort 
  echo zzzEOF # add trailer to close loop
  ) | (
  zlast= zct=0 f2s=0.0 f3s=0.0
  while read f1 f2 f3
  do
   if [ "$f1" = "$zlast" -o "$f1" = "zzzEOF" ]
   then
    echo $zlast $(( $f2s/$zct )) $(( $f3s/$zct ))
    if [ "$f1" = "zzzEOF" ]
    then
     break
    fi
    zlast="$f1" zct=0 zf2s=0.0 zf3s=0.0
   fi
   (( zct++ ))
   (( zf2s += "$f2" ))
   (( zf3s += "$f3" ))
  done
  )
 ) < infile


Last edited by DGPickett; 11-09-2010 at 04:17 PM.. Reason: header divert around using line and subshell
# 4  
Old 11-09-2010
Quote:
Originally Posted by vgersh99
nawk -f jd.awk inFile > outFile

jd.awk:
Code:
FNR==1 {header=$0;next}
{
  cntA[$1]++
  f2[$1]+=$2
  f3[$1]+=$3
}
END {
  print header
  for(i in cntA)
    print i, f2[i]/cntA[i], f3[i]/cntA[i]
}


I tested the code with the following inFile:

Code:
#f1    f2    f2
a    3991037    4155442
a    3993760    4160837
a    3994154    4159990
b    308568    179762
f    3484774    3488370
f    3600005    3666058

outFile:
Code:
#f1    f2    f2
a    3992980    4158760
b    308568    179762
f    3542390    3577214

I noticed that the average for a should be 3992984 and not 3992980 in field2.
The same for field3, it should be 4158756 and not 4158760
Why is that?
Thanks
# 5  
Old 11-09-2010
No fractions?
# 6  
Old 11-09-2010
Quote:
Originally Posted by DGPickett
No fractions?
I am not really sure I understand the question.
can you please explain a bit more?
Thanks
Joseph
# 7  
Old 11-09-2010
try this - with no fractions:
Code:
FNR==1 {header=$0;next}
{
  cntA[$1]++
  f2[$1]+=$2
  f3[$1]+=$3
}
END {
  print header
  for(i in cntA)
    printf("%s%s%.0f%s%.0f\n", i, OFS,f2[i]/cntA[i], OFS, f3[i]/cntA[i])
}

 
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

print average of values

Is it possible to print the average of 2 nd column based on a key in 1st col input a1X 4 a1X 6 a2_1 10 a2_1 20 a2_1 30 a2_1 30 a2_1 10 output a1X 5 a2_1 20 (11 Replies)
Discussion started by: quincyjones
11 Replies

2. Shell Programming and Scripting

print running field average for a set of lines

Hi everyone, I have a program that generates logs that contains sections like this: IMAGE INPUT 81 0 0.995 2449470 0 1726 368 1 0.0635 0.3291 82 0 1.001 2448013 0 1666 365 1 0.0649 0.3235 83 0 1.009 2444822 0 1697 371 1 ... (3 Replies)
Discussion started by: euval
3 Replies

3. UNIX for Dummies Questions & Answers

Duplicates

Hi, How to eliminate the duplicate values in unix? I have a excel file which contains duplicate values. Need to use this in a script. Thanks in advance. (3 Replies)
Discussion started by: venkatesht
3 Replies

4. Shell Programming and Scripting

need Shell script for Sort BASED ON FIRST FIELD and PRINT THE WHOLE FILE WITHOUT DUPLICATES

Can some one provide me a shell script. I have file with many columns and many rows. need to sort the first column and then remove the duplicates records if exists.. finally print the full data with first coulm as unique. Sort BASED ON FIRST FIELD and remove the duplicates if exists... (2 Replies)
Discussion started by: tuffEnuff
2 Replies

5. Shell Programming and Scripting

Search Duplicates, Print Line #

Masters, I have a text file in the following format. vrsonlviee RVEBAALSKE lyolzteglx UUOSIWMDLR pcybtapfee DKGFJBHBJO ozhrucfeau YQXATYMGJD cjwvjolrcv YDHALRYQTG mdukphspbc CQZRIOWEUB nbiqomzsgw DYSUBQSSPZ xovgvkneav HJFQQYBLAF boyyzdmzka BVTVUDHSCR vrsonlviee TGTKUCUYMA... (2 Replies)
Discussion started by: genehunter
2 Replies

6. Shell Programming and Scripting

using awk to print average and standard deviation into a file

Hi I want to use awk to print avg and st deviation but it does not go into a file for column 1 only. I can do average and # of records but i cannot get st deviation. awk '{sum+=$1} END { print "Average = ",sum/NR}' thanks (1 Reply)
Discussion started by: phil_heath
1 Replies

7. Shell Programming and Scripting

print the nuber of duplicates

input a a a b b c output a 3 b 2 c 1 Could you guyz help me. I tried uniq and wc -l to count uniq line (3 Replies)
Discussion started by: repinementer
3 Replies

8. Shell Programming and Scripting

Non Duplicates

I have input file like below. I00789524 0213 5212 D00789524 0213 5212 I00778787 2154 5412 The first two records are same(Duplicates) except I & D in the first character. I want non duplicates(ie. 3rd line) to be output. How can we get this . Can you help. Is there any single AWK or SED... (3 Replies)
Discussion started by: awk_beginner
3 Replies

9. HP-UX

getting duplicates

how to get duplicates in a file containing data in columns using command or scripting? (4 Replies)
Discussion started by: megh
4 Replies

10. UNIX for Dummies Questions & Answers

average value

If I have a file like this, could anyone please guide me how to find the average value in each metrix. The file has got about 130,000 metrixs. Grid-ref= 142, 235 178 182 203 240 273 295 289 293 283 262 201 176 167 187 187 246 260 282 299 312 293 276 230 191 169 ... (2 Replies)
Discussion started by: su_in99
2 Replies
Login or Register to Ask a Question