Getting Sum, Count and Distinct Count of a file


 
# 1  
Old 02-28-2009

Hi all, this is a UNIX question.

I have a large flat file with millions of records.
col1|col2|col3
1|a|b
2|c|d
3|e|f
3|g|h
footer****

I am supposed to calculate the sum of col1 (1+2+3+3 = 9), the count of col1 (1,2,3,3 = 4 values), and the distinct count of col1 (1,2,3 = 3 values).

I would like it if you avoid external commands like AWK. Also, can we do the same by creating a function?

Please bear in mind that the file is huge

Thanks in advance

Last edited by Franklin52; 02-28-2009 at 06:08 AM.. Reason: urls removed
# 2  
Old 02-28-2009
Is there any reason to avoid external commands? Is this a homework question?

Regards
# 3  
Old 02-28-2009
Partial Solution

Hi, I can solve your first two requirements; I hope someone else will post a solution for the third. The solution for the first two follows:
------------------------------
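NumOfColumn=0
SumOfColumn=0
# Let the shell split each line on "|" (no cut or expr needed)
while IFS='|' read -r var1 rest
do
  # Skip the header, the footer and any line whose col1 is not a plain number
  case $var1 in ''|*[!0-9]*) continue ;; esac
  SumOfColumn=$((SumOfColumn + var1))
  NumOfColumn=$((NumOfColumn + 1))
done < larg_file.txt

printf 'SumOfColumn=%s\nNumOfColumn=%s\n' "$SumOfColumn" "$NumOfColumn"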
---------------------------------

For the third requirement, read up on the sort and uniq commands.
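
The sort/uniq hint could look roughly like the sketch below. It is only an illustration, not something posted in the thread: sample.txt is a made-up file that mirrors the layout in the question, and the grep step assumes that col1 holds plain non-negative integers, so that the header and footer can be filtered out by pattern.

```shell
# Hypothetical sample file mirroring the layout in the question.
cat > sample.txt <<'EOF'
col1|col2|col3
1|a|b
2|c|d
3|e|f
3|g|h
footer****
EOF

# Take col1, keep only purely numeric values (drops header and footer),
# de-duplicate with sort -u, then count the remaining lines.
distinct=$(cut -d'|' -f1 sample.txt | grep -E '^[0-9]+$' | sort -u | wc -l)
distinct=$((distinct))   # strip the padding some wc implementations add
echo "DistinctCount=$distinct"
```

These are external commands, which the original poster wanted to avoid, but sort -u handles large inputs by spilling to temporary files, so it tends to cope with files of this size.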
# 4  
Old 03-02-2009
Code:
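nawk -F'|' '
$1 ~ /^[0-9]+$/ {            # skip the header and footer lines
  sum += $1                  # running sum of col1
  n++                        # count of col1 values
  if (!($1 in seen)) {       # first time this value appears
    seen[$1]
    m++
  }
}
END {
  print "Sum:" sum+0
  print "Cnt:" n+0
  print "Dis:" m+0
}' file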
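
For the original request (no external commands, wrapped in a function), a rough sketch in plain POSIX shell might look like this. The function name col1_stats and the file sample.txt are made up for illustration. It assumes col1 holds non-negative integers without leading zeros, and it tracks distinct values in a delimited string, which is simple but gets slow when there are many distinct values. On a file with millions of records a shell read loop will also be far slower than awk.

```shell
# Hypothetical sample file mirroring the layout in the question.
cat > sample.txt <<'EOF'
col1|col2|col3
1|a|b
2|c|d
3|e|f
3|g|h
footer****
EOF

# col1_stats FILE: print sum, count and distinct count of the numeric col1 values.
# Hypothetical helper; the body runs in a subshell "( ... )" so its variables stay local.
col1_stats() (
  sum=0 count=0 distinct=0 seen='|'
  while IFS='|' read -r key rest || [ -n "$key" ]; do
    # Skip the header, the footer and anything that is not a plain integer.
    case $key in ''|*[!0-9]*) continue ;; esac
    sum=$((sum + key))
    count=$((count + 1))
    # "seen" is a |-delimited list of the values met so far.
    case $seen in
      *"|$key|"*) ;;                                        # already counted
      *) seen="$seen$key|"; distinct=$((distinct + 1)) ;;   # new value
    esac
  done < "$1"
  printf 'Sum:%s\nCnt:%s\nDis:%s\n' "$sum" "$count" "$distinct"
)

col1_stats sample.txt
```

Only shell built-ins are used inside the function (read, case, arithmetic expansion, printf), so no process is forked per line.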

# 5  
Old 03-02-2009
Are you people stupid? The forum moderator asked if this was a homework question but then you go and post the solution. I think I answered my own question.