02-28-2009
Getting Sum, Count and Distinct Count of a file
Hi all, this is a UNIX question.
I have a large flat file with millions of records.
col1|col2|col3
1|a|b
2|c|d
3|e|f
3|g|h
footer****
I am supposed to calculate the sum of col1 (1+2+3+3 = 9), the count of col1 (1, 2, 3, 3 = 4), and the distinct count of col1 (1, 2, 3 = 3).
I would prefer to avoid external commands like AWK. Also, can the same be done by creating a function?
Please bear in mind that the file is huge.
Thanks in advance
Last edited by Franklin52; 02-28-2009 at 06:08 AM..
Reason: urls removed
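One possible way to do this with shell builtins only is sketched below. It assumes bash 4 or later (for associative arrays), that the first line is always a header, and that the footer or any other line whose first field is not a plain non-negative integer can be skipped. The function name col1_stats and the output format are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: requires bash 4+ (associative arrays via "local -A").

# col1_stats FILE
# Prints the sum, count and distinct count of the first pipe-delimited column.
# Assumption: line 1 is a header; lines whose first field is not a
# non-negative integer (for example the footer) are ignored.
col1_stats() {
    local file=$1 sum=0 count=0 col1 rest n
    local -A seen=()

    {
        read -r _    # discard the header line
        # The "|| [[ -n $col1 ]]" part also handles a last line with no newline.
        while IFS='|' read -r col1 rest || [[ -n $col1 ]]; do
            [[ $col1 =~ ^[0-9]+$ ]] || continue
            n=$(( 10#$col1 ))          # force base 10 so "08" is not read as octal
            sum=$(( sum + n ))
            count=$(( count + 1 ))
            seen[$n]=1                 # each distinct value becomes one array key
        done
    } < "$file"

    printf 'sum=%d count=%d distinct=%d\n' "$sum" "$count" "${#seen[@]}"
}
```

Called as col1_stats yourfile, this should print sum=9 count=4 distinct=3 for the sample above. The file is read once and only the distinct values are held in memory, but a pure-shell read loop is slow on millions of records, so an external tool such as awk would normally be much faster if that restriction can be relaxed.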
LEARN ABOUT DEBIAN
lire::sum
Sum(3pm) LogReport's Lire Documentation Sum(3pm)
NAME
Lire::Sum - Lire class that implements the sum operator
SYNOPSIS
use Lire::Sum
DESCRIPTION
Class that implements the sum operator. This operator will compute the field's sum in a group of DLF records.
It's possible to compute a weighted sum in which each value is first multiplied by the value of another DLF field.
It's also possible to express the sum as a ratio of the total sum for the group or table.
METHODS
new( %params )
Creates a new Lire::Sum object. In addition to the values supported by its parents, the weight and ratio attributes will be initialized
to the values specified in the %params argument.
weight( [$new_weight] )
Returns the DLF field's name by which the values will be multiplied before being summed.
You can change the weight field by specifying a new name as the $new_weight parameter. Use undef to remove the use of a weighting field.
ratio([$new_ratio])
Returns how the sum will be expressed. This can be one of three possible values:
none
Default. The absolute sum will be used.
group
The sum will be expressed as a percentage of the group's sum.
table
The sum will be expressed as a percentage of the table's total sum.
SEE ALSO
Lire::ReportSpec(3pm), Lire::ReportOperator(3pm), Lire::Aggregator(3pm), Lire::Aggregate(3pm).
AUTHOR
Francis J. Lacoste <flacoste@logreport.org>
VERSION
$Id: Sum.pm,v 1.17 2008/03/09 19:27:31 vanbaal Exp $
COPYRIGHT
Copyright (C) 2001, 2002 Stichting LogReport Foundation LogReport@LogReport.org
This file is part of Lire.
Lire is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program (see COPYING); if not, check with
http://www.gnu.org/copyleft/gpl.html.
Lire 2.1.1 2008-03-09 Sum(3pm)