Word count of values in a column


 
# 1  
Old 06-12-2012

Hi friends,

I have an input file in the following format:

Code:
a b c 1.11112
d e f 4.5767
g h i 19.098
k i l 87.9999

I am looking for an awk one-liner that produces the following output:

output.txt
Code:
Range of the column: 1.11112 to 87.9999
Total records between 1 and 10 - 2
Total records between 10 and 20 - 1
Total records between 20 and 30 - 0
Total records between 30 and 40 - 0
Total records between 40 and 50 - 0
Total records between 50 and 60 - 0
Total records between 60 and 70 - 0
Total records between 70 and 80 - 0
Total records between 80 and 90 - 1

I want to know the total number of records in the input file that fall in each interval of 10.

Thanks
# 2  
Old 06-12-2012
Your edges are ambiguous: 1-10, then 10-20. Which bin would 10 go in?
# 3  
Old 06-12-2012
Quote:
Originally Posted by neutronscott
Your edges are ambiguous: 1-10, then 10-20. Which bin would 10 go in?
Thank you.

That was a very good question.

Here is my revised output.txt:

Code:
Range of the column: 1.11112 to 87.9999
Total records between 1 and 10.99 - 2
Total records between 11 and 20.99 - 1
Total records between 21 and 30.99 - 0
Total records between 31 and 40.99 - 0
Total records between 41 and 50.99 - 0
Total records between 51 and 60.99 - 0
Total records between 61 and 70.99 - 0
Total records between 71 and 80.99 - 0
Total records between 81 and 90.99 - 1

# 4  
Old 06-12-2012
Code:
[mute@geek ~/temp/jacobs.smith]$ awk 'NR==1{min=$4}$4<min{min=$4}$4>max{max=$4}{a[int($4/10)]++}END{printf("Range of the column: %f to %f\n",min,max);max=int(max/10);for(i=0;i<=max;i++)printf("Records between [%d, %d): %d\n",i*10,10+i*10,a[i])}' input
Range of the column: 1.111120 to 87.999900
Records between [0, 10): 2
Records between [10, 20): 1
Records between [20, 30): 0
Records between [30, 40): 0
Records between [40, 50): 0
Records between [50, 60): 0
Records between [60, 70): 0
Records between [70, 80): 0
Records between [80, 90): 1
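
For readers who find the one-liner hard to follow, here is a rough sketch of the same logic unrolled, with the bin width pulled out into a variable. The function name `bin_counts` and the variable `w` are made up for illustration and are not part of the original post. It assumes column 4 holds non-negative numbers and reads the data from standard input.

```shell
# Hypothetical wrapper: $1 = bin width (defaults to 10); data is read from stdin.
bin_counts() {
    awk -v w="${1:-10}" '
        NR == 1  { min = max = $4 + 0 }       # seed min and max from the first record
        $4 < min { min = $4 }
        $4 > max { max = $4 }
                 { count[int($4 / w)]++ }     # bin i covers [i*w, (i+1)*w)
        END {
            printf("Range of the column: %f to %f\n", min, max)
            last = int(max / w)               # index of the highest occupied bin
            for (i = 0; i <= last; i++)
                printf("Records between [%d, %d): %d\n", i * w, (i + 1) * w, count[i])
        }
    '
}

# Sample data from the first post.
printf '%s\n' 'a b c 1.11112' 'd e f 4.5767' 'g h i 19.098' 'k i l 87.9999' | bin_counts
```

With the default width of 10 this should print the same nine bins as the one-liner. Calling it as `bin_counts 20` would give bins of width 20 instead.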

# 5  
Old 06-12-2012
How will this split the edges?
# 6  
Old 06-12-2012
A square bracket means the endpoint is included and a parenthesis means it is not, so [10, 20) holds values from 10 up to, but not including, 20. I think you wanted every bin shifted up by 1. In that case, subtract one before dividing, a[int(($4-1)/10)]++, and adjust the printf parameters to match.

Edit: like this:

Code:
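#!/usr/bin/awk -f
# Track the smallest and largest value seen in column 4.
NR==1{min=max=$4}
$4<min{min=$4}
$4>max{max=$4}
# Shift by 1 so bin i covers [1+10i, 11+10i): 1-10.99, 11-20.99, ...
{a[int(($4-1)/10)]++}
END{
        printf("Range of the column: %f to %f\n", min, max);
        # Index of the last bin, using the same shift as the counting rule above.
        max=int((max-1)/10)
        for (i=0;i<=max;i++)
                printf("Records between %d and %.2f: %d\n",1+i*10,10.99+i*10,a[i])
}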
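
To see how each binning expression treats values that sit exactly on an edge, a quick check along these lines may help. The helper name `show_bins` and the test values are made up for illustration. Note that awk's `int()` truncates toward zero, so the sketch assumes values of at least 1.

```shell
# Hypothetical helper: print the bin index each value gets under both schemes.
show_bins() {
    awk '{
        printf("%s -> int(v/10)=%d, int((v-1)/10)=%d\n",
               $1, int($1 / 10), int(($1 - 1) / 10))
    }'
}

# Values on and around the first edge.
printf '%s\n' 1 10 10.99 11 20 | show_bins
```

Under the first scheme, 10 lands in bin 1, that is [10, 20). Under the shifted scheme it stays in bin 0 (1 to 10.99), and 11 is the first value to reach bin 1.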
