Finding the most common entry in a column


 
# 1  
Old 11-21-2007

Hi,

I have a file with 3 columns in it that are comma separated and it has about 5000 lines. What I want to do is find the most common value in column 3 using awk or a shell script or whatever works! I'm totally stuck on how to do this.

e.g.

value1,value2,bob
value1,value2,bob
value1,value2,bob
value1,value2,dave
value1,value2,james

Clearly in the above example the most popular value in column3 is "bob", but how would I write a script to work this out?

Many thanks
# 2  
Old 11-21-2007
nawk -f don.awk myFile

don.awk:
Code:
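BEGIN {
  FS=","    # fields are comma-separated
}
# Count each column-3 value and remember the highest count seen so far
{a[$3]++; if (a[$3] > comV) { comN=$3; comV=a[$3]} }
END {
  printf("Most Common Name: [%s] = [%d]\n", comN, comV)
}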


Last edited by vgersh99; 11-21-2007 at 12:08 PM..
# 3  
Old 11-21-2007
awk

Hi,
This one should also work for you. Performance does come into it here, since your file has thousands of lines, so different approaches will perform differently.

To be honest, I only know how to get the result; I have no idea how to write high-performance code for it. So you'd better ask an expert for help.

Here is my code:

Code:
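awk 'BEGIN{
    FS=","
    n=0
}
{
    sum[$3]++           # count occurrences of each column-3 value
    if (sum[$3]>n)
    {
        n=sum[$3]       # highest count seen so far
        m=$3            # value that holds that count
    }
}
END{
    print m
}' filename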

# 4  
Old 11-21-2007
Hi.

So you're willing to accept a (more or less) random result of any of the winners if there is a tie among two or more names? ... cheers, drl
# 5  
Old 11-22-2007
To handle ties, it is sufficient to turn comN (or m) into an array that holds every name sharing the top count.
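
A minimal sketch of one way to report every winner, assuming the same comma-separated file as in the posts above. The function name most_common_all and the array name count are made up for illustration. Rather than storing the winners in an array while reading, it tracks only the running maximum and then scans the counts once in END, which gives the same result:

```shell
# Hypothetical helper: print every column-3 value tied for the highest count.
most_common_all() {
  awk -F, '
    # Count each value and track the highest count seen so far.
    { count[$3]++; if (count[$3] > max) max = count[$3] }
    END {
      # Report every value whose count equals the maximum.
      for (name in count)
        if (count[name] == max)
          printf("%s = %d\n", name, max)
    }
  ' "$1"
}
```

Call it as most_common_all myFile. awk does not guarantee the order of a for-in loop, so pipe the output through sort if a stable order matters.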
# 6  
Old 11-22-2007
Thanks guys,

I got both of the above to work but my CPU usage hit 100% lol! Any ideas on either making this more efficient or limiting the amount of CPU that this awk script can hog?

Thanks again
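
For what it's worth, a single pass over roughly 5000 lines should finish in a fraction of a second, so a short spike to 100% while the script runs is probably normal rather than a sign of inefficiency. If the aim is to stop the script from crowding out other processes, one option is to run it at lower priority with nice. This is only a sketch: the function name most_common_nice is hypothetical, and nice lowers scheduling priority, it does not put a hard cap on CPU use.

```shell
# Hypothetical wrapper: the same single-pass count, run at the lowest priority.
most_common_nice() {
  nice -n 19 awk -F, '
    # Count each column-3 value and remember the current leader.
    { count[$3]++; if (count[$3] > max) { max = count[$3]; name = $3 } }
    # Print nothing for an empty file.
    END { if (NR > 0) printf("%s = %d\n", name, max) }
  ' "$1"
}
```

Call it as most_common_nice myFile.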
# 7  
Old 11-22-2007
Hi, can you check this?

Code:
awk -F\, '{print $NF}' file|sort -u|xargs -I {} ksh -c 'echo "{} \c";grep -wc ",{}$" file'|sort -rn -k 2,2|head -1|awk '{print $1}'
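
An alternative sketch that avoids running grep once per distinct value. It assumes column 3 contains no whitespace, embedded commas or quoted fields, and the function name most_common_sorted is hypothetical. cut pulls out column 3, uniq -c counts the duplicates once they are sorted together, and a numeric reverse sort puts the largest count first:

```shell
# Hypothetical helper: most common column-3 value via sort and uniq.
most_common_sorted() {
  cut -d, -f3 "$1" |      # keep only the third comma-separated field
    sort |                # group identical values together
    uniq -c |             # prefix each distinct value with its count
    sort -rn |            # largest count first (numeric, not lexical)
    head -n 1 |           # keep the top line
    awk '{ print $2 }'    # drop the count, print the value
}
```

Call it as most_common_sorted file. If two values tie, only one of them is printed.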
