Finding the most common entry in a column


 
# 8  
11-22-2007
Hi,

I tried it on this test file:

1,2,bob
1,2,bob
1,2,bob
1,2,jay
1,2,tim

and it returned Tim.....

Regards
# 9  
11-22-2007
Code:
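#! /opt/third-party/bin/perl
use strict;
use warnings;

my %fileHash;              # value of field 3 => number of occurrences
my ($cnt, $val) = (0, '');

open(my $fh, "<", "a2") or die "Cannot open a2: $!\n";

while (<$fh>) {
  chomp;
  my @arr = split(/,/);
  next unless defined $arr[2];   # skip blank or short lines
  $fileHash{$arr[2]}++;
}

close($fh);

# keep the value with the highest count seen so far
foreach my $k ( keys %fileHash ) {
  my $tmp = $fileHash{$k};
  if ( $cnt < $tmp ) {
    $cnt = $tmp;
    $val = $k;
  }
}
print "$val : $cnt\n";

exit 0;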


Last edited by matrixmadhan; 11-22-2007 at 10:21 AM.. Reason: requirement - :)
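For comparison, the same hash-counting idea fits in a single awk pass, which avoids sorting the whole column. This is a sketch, not matrixmadhan's code: `most_common_field3` is a hypothetical helper name, and it assumes comma-separated input with the value in field 3, as in the test file above. On a tie it reports whichever tied value awk's `for (v in count)` loop visits first, since that order is unspecified.

```shell
#!/bin/sh
# Sketch: one-pass counting of field 3 with an awk associative array.
# most_common_field3 is a hypothetical helper; it reads the named files,
# or standard input when no file is given.
most_common_field3() {
    awk -F, '
        NF >= 3 { count[$3]++ }         # tally field 3, skipping blank/short lines
        END {
            best = 0
            for (v in count)            # keep the value with the highest tally
                if (count[v] > best) { best = count[v]; val = v }
            if (best > 0)
                print val " : " best    # same "value : count" layout as the Perl script
        }' "$@"
}

# Usage with the test data from this thread:
printf '1,2,bob\n1,2,bob\n1,2,bob\n1,2,jay\n1,2,tim\n' | most_common_field3
# prints: bob : 3
```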
# 10  
11-22-2007
Hi.

With standard commands:
Code:
#!/usr/bin/env sh

# @(#) s1       Demonstrate determination of maximum string occurrence.

set -o nounset
echo

debug=":"
debug="echo"

## Use local command version for the commands in this demonstration.

echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version bash cut sort uniq sed

echo

FILE=${1-data1}

echo
echo " Input file:"
cat "$FILE"

echo
echo " Results from pipeline ( extract, sort, count, isolate ):"

cut -d, -f3 "$FILE" |
sort |
uniq -c |
sort -nr |
sed -n -e '1s/^ *[0-9][0-9]* *//p;q'

exit 0

Producing:
Code:
% ./s1

(Versions displayed with local utility "version")
GNU bash 2.05b.0
cut (coreutils) 5.2.1
sort (coreutils) 5.2.1
uniq (coreutils) 5.2.1
GNU sed version 4.1.2


 Input file:
value1,value2,bob
value1,value2,bob
value1,value2,bob
value1,value2,dave
value1,value2,james

 Results from pipeline ( extract, sort, count, isolate ):
bob

See man pages for details ... cheers, drl
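One limitation worth knowing: the final `sed ... q` keeps only the first line after `sort -nr`, so when two values share the top count only one of them is printed. A sketch that prints every tied value, assuming the same comma-separated layout (`all_most_common` is a hypothetical helper name):

```shell
#!/bin/sh
# Print every value of field 3 that shares the highest count.
# all_most_common is a hypothetical helper; it reads the named files or stdin.
all_most_common() {
    cut -d, -f3 "$@" |
    sort |
    uniq -c |
    sort -nr |
    awk 'NR == 1 { top = $1 }                                  # first line has the maximum count
         $1 == top { sub(/^ *[0-9]+ /, ""); print; next }      # strip the count, print the value
         { exit }'                                             # counts only decrease after this
}

printf '1,2,bob\n1,2,bob\n1,2,jay\n1,2,jay\n1,2,tim\n' | all_most_common
# prints bob and jay (their relative order depends on sort's tie-breaking)
```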
# 11  
11-22-2007
drl, that's awesome!

I processed a file with 188,216 lines in about 3 seconds!

Thanks very much

regards
# 12  
11-22-2007
Hi, Donkey25.

Yes, the standard utilities are generally quite fast; glad it worked out ... cheers, drl
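If more speed is needed on very large files, one commonly used tweak with GNU sort is to run it in the C locale, so it compares raw bytes instead of applying locale collation rules. Counting is unaffected, because identical values still end up adjacent for `uniq`; only the choice among tied values may differ. A sketch of the pipeline above with that change (`most_common_c` is a hypothetical helper name; any speedup depends on the system and data):

```shell
#!/bin/sh
# The extract/sort/count/isolate pipeline, with both sorts in the C locale.
# most_common_c is a hypothetical helper; it reads the named files or stdin.
most_common_c() {
    cut -d, -f3 "$@" |
    LC_ALL=C sort |                            # byte-order sort groups identical values
    uniq -c |                                  # prefix each distinct value with its count
    LC_ALL=C sort -nr |                        # highest count first
    sed -n -e '1s/^ *[0-9][0-9]* *//p;q'       # strip the count from line 1, then stop
}

printf '1,2,bob\n1,2,bob\n1,2,bob\n1,2,jay\n1,2,tim\n' | most_common_c
# prints: bob
```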
# 13  
11-22-2007
Quote:
Originally Posted by Donkey25
Hi,

I tried it on this test file: [...]

and it returned Tim.....

Regards
Oops, small mistake:

Code:
> cat lis
1,2,bob
1,2,bob
1,2,bob
1,2,jay
1,2,tim

Code:
>awk -F, '{print $NF}' lis | sort -u | xargs -I {} sh -c 'printf "%s " "{}"; grep -c ",{}\$" lis' | sort -k 2,2nr | head -1 | awk '{print $1}'
bob
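Sorting the count column as text (as `sort -r -k 2,2` does) rather than numerically happens to work for this small test file, but it can pick the wrong winner once counts reach two digits: compared as strings, "9" sorts after "10". A quick check with made-up counts (the bob/jay numbers below are illustrative only):

```shell
#!/bin/sh
# Text vs numeric ordering of a count column (counts are made up).
counts='bob 9
jay 10'

# Text comparison: "9" > "10" because the first digits compare '9' > '1'.
printf '%s\n' "$counts" | sort -r -k 2,2 | head -n 1     # bob 9  (wrong winner)

# Numeric comparison: 10 > 9.
printf '%s\n' "$counts" | sort -k 2,2nr | head -n 1      # jay 10 (right winner)
```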

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Awk/sed summation of one column based on some entry in first column

Hi All , I am having an input file as stated below Input file 6 ddk/djhdj/djhdj/Q 10 0.5 dhd/jdjd.djd.nd/QB 01 0.5 hdhd/jd/jd/jdj/Q 10 0.5 512 hd/hdh/gdh/Q 01 0.5 jdjd/jd/ud/j/QB 10 0.5 HD/jsj/djd/Q 01 0.5 71 hdh/jjd/dj/jd/Q 10 0.5 ... (5 Replies)
Discussion started by: kshitij

2. UNIX for Beginners Questions & Answers

Finding common entries between 10 columns

Hello, I need to find the intersection across 10 columns. Kindly help. my file (INPUT.csv) looks like this 4_R 4_S 8_R 8_S 12_R 12_S 24_R 24_S LOC_Os01g01010 LOC_Os01g01010 LOC_Os01g01010 LOC_Os04g48290 LOC_Os01g01010 LOC_Os01g01010... (1 Reply)
Discussion started by: Sanchari

3. Shell Programming and Scripting

Sum column values based in common identifier in 1st column.

Hi, I have a table to be imported for R as matrix or data.frame but I first need to edit it because I've got several lines with the same identifier (1st column), so I want to sum the each column (2nd -nth) of each identifier (1st column) The input is for example, after sorted: K00001 1 1 4 3... (8 Replies)
Discussion started by: sargotrons

4. Shell Programming and Scripting

Finding most common substrings

Hello, I would like to know what is the three most abundant substrings of length 6 from col2. The file is quite large and looks like this col1 col2 EN03 typehellobyedogcatcatdog EN09 typehellobyebyebyebye EN08 dogcatcatdogbyebyebyebye EN09 catcattypehellobyebyebyebye... (9 Replies)
Discussion started by: verse123

5. Shell Programming and Scripting

Finding most repeated entry in a column and giving the count

Please can you help in providing the most repeated entry in the 2nd column and give its count Here is an input file 1, This , is a forum 2, This , is a forum 1, There , is a forum 2, This , is not right Here the most repeated entry is "This" and count is 3 So output... (4 Replies)
Discussion started by: necro98

6. Shell Programming and Scripting

Rename a header column by adding another column entry to the header column name URGENT!!

Hi All, I have a file example.csv which looks like this GrpID,TargetID,Signal,Avg_Num CSCH74_1_1,2007,61,256 CSCH74_1_1,212007,647,679 CSCH74_1_1,12007,3,32 CSCH74_1_1,207,299,777 I want the output as GrpID,TragetID,Signal-CSCH74_1_1,Avg_Num CSCH74_1_1,2007,61,256... (4 Replies)
Discussion started by: Vavad

7. Shell Programming and Scripting

for each different entry in column 1 extract maximum values from column 2 in unix/awk

Hello, I have 2 columns (1st column has multiple entries but the corresponding values in the column 2 may be the same or different.) however I want to extract unique values for each entry in column 1 by assigning the max value from column 2 SDF4 -0.211654 SDF4 0.978068 ... (1 Reply)
Discussion started by: Diya123

8. Shell Programming and Scripting

finding common numbers (contents) across 2 or 3 files

I have 3 files which are tab delimited and have numbers in it. file 1 1 2 3 4 5 6 7 File 2 3 5 7 8 File 3 1 (4 Replies)
Discussion started by: Lucky Ali

9. Shell Programming and Scripting

Finding Authors in Common Across Dozens of Lists

I currently have publication lists for ~3 dozen faculty members. I need to find out how many publications are in common across all faculty members - person 1 with person 2, person 1 with person 3, person 2 with person 3, person 1 with both person 2 and person 3, etc. One person may have Last1,... (5 Replies)
Discussion started by: Peggy White

10. Shell Programming and Scripting

Finding longest common substring among filenames

I will be performing a task on several directories, each containing a large number of files (2500+) that follow a regular naming convention: YYYY_MM_DD_XX.foo_bar.A.B.some_different_stuff.EXT What I would like to do is automatically discover the part of the filenames that are common to all... (1 Reply)
Discussion started by: cmcnorgan