Finding most repeated entry in a column and giving the count


 
# 1  
Old 07-25-2012

Please can you help me find the most repeated entry in the 2nd column and give its count?

Here is an input file:

Code:
 
1, This , is a forum
2, This ,  is a forum
1, There , is a forum
2, This ,  is not right


Here the most repeated entry is "This" and its count is 3.

So the output should contain all lines with that word and look like this:

Code:
 
 
This = 3 
 
1, This , is a forum
2, This ,  is a forum
2, This ,  is not right

# 2  
Old 07-25-2012
Did you try googling or searching through this forum?
# 3  
Old 07-26-2012
Given the unknown input file size, this is one case where I think awk followed by grep is appropriate:

Code:
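#!/usr/bin/env ksh
# Count each value in column 2; remember the key with the highest count.
what=$( awk '{ if( ++c[$2] > max ) { max = c[$2]; key = $2 } } END { printf( "%s = %d\n", key, max ) }' input-file )
printf "%s\n\n" "$what"
# Print every line containing that word (-F: literal string, -w: whole word).
grep -wF -- "${what%% *}" input-file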

Should also work in bash if you prefer.
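
A variant of the same idea, as a sketch only: awk can read the file twice itself, so only the counts are held in memory, and the second pass compares column 2 exactly instead of grepping the whole line. The function name most_repeated is hypothetical, and the comma separator with blank trimming is an assumption based on the sample input, not something stated in the post above.

```shell
# Sketch: two passes over the same file, counts only in memory.
# most_repeated is a hypothetical helper name, not from the thread.
most_repeated() {
    awk -F',' '
        # Trim surrounding blanks so " This " and "This" count as one key.
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        NR == FNR {                      # first pass: count column 2
            k = trim($2)
            if (++cnt[k] > max) { max = cnt[k]; top = k }
            next
        }
        FNR == 1 { printf "%s = %d\n\n", top, max }  # start of second pass
        trim($2) == top                  # second pass: print matching lines
    ' "$1" "$1"
}
```

Call it as most_repeated input-file. On a tie, the value that reaches the highest count first wins.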
# 4  
Old 07-26-2012
perl OR awk

perl

Code:
open my $fh, "<", "a" or die "cannot open a: $!";
my %hash;
while (<$fh>) {
    chomp;
    my @tmp = split(",", $_);
    $hash{$tmp[1]}->{'CNT'}++;
    $hash{$tmp[1]}->{'CONTENT'} .= "$_\n";
}
close $fh;
# Sort numerically (<=>), not as strings (cmp), so a count of 10 ranks above 9.
my $key = (sort { $hash{$b}->{'CNT'} <=> $hash{$a}->{'CNT'} } keys %hash)[0];
print $key, "=", $hash{$key}->{'CNT'}, "\n\n";
print $hash{$key}->{'CONTENT'};


awk:

Code:
awk -F"," '{
    cnt[$2]++
    content[$2]=sprintf("%s\n%s",content[$2],$0)
}
END{
    for(i in cnt){
        if(ind ==""){
            ind=i
            max=cnt[i]
        }
        else{
            if(cnt[i]>=max){
                ind=i
                max=cnt[i]
            }
        }
    }
    print ind"="cnt[ind]
    print content[ind]
}' a

# 5  
Old 07-26-2012
Quote:
Originally Posted by summer_cherry
[perl and awk code quoted in full from post # 4]

Thanks very much for this, it worked.

In addition, some of the lines in the same file end with c: followed by a value.
Here the value is 0:

1,00: This , is a good script c:0

I want to output the lines with the 3 highest values for c:. For example, given these lines:

1,00: This , is a nice script c:9999
1,00: This , is a good script c:9998
1,00: This , is a cool script c:9000
1,00: This , is a fun script c:12

So the output should be

1,00: This , is a nice script c:9999
1,00: This , is a good script c:9998
1,00: This , is a cool script c:9000

---------- Post updated at 01:31 AM ---------- Previous update was at 12:30 AM ----------

Hi summer, please can you help with the above?
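
No reply to this follow-up appears in the thread. As a rough sketch of one way to do it: tag each line with its c: value, sort numerically in descending order, and keep the first three. The function name top_c is hypothetical, and the code assumes the value is a non-negative integer written as lowercase c:<number> at the end of the line, as in the sample lines above.

```shell
# Sketch: print the N lines with the largest c: value.
# top_c is a hypothetical helper name; assumes "c:<digits>" ends the line.
top_c() {
    # $1 = file, $2 = number of lines to keep (defaults to 3)
    awk '
        match($0, /c:[0-9]+[ \t]*$/) {
            # Pull out the digits after "c:" and force a numeric value.
            v = substr($0, RSTART + 2, RLENGTH - 2) + 0
            # Prefix the line with its value and a tab so sort can use it.
            print v "\t" $0
        }
    ' "$1" |
        sort -t "$(printf '\t')" -k1,1nr |   # highest value first
        head -n "${2:-3}" |                  # keep the top N
        cut -f2-                             # drop the helper column
}
```

Call it as top_c a for the top three, or top_c a 5 for a different number. Lines without a trailing c: value are skipped.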