Finding most repeated entry in a column and giving the count
Post 302677305 by necro98 on Thursday 26th of July 2012 02:31:43 AM
Quote:
Originally Posted by summer_cherry
perl

Code:
open my $fh, "<", "a" or die "Cannot open 'a': $!";
while(<$fh>){
    chomp;
    my @tmp = split(",",$_);
    $hash{$tmp[1]}->{'CNT'}++;
    $hash{$tmp[1]}->{'CONTENT'}=$hash{$tmp[1]}->{'CONTENT'}."\n".$_;
}
close $fh;
# Compare counts numerically (<=>); cmp would compare them as strings, so 9 would outrank 10.
my $key = (sort {$hash{$b}->{'CNT'} <=> $hash{$a}->{'CNT'}} keys %hash)[0];
print $key,"=",$hash{$key}->{'CNT'},"\n";
print $hash{$key}->{'CONTENT'};

awk:

Code:
awk -F"," '{
    cnt[$2]++
    content[$2]=sprintf("%s\n%s",content[$2],$0)
}
END{
    for(i in cnt){
        if(ind ==""){
            ind=i
            max=cnt[i]
        }
        else{
            if(cnt[i]>=max){
                ind=i
                max=cnt[i]
            }
        }
    }
    print ind"="cnt[ind]
    print content[ind]
}' a

Thanks very much for this, it worked.

In addition, some of the lines in the same file end with c: followed by a value.
Here the value is 0:

1,00: This , is a good script c:0

I want to output the lines with the 3 highest values for c:. For example, given these lines:

1,00: This , is a nice script c:9999
1,00: This , is a good script c:9998
1,00: This , is a cool script c:9000
1,00: This , is a fun script c:12

So the output should be

1,00: This , is a nice script c:9999
1,00: This , is a good script c:9998
1,00: This , is a cool script c:9000

---------- Post updated at 01:31 AM ---------- Previous update was at 12:30 AM ----------

Hi summer, please can you help with the above?
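
One possible approach to the top-3 request, offered as a sketch under assumptions rather than a tested answer for the real file: it assumes the tag is a lowercase c: at the very end of the line, followed only by digits, with no trailing spaces or carriage returns. The function name top_c is made up for illustration. awk prefixes each matching line with its c: value, sort orders on that value numerically in descending order, head keeps the first N lines, and cut removes the prefix again.

```shell
# top_c FILE [N]: print the N lines (default 3) with the largest value after "c:".
# Assumption: the tag is lowercase "c:" followed by digits at the end of the line.
top_c() {
    awk '
        # Only lines ending in c:<digits> are considered; others are skipped.
        match($0, /c:[0-9]+$/) {
            # RSTART points at the "c", so RSTART + 2 is the first digit.
            # Emit "<value><TAB><original line>" so sort can key on the value.
            print substr($0, RSTART + 2) "\t" $0
        }
    ' "$1" |
        sort -t "$(printf '\t')" -k1,1nr |   # numeric, descending, on the value
        head -n "${2:-3}" |                  # keep the top N lines
        cut -f2-                             # drop the value prefix
}
```

With the sample lines above saved in the file a, top_c a should print the c:9999, c:9998 and c:9000 lines in that order, and top_c a 5 would print the top five instead. Lines that share the same value come out in whatever order sort gives them.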
 
