Getting the most repeated column


 
# 1  
Old 07-10-2013

Hi all,

I want to get the most repeated ID for each name in my file.
File:
Code:
name,ID 
adam,12345  ----1
adam,12345  ----2
adam,934
adam,12345  ----3
john,14
john,13
john,25 ----1 
john,25 ----2
tom,1  -----1
tom,2  -----1

The output should be:
Code:
adam,12345,4    ----[4] means adam appears 4 times
john,25,4
tom,1,2 ----as it appears first, or if possible get tom,1,2,2 --- IDs (1) and (2), with 2 appearances

Thanks a lot in advance.
# 2  
Old 07-10-2013
what have you tried?
# 3  
Old 07-10-2013
I have tried:
Code:
cat file | uniq -c | /usr/xpg4/bin/awk -F"," '!a[$1,$2]++'

The aim is to get the first or most repeated record, but in the case of tom I want the two values from the two different records combined into one record.

It's not that simple:
Code:
name,ID,ID1,ID2
adam,12345,1,2  ----1
adam,12345,1,1  ----2
adam,934,1,2
adam,12345,2,2  ----3
john,14
john,13
john,25 ----1 
john,25 ----2
tom,1  -----1
tom,2  -----1

and I want to get, for example:
Code:
adam,12345,1,2 the most repeated fields in one record

# 4  
Old 07-10-2013
I am not sure I understood what you want, since the line "adam,12345" in your 2nd post differs from the example in post 1: it has 2 new fields that weren't there before.
There seems to be some inconsistency between the input and output examples.

Anyway, here is a blind shot, taking the 1st example as input without the -----[n] markers:
Code:
$ awk 'NR > 1{_[$1]++} END{for(a in _){print a ","  _[a]}}' infile | sort -nt, -k3| tail -1
adam,12345,3

The cat in your code is not needed.
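
The one-liner above counts whole lines and keeps only the single most frequent one. A possible sketch that produces the per-name output asked for in post 1 (name, most repeated ID, total rows for that name) could look like the following. It assumes the ----[n] markers are not part of the real file, the function name most_repeated_id is made up for illustration, and on a tie it keeps the ID that reached the top count first.

```shell
# Hypothetical helper: prints name,most-repeated-ID,total-rows-for-name.
most_repeated_id() {
    awk -F, '
    NR > 1 {                                  # skip the header line
        sub(/[ \t]+$/, "", $2)                # drop trailing blanks from the ID
        if (!($1 in total)) order[++n] = $1   # remember names in input order
        total[$1]++                           # rows seen for this name
        c = ++pair[$1, $2]                    # rows seen for this name,ID pair
        if (c > best[$1]) { best[$1] = c; id[$1] = $2 }
    }
    END {
        for (i = 1; i <= n; i++)
            print order[i] "," id[order[i]] "," total[order[i]]
    }' "$@"
}

# Sample run on the data from post 1 (markers removed).
printf '%s\n' 'name,ID' 'adam,12345' 'adam,12345' 'adam,934' 'adam,12345' \
    'john,14' 'john,13' 'john,25' 'john,25' 'tom,1' 'tom,2' | most_repeated_id
```

For the sample data this prints adam,12345,4, john,25,4 and tom,1,2, one per line. A file name can be passed as an argument instead of piping the data in.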
# 5  
Old 07-10-2013
I have tried it and it works, but I think you can use
Code:
NR>=1  instead of NR>1

Thanks, but can you look at the second example?
# 6  
Old 07-10-2013
Nope, that's not correct.
NR>=1 means greater than or equal to 1. Since every line of a file has a line number of at least 1, that condition is always true, so it makes no sense. You could leave it out entirely if you want to count the header in.
If you want to skip the header line, you have to use NR>1.
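
A quick way to see the difference is to run both conditions as bare awk patterns on a small input; the three sample lines here are just an illustration taken from the first example.

```shell
# NR >= 1 is true for every line, so the header is printed as well.
printf '%s\n' 'name,ID' 'adam,12345' 'john,25' | awk 'NR >= 1'

# NR > 1 is false only for line 1, so the header is skipped.
printf '%s\n' 'name,ID' 'adam,12345' 'john,25' | awk 'NR > 1'
```

The first command prints all three lines, the second prints only the two data lines.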

So if the 2nd example is just a new or altered request, you should be able to alter the code given to achieve the same. If you do not understand the code, that is no problem, but you have to let us know.
This is no script drive-in.

Last edited by zaxxon; 07-10-2013 at 08:02 AM.. Reason: spelling
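
The second example in post 3 was never answered in the thread. One possible reading of it is "for each name, take the most frequent value of every column separately", which gives adam,12345,1,2 for the sample. The sketch below follows that assumed reading only; the function name column_modes is made up, the ----[n] markers are assumed not to be in the file, and ties go to the value that reached the top count first.

```shell
# Hypothetical helper: per name, print the most frequent value of each column.
column_modes() {
    awk -F, '
    NR > 1 && NF {                            # skip the header and empty lines
        sub(/[ \t]+$/, "")                    # drop trailing blanks, fields are re-split
        if (!($1 in width)) { order[++n] = $1; width[$1] = 0 }
        if (NF > width[$1]) width[$1] = NF    # widest record seen for this name
        for (f = 2; f <= NF; f++) {
            c = ++cnt[$1, f, $f]              # occurrences of this value in column f
            if (c > best[$1, f]) { best[$1, f] = c; mode[$1, f] = $f }
        }
    }
    END {
        for (i = 1; i <= n; i++) {
            name = order[i]; line = name
            for (f = 2; f <= width[name]; f++) line = line "," mode[name, f]
            print line
        }
    }' "$@"
}

# Sample run on the data from post 3 (markers removed).
printf '%s\n' 'name,ID,ID1,ID2' 'adam,12345,1,2' 'adam,12345,1,1' 'adam,934,1,2' \
    'adam,12345,2,2' 'john,14' 'john,13' 'john,25' 'john,25' 'tom,1' 'tom,2' | column_modes
```

On the sample this prints adam,12345,1,2, john,25 and tom,1. If the intent was instead the most repeated whole record, counting on $0 as in the earlier one-liner would be the closer fit.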