Getting the most common column with respect to another


# 1  

Hi all,

I want to get the most common value of the second column for each value of the first column.

This is my file:
Code:
Tom|london
Tom|london
Tom|Paris
Adam|Madrid
Adam|NY

The output I want is:
Code:
Tom|london
Adam|Madrid

I've tried
Code:
sort -u -t"|" -k1,1
but it returns only the first occurrence of each name, not the most repeated one.

# 2  
Code:
perl -ne '($name,$location)=split/\|/,$_;$locations{$name}{$location}++; END{for $name (keys %locations){$max=0;for $location (keys %{$locations{$name}}){if ($locations{$name}{$location}>$max){$most_frequent=$location;$max= $locations{$name}{$location};}} print "$name|$most_frequent"}}' file


Last edited by Skrynesaver; 08-20-2013 at 12:26 PM.. Reason: ooops: Forgot to update current max in inner loop.
# 3  
Thanks a lot, it's very fast. I tried it on a file with 15 million records and got the result in about 3 minutes, but some of the results were wrong. Could it be a sorting problem? When I tried it on 100 unsorted records it gave the right answer, but on the whole file there were mistakes.

I think it gave me only the first occurrence.

Last edited by teefa; 08-20-2013 at 09:00 AM..
# 4  
Code:
awk '{++a[$0]} END{for(i in a){print i"|"a[i]}}' inputfile | awk -F"|" '{if($3>b[$1]){b[$1]=$3;c[$1]=$2}} END{for(i in b){print i"|"c[i]}}'


If the input has two equally frequent mappings for a given first field, say
Code:
Adam|Madrid
Adam|NY

then the output will be just one of them, e.g. Adam|Madrid. Not sure if this is what you wanted.
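The same counting can also be done in one awk pass, which makes the tie behaviour explicit. This is only a minimal sketch, not something posted in the thread: the function name most_common_by_key is made up for illustration, and it assumes two "|"-separated fields per line. On a tie, the location that reached the top count first is kept, and names are printed in the order they first appear.

```shell
# Hypothetical helper (name is illustrative, not from the thread).
# Reads name|location lines from the given files or from stdin.
most_common_by_key() {
    awk -F'|' '
        NF < 2 { next }                              # skip blank or malformed lines
        {
            n = ++count[$1 FS $2]                    # occurrences of this name|location pair so far
            if (!($1 in best)) order[++names] = $1   # remember names in first-seen order
            # Strict ">" means a later location must beat the current top count to replace it.
            if (n > max[$1]) { max[$1] = n; best[$1] = $2 }
        }
        END {
            for (i = 1; i <= names; i++) print order[i] FS best[order[i]]
        }
    ' "$@"
}
```

Usage would be `most_common_by_key inputfile`. It keeps one counter per distinct name|location pair in memory, so it needs no sorting.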
# 5  
SmilieSmilieSmilieSmilieSmilieSmilieSmilie

Last edited by rdcwayx; 08-21-2013 at 05:10 AM..
# 6  
Quote:
Originally Posted by rdcwayx
Code:
awk '!a[$1]++' FS=\|  infile

This prints only the first line seen for each name, so it may not work if the input has
Code:
Tom|london
Tom|london
Tom|Paris
Adam|Madrid
Adam|NY
Tom|amsterdam
Tom|amsterdam
Tom|amsterdam
Tom|amsterdam
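
A quick way to see the difference on this input is to run the first-occurrence filter next to a per-pair tally. This is a rough sketch for checking, assuming the sample above; the names sample, first_seen and pair_counts are made up for the demo.

```shell
# Sample input from the post above, held in a variable for the demo.
sample='Tom|london
Tom|london
Tom|Paris
Adam|Madrid
Adam|NY
Tom|amsterdam
Tom|amsterdam
Tom|amsterdam
Tom|amsterdam'

# The quoted one-liner: keeps the first line seen for each name.
first_seen() { awk -F'|' '!a[$1]++'; }

# Tally of each name|location pair, printed as "count pair", highest count first.
pair_counts() { awk '{ c[$0]++ } END { for (k in c) print c[k], k }' | sort -k1,1nr -k2; }

printf '%s\n' "$sample" | first_seen
printf '%s\n' "$sample" | pair_counts
```

The filter prints Tom|london and Adam|Madrid, while the tally starts with 4 Tom|amsterdam, so for Tom the first occurrence is not the most common one.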

# 7  
@Krish Thanks. I hope it can be as fast as the Perl one, since I deal with huge files. Could you make me something similar in Perl, or adjust the one above? It needs to be very fast.
@rdc You would have to use sort | uniq -c | sort -nr, and it takes a lot of time while writing.
Thanks a lot.
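
For reference, the sort | uniq -c | sort -nr route mentioned above could look roughly like this. It is a sketch under assumptions, not a tested solution for the 15-million-line file: the function name most_common_sorted is made up, and the input is assumed to be plain name|location lines. It has to sort the whole input first, which is probably where the extra time goes on large files.

```shell
# Hypothetical helper (name is illustrative, not from the thread).
most_common_sorted() {
    # Count identical lines, then order them by count, highest first.
    sort "$@" | uniq -c | sort -k1,1nr |
    awk '{
        line = $0
        sub(/^[ \t]*[0-9]+[ \t]/, "", line)   # drop the count that uniq -c prepends
        split(line, f, "|")                   # f[1] is the name
        if (!seen[f[1]]++) print line         # first line per name has the highest count
    }'
}
```

Usage would be `most_common_sorted inputfile`. Which location wins a tie depends on how sort orders lines with equal counts.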