Find duplicates in 2 & 3rd column and their ID

07-06-2017

With the data in the format given below, I have been trying to find all IDs (1st column) of the entries whose names in the 2nd and 3rd columns are duplicated, together with a count of how many times each name occurs.

Code:

0.237788 Aaban Aahva
0.291066 Aabheer Aahlaad
0.845814 Aabid Aahan
0.152208 Aadam Aagneya
0.585537 Aadan Aagney
0.193475 Aadarsh Aagam
0.810623 Aadavan Aadvik
0.173531 Aadesh Aadrian
0.484983 Aadhan Aaditya
0.151863 Aadhira Aaditeya
0.366957 Aadi Aadit
0.491736 Aadidev Aadir
0.910094 Aadil Aadinath
0.265257 Aadinath Aadil
0.893188 Aadir Aadidev
0.220351 Aadit Aadi
0.631798 Aaditeya Aadhira
0.571077 Aaditya Aadhan
0.332158 Aadrian Aadesh
0.104455 Aadvik Aadavan
0.502931 Aagam Aadarsh
0.567394 Aagney Aadan
0.854165 Aagneya Aadam
0.0401409 Aahan Aabid
0.108022 Aahlaad Aabheer
0.639396 Aahva Aaban
0.291066  Aadil Aadinath
0.845814  Aadinath Aadil
0.152208  Aadir Aadidev
0.585537  Aadit Aadi
0.193475  Aaditeya Aadhira
0.810623  Aaditya Aadhan
0.173531  Aadrian Aadesh
0.484983  Aadvik Aadavan
0.151863  Aagam Aadarsh
0.366957  Aagney Aadan
0.491736  Aagneya Aadam
0.910094   Aahan Aabid
0.265257  Aahlaad Aabheer
0.893155  Aahva Aaban
0.193443  Aaditeya Aadhira
0.810667  Aaditya Aadhan
0.173545  Aadrian Aadesh
0.484934  Aadvik Aadavan
0.151862  Aagam Aadarsh
0.366954  Aagney Aadan
0.491736  Aagneya Aadam
0.910094   Aahan Aabid
0.265232  Aahlaad Aabheer
0.893134  Aahva Aaban

Regards,
Nasir

Moderator's Comments:

Use code tags please, thanks.

busyboy

07-06-2017

Please use code tags.

Can you show what you have tried so far?

zaxxon

07-06-2017

Thanks.

I tried the code given below, but it turns out that only the ID of the last match is returned, so there must be a logical error somewhere.

Code:

 awk '{ a[$3$4]++; y[$3$4]=$1 }END{ for(x in y){ for (j in a) { if(a[j]>1) { if(x == j){ print j,y[x]} }   }}}' filter.data

busyboy

07-06-2017

You are building the key from $3 and $4, but with the unaltered (default) field separator each line only has the fields $1, $2 and $3.

If I understood correctly, you might want something like this:

Code:

$ awk '{a[$2 FS $3]++; b[$2 FS $3]=$1} END{for(e in a){if(a[e] > 1){print a[e], b[e], e}}}' infile| sort -r
3 0.910094 Aahan Aabid
3 0.893134 Aahva Aaban
3 0.810667 Aaditya Aadhan
3 0.491736 Aagneya Aadam
3 0.484934 Aadvik Aadavan
3 0.366954 Aagney Aadan
3 0.265232 Aahlaad Aabheer
3 0.193443 Aaditeya Aadhira
3 0.173545 Aadrian Aadesh
3 0.151862 Aagam Aadarsh
2 0.845814 Aadinath Aadil
2 0.585537 Aadit Aadi
2 0.291066 Aadil Aadinath
2 0.152208 Aadir Aadidev

I am not sure whether it makes sense to process $1 this way, since "Aahan Aabid", for example, has 3 entries with 2 different numbers in front, and only the last one read is kept in the second array. If $1 makes a difference and needs to be printed, you could include $1 in the key instead of storing it in a second array.

zaxxon

07-06-2017

The question is how to find all keys ($1) for a given name, i.e. whenever the pair $2 and $3 occurs more than once.

busyboy

07-06-2017

You could try it yourself by altering your code according to what I already explained...

Code:

$ awk '{a[$1 FS $2 FS $3]++} END{for(e in a) if(a[e] > 1){print a[e], e}}' infile| sort -r
2 0.910094 Aahan Aabid
2 0.491736 Aagneya Aadam

Or maybe just this:

Code:

$ sort -rn infile | uniq -d
0.910094   Aahan Aabid
0.491736  Aagneya Aadam
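One caveat with the `sort | uniq -d` variant: `uniq` compares complete lines, so two rows that differ only in the number of blanks between the columns are not reported as duplicates. The posted data mixes one, two and three blanks. A minimal sketch of normalizing the spacing first, assuming a hypothetical four-line sample file (not the original data):

```shell
# Hypothetical sample: the first two rows differ only in spacing.
infile=$(mktemp)
cat > "$infile" <<'EOF'
0.491736 Aagneya Aadam
0.491736  Aagneya Aadam
0.910094   Aahan Aabid
0.265257  Aahlaad Aabheer
EOF

# Without normalization the two Aagneya rows are different lines for uniq.
raw=$(sort -rn "$infile" | uniq -d)

# Assigning $1 to itself makes awk rebuild the record with single blanks.
normalized=$(awk '{ $1 = $1 } 1' "$infile" | sort -rn | uniq -d)

printf 'raw: [%s]\n' "$raw"
printf 'normalized: [%s]\n' "$normalized"
rm -f "$infile"
```

With this sample, `raw` stays empty while `normalized` contains `0.491736 Aagneya Aadam`. The awk variant above is not affected, because its key is built from the fields and not from the raw line.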

zaxxon

07-06-2017

Quote:

Originally Posted by busyboy

...
...
I have been trying to find out all IDs for those entries with duplicate names in 2nd and 3rd columns and their count like how many time duplication happened for any name if any,
...
...

If Perl is an option, then you could do something like this for a descriptive answer:

Code:

$
$ perl -lne 'm/^(\S+)\s+(.*)$/;
             $x{$2}->[0] += 1;
             push(@{$x{$2}->[1]}, $1) }{
             while (($k, $v) = each %x) {
                 printf("%-20s occurred %5d times with keys [%s]\n", $k, $v->[0], join(",",@{$v->[1]}))
                 if $v->[0] > 1
             }' data.txt
Aadrian Aadesh       occurred     3 times with keys [0.332158,0.173531,0.173545]
Aagneya Aadam        occurred     3 times with keys [0.854165,0.491736,0.491736]
Aahan Aabid          occurred     3 times with keys [0.0401409,0.910094,0.910094]
Aaditeya Aadhira     occurred     3 times with keys [0.631798,0.193475,0.193443]
Aahlaad Aabheer      occurred     3 times with keys [0.108022,0.265257,0.265232]
Aadir Aadidev        occurred     2 times with keys [0.893188,0.152208]
Aadinath Aadil       occurred     2 times with keys [0.265257,0.845814]
Aadit Aadi           occurred     2 times with keys [0.220351,0.585537]
Aaditya Aadhan       occurred     3 times with keys [0.571077,0.810623,0.810667]
Aagney Aadan         occurred     3 times with keys [0.567394,0.366957,0.366954]
Aadvik Aadavan       occurred     3 times with keys [0.104455,0.484983,0.484934]
Aahva Aaban          occurred     3 times with keys [0.639396,0.893155,0.893134]
Aadil Aadinath       occurred     2 times with keys [0.910094,0.291066]
Aagam Aadarsh        occurred     3 times with keys [0.502931,0.151863,0.151862]
$
$

From the output above, if you wanted to see only the key (1st column in your data file) and nothing else, then maybe something like this could work:

Code:

$
$ perl -lne 'm/^(\S+)\s+(.*)$/;
             $x{$2}->[0] += 1;
             push(@{$x{$2}->[1]}, $1) }{
             while (($k, $v) = each %x) {
                 print join("\n",@{$v->[1]}) if $v->[0] > 1
             }' data.txt
0.332158
0.173531
0.173545
0.854165
0.491736
0.491736
0.0401409
0.910094
0.910094
0.631798
0.193475
0.193443
0.108022
0.265257
0.265232
0.893188
0.152208
0.265257
0.845814
0.220351
0.585537
0.571077
0.810623
0.810667
0.567394
0.366957
0.366954
0.104455
0.484983
0.484934
0.639396
0.893155
0.893134
0.910094
0.291066
0.502931
0.151863
0.151862
$
$

It would help if you posted the output you want to see.

durden_tyler
