help to identify duplicate columns adjacent value

04-10-2011

Registered User

51, 0

Join Date: Dec 2009

Last Activity: 22 September 2011, 12:55 AM EDT

Posts: 51

Thanks Given: 0

Thanked 0 Times in 0 Posts

Hi friends,

I have an Excel sheet like the one below. The first column holds an ID: ABC followed by a 7-digit number. The second column holds the title for that ID. Some titles are unique and some are duplicated, but every ID is unique, even for a duplicated title. I need to identify the duplicated titles and, for each one, the highest ID. The list has 5000 records.

Code:

ABC1546793    Shaikh Yahya bin Ahmed Afifi
ABC1546787    Habib Noh
ABC1546691    Khoo Oon Teik
ABC1546548    Flor Contemplacion
ABC1538999    Ahmad bin Ibrahim
ABC2386809    Habib Noh
ABC2396515    Ahmad bin Ibrahim

Please help; I need this urgently.

Thanks & Regards
Uma

Last edited by Franklin52; 04-12-2011 at 03:53 AM. Reason: Please use code tags

umapearl

04-11-2011

Registered User

132, 18

Join Date: May 2008

Last Activity: 23 January 2013, 12:06 AM EST

Location: Chennai

Posts: 132

Thanks Given: 0

Thanked 18 Times in 18 Posts

Code:

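awk '{
  name = ""
  for (j = 2; j <= NF; j++) name = name " " $j  # title = fields 2..NF
  print name                                    # echoes every title
  idarr[name] = $1                              # last ID seen for the title
  titlearr[name]++
}
END {
  for (i in titlearr)
    if (titlearr[i] > 1) print idarr[i] " " i   # titles seen more than once
}' inputfile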

tene
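
The awk command above keeps whichever ID comes last in the file for each title, which is not necessarily the highest one. A minimal sketch that compares IDs numerically instead, assuming every ID is a three-character prefix such as ABC followed by digits, and that the input is plain text with the ID in the first whitespace-separated column. The function name `dup_max_id` and the temporary sample file are illustrative and not part of the thread:

```shell
# Hypothetical helper: print "ID<TAB>title" for every title that occurs
# more than once, keeping the numerically highest ID for that title.
dup_max_id() {
  awk '
    NF < 2 { next }                           # skip blank rows and rows without a title
    {
      id = $1
      num = substr(id, 4) + 0                 # digits after the 3-character prefix
      title = $0
      sub(/^[ \t]*[^ \t]+[ \t]+/, "", title)  # drop the ID column, keep the whole title
      sub(/[ \t\r]+$/, "", title)             # trim trailing blanks and a stray CR
      if (title == "") next
      count[title]++
      if (!(title in maxnum) || num > maxnum[title]) {
        maxnum[title] = num
        maxid[title] = id
      }
    }
    END {
      for (t in count)
        if (count[t] > 1) print maxid[t] "\t" t
    }
  ' "$1" | sort
}

# Usage with a few rows from the first post.
sample=$(mktemp)
printf '%s\n' \
  'ABC1546787    Habib Noh' \
  'ABC1546691    Khoo Oon Teik' \
  'ABC1538999    Ahmad bin Ibrahim' \
  'ABC2386809    Habib Noh' \
  'ABC2396515    Ahmad bin Ibrahim' > "$sample"
dup_max_id "$sample"
rm -f "$sample"
```

The final `sort` only makes the output order predictable, since awk's `for (t in count)` visits keys in no particular order.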

04-11-2011

Hi,

Sorry, this is not working. The output is not in the expected format, and some junk characters are appearing.

Thanks

umapearl

04-11-2011

Can you post some lines of the inputfile, the command you executed and the output you got?

tene

04-11-2011

Hi,

Thanks a lot for your kind reply.

Initially I gave my input and output files as .xls, which caused the error. When I changed them to .txt, I got the output below, which is what I want. Can you also help me include the unique records, that is, the titles that appear only once?

Code:

 Ismail bin Haji Omar
 Hadijah Rahmat
 Ahmad Afandi Jamari
 Hafiza Talib
 Muhammad Rafi bin Abu Bakar
 Ahmad Awang
 Mohd Raman Daud
 Goh, Peter Augustine
 Ismail Wardi
 Norulashikin Jamain
 Alfian Saat
 Juffri bin Supaat
 Abdul Salam Ayob
 Ismail Sarkawi
 Mohd Taha Haji Jamil
 Ismail bin Haji Omar
 Mahmud bin Ahmad
 Mohammed Yusoff bin Abdul Rahman
 Jaafar bin Haji Muhammad
 Ahmad Afandi Jamari
 Mohd Gani Ahmad
 Sidek bin Saniff
 Masuri bin Salikun
 Ismail Wardi
 Mohamed Pitchay Gani bin Mohamed Abdul Aziz
 Ahmad Awang
 Mohd Khalid bin Mohd Lani
 Mohammed Saffri bin Abdul Manaf
 Mohamed Latiff bin Mohamed
 Aliman Hassan
ABC2495844 
ABC2386809  Habib Noh
ABC2499031  Beach Road
ABC492311  First plane to land
ABC2492214  Khong Guan Biscuit Company
ABC2493542  Public holidays (1979)

Thanks a lot

Last edited by Franklin52; 04-12-2011 at 03:53 AM. Reason: Please use code tags

umapearl

04-12-2011

Code:

sort -k1 inputfile | awk '{
  name = ""
  for (j = 2; j <= NF; j++) name = name " " $j  # title = fields 2..NF
  idarr[name] = $1       # input is sorted, so the last ID seen sorts highest
  titlearr[name]++
}
END {
  print "These are duplicate titles\n"
  for (i in titlearr) if (titlearr[i] > 1) print idarr[i] " " i
  print "These are unique titles\n"
  for (i in titlearr) if (titlearr[i] == 1) print idarr[i] " " i
}'

Last edited by tene; 04-15-2011 at 02:03 AM. Reason: Typo error

tene

04-12-2011

Hi,

Sorry, I am getting a syntax error; I am not that good at Unix. Please advise.

umapearl
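
The thread does not show what caused the syntax error. One thing that may help is keeping the awk program in its own file and running it with `awk -f`, so the shell never has to parse a long quoted one-liner. The sketch below reuses the logic of tene's second reply; the file names `titles.awk` and `input.txt` and the scratch directory are hypothetical. Because `sort` compares the IDs as text here, the last ID seen is the highest only when all IDs have the same number of digits.

```shell
# Scratch directory and file names are placeholders for illustration.
workdir=$(mktemp -d)

# Sample rows taken from the first post.
cat > "$workdir/input.txt" <<'EOF'
ABC1546787    Habib Noh
ABC1546691    Khoo Oon Teik
ABC1538999    Ahmad bin Ibrahim
ABC2386809    Habib Noh
ABC2396515    Ahmad bin Ibrahim
EOF

# The awk program lives in a file, so no shell quoting is involved.
cat > "$workdir/titles.awk" <<'EOF'
NF < 2 { next }          # skip blank rows and rows without a title
{
    name = ""
    for (j = 2; j <= NF; j++) name = name " " $j   # title = fields 2..NF
    idarr[name] = $1     # input arrives sorted, so the last ID seen wins
    titlearr[name]++
}
END {
    print "These are duplicate titles"
    for (i in titlearr) if (titlearr[i] > 1) print idarr[i] i
    print "These are unique titles"
    for (i in titlearr) if (titlearr[i] == 1) print idarr[i] i
}
EOF

sort -k1,1 "$workdir/input.txt" | awk -f "$workdir/titles.awk"
```

Each rebuilt title starts with a space, so `print idarr[i] i` already yields one space between the ID and the title.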
