help to identify duplicate columns adjacent value


 
# 1  
Old 04-10-2011

Hi friends,

I have an Excel sheet like the one below. The first column holds an id (ABC followed by a 7-digit number) and the second column holds the title for that id. Titles may be unique or duplicated, but ids are always unique, even for duplicate titles. For each duplicated title, I need to identify the record with the highest id. I have a list of 5000 records and need to find the duplicates among them.
Code:
ABC1546793    Shaikh Yahya bin Ahmed Afifi
ABC1546787    Habib Noh
ABC1546691    Khoo Oon Teik
ABC1546548    Flor Contemplacion
ABC1538999    Ahmad bin Ibrahim
ABC2386809    Habib Noh
ABC2396515    Ahmad bin Ibrahim

Please help; I need this urgently.

Thanks & Regards
Uma

Last edited by Franklin52; 04-12-2011 at 03:53 AM.. Reason: Please use code tags
# 2  
Old 04-11-2011
Code:
awk '{name="";for(j=2;j<=NF;j++)name=name" "$j;print name;idarr[name]=$1;titlearr[name]++;} END{for (i in titlearr){if(titlearr[i]>1)print idarr[i]" "i}}' inputfile
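
The one-liner above prints every title as it reads it (the `print name` statement) and remembers the last id seen for a title rather than the highest one. A minimal sketch of one way to keep the highest id per duplicated title follows. It assumes whitespace-separated text input and ids made of a 3-letter prefix followed by digits; the helper name `dup_highest` is hypothetical and does not come from the thread.

```shell
# dup_highest: read "id title..." lines from stdin (or from files given as
# arguments) and print "id title" for every title that occurs more than once,
# keeping the id whose numeric part is largest.
dup_highest() {
  awk 'NF >= 2 {
    num = substr($1, 4) + 0                 # digits after the 3-letter prefix
    title = $2
    for (j = 3; j <= NF; j++) title = title " " $j
    count[title]++
    if (count[title] == 1 || num > high[title]) {
      high[title] = num                     # highest numeric part so far
      id[title] = $1                        # full id that carries it
    }
  }
  END {
    for (t in count) if (count[t] > 1) print id[t], t
  }' "$@" | sort
}

# Sample rows from the first post.
dup_highest <<'EOF'
ABC1546793    Shaikh Yahya bin Ahmed Afifi
ABC1546787    Habib Noh
ABC1546691    Khoo Oon Teik
ABC1546548    Flor Contemplacion
ABC1538999    Ahmad bin Ibrahim
ABC2386809    Habib Noh
ABC2396515    Ahmad bin Ibrahim
EOF
```

With the sample rows this prints `ABC2386809 Habib Noh` and `ABC2396515 Ahmad bin Ibrahim`. Comparing the numeric part rather than the whole string matters when ids have different lengths.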

# 3  
Old 04-11-2011
Hi,

Sorry, this is not working. The output is not in the expected format, and some junk characters are appearing.

Thanks
# 4  
Old 04-11-2011
Can you post some lines of the inputfile, the command you executed and the output you got?
# 5  
Old 04-11-2011
Hi,

Thanks a lot for your kind reply.

Initially I used .xls files for the input and output, and that gave the error. When I changed them to .txt, I got the output below, which is what I want. Can you also help me include the unique records, that is, the titles that appear only once?
Code:
 Ismail bin Haji Omar
 Hadijah Rahmat
 Ahmad Afandi Jamari
 Hafiza Talib
 Muhammad Rafi bin Abu Bakar
 Ahmad Awang
 Mohd Raman Daud
 Goh, Peter Augustine
 Ismail Wardi
 Norulashikin Jamain
 Alfian Saat
 Juffri bin Supaat
 Abdul Salam Ayob
 Ismail Sarkawi
 Mohd Taha Haji Jamil
 Ismail bin Haji Omar
 Mahmud bin Ahmad
 Mohammed Yusoff bin Abdul Rahman
 Jaafar bin Haji Muhammad
 Ahmad Afandi Jamari
 Mohd Gani Ahmad
 Sidek bin Saniff
 Masuri bin Salikun
 Ismail Wardi
 Mohamed Pitchay Gani bin Mohamed Abdul Aziz
 Ahmad Awang
 Mohd Khalid bin Mohd Lani
 Mohammed Saffri bin Abdul Manaf
 Mohamed Latiff bin Mohamed
 Aliman Hassan
ABC2495844 
ABC2386809  Habib Noh
ABC2499031  Beach Road
ABC492311  First plane to land
ABC2492214  Khong Guan Biscuit Company
ABC2493542  Public holidays (1979)

Thanks a lot
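
awk works on plain text, which fits the observation above that the .xls file failed while the .txt export worked. If the export is tab-separated and was saved on Windows (both are assumptions; the thread does not say how the sheet was exported), splitting on tabs and dropping a trailing carriage return keeps multi-word titles in one field and avoids stray characters. A hedged sketch, with `clean_rows` as a hypothetical helper name:

```shell
# clean_rows: read tab-separated "id<TAB>title" lines on stdin and print them
# back with any trailing carriage return removed. Rows that have an id but no
# title (like the bare "ABC2495844" line in the output above) are skipped.
clean_rows() {
  awk -F '\t' '{
    sub(/\r$/, "")                     # drop a Windows line ending, if present
    if ($2 != "") print $1 "\t" $2     # keep only rows that carry a title
  }'
}

# Illustrative input: three rows with CRLF endings, one without a title.
printf 'ABC2386809\tHabib Noh\r\nABC2495844\t\r\nABC2499031\tBeach Road\r\n' | clean_rows
```

The cleaned rows can then be fed to whichever awk command does the duplicate check.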

Last edited by Franklin52; 04-12-2011 at 03:53 AM.. Reason: Please use code tags
# 6  
Old 04-12-2011
Code:
# Sort numerically on the digits after "ABC" so the last id stored for each title is the highest
sort -k1.4,1n inputfile | awk '{name="";for(j=2;j<=NF;j++)name=name" "$j;idarr[name]=$1;titlearr[name]++;}
END{print "These are duplicate titles\n"; for (i in titlearr) {if(titlearr[i]>1)print idarr[i]" "i} print "These are unique titles\n"; for (i in titlearr){if(titlearr[i]==1)print idarr[i]" "i}}'


Last edited by tene; 04-15-2011 at 02:03 AM.. Reason: Typo error
# 7  
Old 04-12-2011
Hi,

Sorry, I am getting a syntax error. I am not that good with Unix. Please advise.
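
The thread does not show the actual error message, so the cause is a guess, but long one-liners are easy to break when pasted into a shell. One way to sidestep quoting problems is to keep the awk program in its own file and run it with `awk -f`. The sketch below takes that route and tracks the highest id inside awk instead of relying on `sort`; the file name `titles.awk` and the section headings are hypothetical choices, not something from the thread.

```shell
# Write the awk program to a file so the shell never has to parse its quotes.
cat > titles.awk <<'EOF'
# Group records by title and remember the highest id seen for each title.
{
    num = substr($1, 4) + 0                  # digits after the "ABC" prefix
    title = $2
    for (j = 3; j <= NF; j++) title = title " " $j
    if (title == "") next                    # skip rows that have no title
    count[title]++
    if (count[title] == 1 || num > high[title]) {
        high[title] = num
        id[title] = $1
    }
}
END {
    print "Duplicate titles (highest id):"
    for (t in count) if (count[t] > 1) print id[t], t
    print "Unique titles:"
    for (t in count) if (count[t] == 1) print id[t], t
}
EOF

# Run it on the sample rows from the first post.
awk -f titles.awk <<'EOF'
ABC1546793    Shaikh Yahya bin Ahmed Afifi
ABC1546787    Habib Noh
ABC1546691    Khoo Oon Teik
ABC1546548    Flor Contemplacion
ABC1538999    Ahmad bin Ibrahim
ABC2386809    Habib Noh
ABC2396515    Ahmad bin Ibrahim
EOF
```

For a real file the last step would be `awk -f titles.awk inputfile`. The order of lines within each section is not guaranteed, because awk does not define the iteration order of `for (t in count)`.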
 