Removing duplicates


 
# 1  
Old 04-25-2011

I have a test file with the following 2 columns:

Code:
Col 1       |     Col 2
T1          |         1    <= remove
T5          |         1
  T4        |         2
    T1      |         3
    T3      |         3
T4          |         1    <= remove
  T1        |         2    <= remove
  T3        |         2    <= remove
T3          |         1    <= remove
T2          |         1


I need to remove any sub-branches. E.g., T4 appears in the left column above with a value of 2 in the right column, so any other occurrences of T4 with a lesser value in the right column should be removed. Similarly, T1,1 and T1,2 need to be removed because there is a T1,3. For each tag, only the row with the highest value in column 2 needs to be retained.

Expected final list:

Code:
T5          |         1
  T4        |         2
    T1      |         3
    T3      |         3
T2          |         1


Last edited by Franklin52; 04-25-2011 at 02:20 PM.. Reason: Use code tags!
# 2  
Old 04-25-2011
Code:
awk -F"|" '$2 > a[$1]{a[$1]=$NF} END{for(i in a)print i FS a[i]}' file
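For anyone reading along: this keeps, for each column-1 key, the largest column-2 value, but awk's `for (i in a)` loop walks the array in an unspecified order, which is why the output comes back in a different order than the input (as the next post notes). A minimal sketch on an unindented copy of the sample data (the file name sample.txt is just illustrative):

```shell
# Unindented copy of the thread's sample data.
cat > sample.txt <<'EOF'
T1|1
T5|1
T4|2
T1|3
T3|3
T4|1
T1|2
T3|2
T3|1
T2|1
EOF

# Keep the max of column 2 per key; for-in iteration order is
# unspecified, so pipe through sort for a stable display.
awk -F'|' '$2 > a[$1] {a[$1] = $2} END {for (i in a) print i FS a[i]}' sample.txt | sort
```

Note that on the indented original, -F'|' leaves the leading spaces inside $1, so "T1" and "  T1" end up as different array keys.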

# 3  
Old 04-25-2011
Thanks, it works, but it prints this way:

T1 | 3
T2 | 1
T3 | 3
T4 | 2
T5 | 1

Can we print it without altering the original sort order?

Also, the first-column entries whose second-column value is greater than 1 need to be indented, i.e., T4, T1 & T3.

(The original file had the indentation, but for some reason it gets stripped when the code is posted.)

---------- Post updated at 09:23 PM ---------- Previous update was at 12:18 PM ----------

Franklin, thanks for adding code tags to my post. So, can we print it the way I want it?
# 4  
Old 04-26-2011
Try this,
Code:
awk -F"|" 'NR==FNR{if(a[$1]){ if(a[$1]<$2) {a[$1]=$2;b[$1]=NR}} else {a[$1]=$2;b[$1]=NR}}
NR>FNR{if(b[$1]==FNR){print}}' infile infile

# 5  
Old 04-26-2011
Try this one...

Code:
##--get unique tags
for i in ` cat testfile.txt | awk  '{print $1}'|sort -u`
do
grep $i testfile.txt >temp.txt
cat temp.txt | sort -n |tail -1  >>finaldata.txt
done

# 6  
Old 04-26-2011
Quote:
Originally Posted by pravin27
Try this,
Code:
awk -F"|" 'NR==FNR{if(a[$1]){ if(a[$1]<$2) {a[$1]=$2;b[$1]=NR}} else {a[$1]=$2;b[$1]=NR}}
NR>FNR{if(b[$1]==FNR){print}}' infile infile

Not sure; this is what I am getting:

Code:
 
!. srt1.sh
T1          |         1
T5          |         1
  T4        |         2
    T1      |         3
    T3      |         3
T4          |         1
  T1        |         2
  T3        |         2
T3          |         1
T2          |         1
T1          |         1
T5          |         1
  T4        |         2
    T1      |         3
    T3      |         3
T4          |         1
  T1        |         2
  T3        |         2
T3          |         1
T2          |         1
 
!cat srt1.sh
awk -F"|" 'NR==FNR{if(a[$1]){ if(a[$1]<$2) {a[$1]=$2;b[$1]=NR}} else {a[$1]=$2;b[$1]=NR}}
NR>FNR{if(b[$1]==FNR){print}}' fp1.txt fp1.txt
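One likely reason the script echoes every line: with -F"|", the leading indentation stays inside $1, so "T1", "  T1" and "    T1" are three different array keys, and almost every line wins its own one-member group. A sketch that strips the whitespace from the key before comparing (fp1.txt here stands in for the real file) keeps the original order and indentation:

```shell
# Sample file matching the thread's data.
cat > fp1.txt <<'EOF'
T1          |         1
T5          |         1
  T4        |         2
    T1      |         3
    T3      |         3
T4          |         1
  T1        |         2
  T3        |         2
T3          |         1
T2          |         1
EOF

# Pass 1: remember the line number of the highest value per trimmed tag.
# Pass 2: print only those lines, preserving order and indentation.
awk -F'|' '
  { k = $1; gsub(/[ \t]+/, "", k) }     # tag with the indentation stripped
  NR == FNR { if (!(k in a) || a[k] < $2 + 0) { a[k] = $2 + 0; b[k] = NR }; next }
  b[k] == FNR
' fp1.txt fp1.txt
```

On this sample it prints exactly the five lines asked for in post #1.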

---------- Post updated at 09:59 AM ---------- Previous update was at 09:56 AM ----------

Quote:
Originally Posted by palanisvr
Code:
##--get unique tags
for i in ` cat testfile.txt | awk  '{print $1}'|sort -u`
do
grep $i testfile.txt >temp.txt
cat temp.txt | sort -n |tail -1  >>finaldata.txt
done

This is what I am getting:

Code:
!. srt.sh
T1          |         1
T2          |         1
T3          |         1
T4          |         1
T5          |         1
 
!cat srt.sh
for i in `cat fp1.txt | awk  '{print $1}'|sort -u`
do
grep $i fp1.txt >temp.txt
cat temp.txt | sort -n |tail -1  >>finaldata.txt
done
cat finaldata.txt

# 7  
Old 04-27-2011
I got the desired output with the script below.

Code:
script:
$ cat test.sh
rm finaldata.txt
##--get unique tags
for i in ` cat tt.txt | awk  '{print $1}'|sort -u`
do
grep $i tt.txt >temp.txt
cat temp.txt | sort -n |tail -1  >>finaldata.txt
done
cat finaldata.txt


I have tried it with this test file:

Code:
$ cat tt.txt
T1          |         1
T5          |         1
  T4        |         2
    T1      |         3
    T3      |         3
T4          |         1
  T1        |         2
  T3        |         2
T3          |         1
T2          |         1
T1          |         1
T5          |         1
  T4        |         2
    T1      |         3
    T3      |         3
T4          |         1
  T1        |         2
  T3        |         2
T3          |         1
T2          |         1

Got this output:

Code:
$sh test.sh
    T1      |         3
T2          |         1
    T3      |         3
  T4        |         2
T5          |         1
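A couple of caveats on this loop: `grep $i` matches substrings, so a tag T1 would also hit T10 or T12 if they existed, and plain `sort -n` on these lines sees a non-numeric prefix, so the tie-breaking depends on locale collation. A slightly hardened sketch using the same tt.txt layout (`grep -w` is a common GNU/BSD extension, and the sort key is the numeric second `|`-field):

```shell
# Rebuild the thread's 20-line test file.
cat > tt.txt <<'EOF'
T1          |         1
T5          |         1
  T4        |         2
    T1      |         3
    T3      |         3
T4          |         1
  T1        |         2
  T3        |         2
T3          |         1
T2          |         1
T1          |         1
T5          |         1
  T4        |         2
    T1      |         3
    T3      |         3
T4          |         1
  T1        |         2
  T3        |         2
T3          |         1
T2          |         1
EOF

# -w limits grep to whole-word matches (T1 no longer hits T10), and
# sorting numerically on the 2nd '|' field makes tail -1 pick the
# highest value deterministically.
for i in $(awk '{print $1}' tt.txt | sort -u); do
    grep -w "$i" tt.txt | sort -t'|' -k2,2n | tail -1
done > finaldata.txt
cat finaldata.txt
```

Like the original, this prints the tags in sorted order rather than in the file's original order; a two-pass awk is still needed for that.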
