Delete duplicate rows


 
# 1  
Old 04-03-2012

Hi,

This is a follow-up to my earlier post:

Code:
him	mno	klm	20	76	.	+	.	klm_mango unix_00000001;
alp	fdc	klm	123   456	.	+	.	klm_mango unix_0000103;
her	tkr	klm	415	439	.	+	.	klm_mango unix_00001043;
abc	tvr	klm	20	76	.	+	.	klm_mango unix_00000001;
abc	def	klm	83	84	.	+	.	klm_mango unix_0000103;
abc	def	klm	83	84	.	+	.	klm_mango unix_1233333;
abc	def	klm	83	84	.	+	.	klm_mango unix_845454;
abc	def	klm	83	84	.	+	.	klm_mango unix_7875654;
abc	def	klm	83	84	.	+	.	klm_mango unix_8784552;

Now I want to delete the duplicate records, comparing all columns except the last one. Of each group of duplicates, only the first record should be kept and printed.

So, my output will be

Code:
him	mno	klm	20	76	.	+	.	klm_mango unix_00000001;
alp	fdc	klm	123   456	.	+	.	klm_mango unix_0000103;
her	tkr	klm	415	439	.	+	.	klm_mango unix_00001043;
abc	tvr	klm	20	76	.	+	.	klm_mango unix_00000001;
abc	def	klm	83	84	.	+	.	klm_mango unix_0000103;

# 2  
Old 04-03-2012
Try:
Code:
awk '!a[$1,$2,$3,$4,$5,$6,$7,$8,$9]++' file

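A more general variant of the same idea, in case the column count varies from line to line: strip the last whitespace-separated field from a copy of the line and use the rest as the dedup key. This is a sketch with made-up sample rows, not the poster's actual data:

```shell
# Hypothetical sample rows (tab-separated); only the last field differs
# among the first two lines, so they count as duplicates.
printf 'abc\tdef\t83\tunix_1;\nabc\tdef\t83\tunix_2;\nher\ttkr\t415\tunix_3;\n' > file
# Copy each line, strip its last whitespace-separated field to form the key,
# and print the line only the first time that key is seen (first occurrence wins).
awk '{key = $0; sub(/[[:space:]][^[:space:]]*$/, "", key)} !seen[key]++' file
```

Because the key is the raw line minus the last field, differences in spacing between otherwise identical lines will still defeat the match.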
# 3  
Old 04-03-2012
Hi jacobs.smith,

One way with perl:
Code:
$ cat infile
him     mno     klm     20      76      .       +       .       klm_mango unix_00000001;
alp     fdc     klm     123   456       .       +       .       klm_mango unix_0000103;
her     tkr     klm     415     439     .       +       .       klm_mango unix_00001043;
abc     tvr     klm     20      76      .       +       .       klm_mango unix_00000001;
abc     def     klm     83      84      .       +       .       klm_mango unix_0000103;
abc     def     klm     83      84      .       +       .       klm_mango unix_1233333;
abc     def     klm     83      84      .       +       .       klm_mango unix_845454;
abc     def     klm     83      84      .       +       .       klm_mango unix_7875654;
abc     def     klm     83      84      .       +       .       klm_mango unix_8784552;
$ cat myscript.pl
use warnings;
use strict;

my %duplicate;

while ( <> ) {
        chomp;
        my @f = split;
        # Key on every field except the last; join with a tab so adjacent
        # fields cannot run together and produce false matches.
        if ( ++$duplicate{ join qq[\t], @f[ 0..($#f-1) ] } == 1 ) {
                printf qq[%s\n], $_;
        }
}
$ perl myscript.pl infile
him     mno     klm     20      76      .       +       .       klm_mango unix_00000001;
alp     fdc     klm     123   456       .       +       .       klm_mango unix_0000103;
her     tkr     klm     415     439     .       +       .       klm_mango unix_00001043;
abc     tvr     klm     20      76      .       +       .       klm_mango unix_00000001;
abc     def     klm     83      84      .       +       .       klm_mango unix_0000103;

# 4  
Old 04-03-2012
Thanks bartus, but it is printing the whole file again. It is not removing the duplicates.
# 5  
Old 04-03-2012
Can you post output of:
Code:
cat -e file | head
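For context, `cat -e` makes invisible line endings visible, which is the usual culprit when "duplicate" lines refuse to match: a Windows CRLF ending shows up as `^M$` instead of a plain `$`. A quick self-contained demonstration:

```shell
# -e is equivalent to -vE: -v renders non-printing characters (a carriage
# return prints as ^M) and -E marks each line end with '$'.
printf 'abc\r\n' | cat -e    # prints: abc^M$
```

If the posted output shows `^M$` at the ends of some lines, stripping the carriage returns (e.g. with `tr -d '\r'`) before deduplicating should fix the mismatch.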

# 6  
Old 04-24-2012
Code:
awk '{key = $1 SUBSEP $2 SUBSEP $3 SUBSEP $4 SUBSEP $5; if (key != prev) print; prev = key}' file

This assumes the duplicate lines are already adjacent, as in your example. Otherwise, sort the file first and pipe the result to the awk statement.
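A sketch of that sort-first fallback, with made-up sample data. Note the trade-off: sorting groups identical keys together, but it also reorders the file, so "first record of each group" now means first in sort order rather than first in the original file:

```shell
# Hypothetical input where the duplicates (on fields 1-5) are NOT adjacent.
printf 'a b 1 2 3 x;\nc d 4 5 6 y;\na b 1 2 3 z;\n' > file
# Sort to make identical keys adjacent, then print only the first line
# of each run of equal keys.
sort file | awk '{key = $1 SUBSEP $2 SUBSEP $3 SUBSEP $4 SUBSEP $5; if (key != prev) print; prev = key}'
```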

Last edited by Scrutinizer; 04-24-2012 at 04:05 PM..