
How to delete or remove duplicate lines in a file


 
# 1  
Old 07-20-2009
How to delete or remove duplicate lines in a file

Hi, please help me with removing duplicate lines from a file.
I have a file with a huge number of lines.
I want to remove selected lines from it.
Also, where duplicate lines exist, I want to delete all but one of them and keep just a single copy.
Please help me with any UNIX commands, or even a Fortran program.
For example:
Code:
 SIG   50   12   0   34   87   3.00  37.0000N  100.0000E
 SIG   50   12   0   34   87   3.00  37.0000N  100.0000E  
SIG   18     7   9     0     0    0.00  36.0000N   60.0000E
SSR   40    7    0    0     0    0.00  35.2000N   60.4000E

Here I want the output to look like:
Code:
SIG   50   12   0   34   87   3.00  37.0000N  100.0000E
SIG   18     7   9     0     0    0.00  36.0000N   60.0000E
SSR   40    7    0    0     0    0.00  35.2000N   60.4000E


Last edited by Yogesh Sawant; 07-20-2009 at 07:36 AM.. Reason: added code tags
# 2  
Old 07-20-2009
If the order of the lines isn't important, use sort -u. If the duplicates always appear grouped (as in your example), a simple uniq should suffice.

If there's no grouping and you want to keep the order:
Code:
$ perl -ne 'print if !$seen{$_}; $seen{$_}++' file
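A quick way to see the difference between sort -u and uniq, using throwaway sample data (the file name infile here is just an illustration):

```shell
# build a small sample file: "b" and "a" both repeat, but not adjacently
printf '%s\n' b a a b > infile

sort -u infile   # prints "a" then "b": sorted, all duplicates gone
uniq infile      # prints "b" "a" "b": only the adjacent "a a" pair collapses
```

Note that uniq only ever compares neighbouring lines, which is why unsorted input with scattered duplicates needs either sort -u or the order-preserving perl filter above.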

# 3  
Old 07-20-2009
Hi, uniq is working but the perl command is giving an error. I have a doubt: if I need to remove duplicates and keep just one of them in the file, I can use uniq. But what if I want to remove duplicate lines based on a criterion?
For example:
Code:
SIG    765   0   0   0   0   0.00  35.2000N   60.4000E   25      39 
SSR   765   0   0   0   0   0.00  34.5600N   65.4000E   25      67       89    
SSR 1390   5   0   0   0   0.00  39.8000N   64.4000E   20      56
LEE  1458   8   0   0   0   0.00  25.1000N   99.2000E    9                 56

Now I want my output file to look like:
Code:
SSR   765   0   0   0   0   0.00  34.5600N   65.4000E   25      67       89    
SSR 1390   5   0   0   0   0.00  39.8000N   64.4000E   20      56
LEE  1458   8   0   0   0   0.00  25.1000N   99.2000E    9                 56

I mean that only a few specific columns should be checked for sameness. For example, in the first file the 2nd, 3rd, 4th, 5th, 6th and 7th columns were the same for the 1st and 2nd rows, so I must remove the duplicate lines and retain the line which has the maximum number of fields (columns) in it.
Help me out if there is any command to check this.

Last edited by Yogesh Sawant; 07-20-2009 at 07:37 AM.. Reason: added code tags
# 4  
Old 07-20-2009
Code:
perl -lane'
    $k = join " ", @F[ 1 .. 6 ];
    $u{$k}++;
    $m{$k} = @F if @F > $m{$k};   # track the widest line seen for each key

    push @r, $_;

    END {
        for (@r) {
            $k = join " ", (split)[ 1 .. 6 ];
            print if $u{$k} == 1 or split == $m{$k};
        }
    }' infile

# 5  
Old 07-20-2009
Hi, can't we use some other command, like awk or any other UNIX command, to get the output shown in my previous post?
# 6  
Old 07-20-2009
Of course, but why? What's the problem with Perl?
# 7  
Old 07-20-2009
I don't know how to use perl, and I am not understanding the code which you have given. So if you tell me any other simple command, like awk or anything else, it will be helpful. I am new to Linux.
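Since awk keeps coming up: the order-preserving filter from post #2 has a well-known awk one-liner, awk '!seen[$0]++' file, and the keep-the-widest-duplicate logic from post #4 can be sketched as a two-pass awk program. This is only a sketch, assuming (as in the posts above) that fields 2-7 form the duplicate key and that infile is your data file:

```shell
# sample data from the earlier post, so the sketch is self-contained
cat > infile <<'EOF'
SIG    765   0   0   0   0   0.00  35.2000N   60.4000E   25      39
SSR   765   0   0   0   0   0.00  34.5600N   65.4000E   25      67       89
SSR 1390   5   0   0   0   0.00  39.8000N   64.4000E   20      56
LEE  1458   8   0   0   0   0.00  25.1000N   99.2000E    9                 56
EOF

# pass 1 (NR==FNR): count each key and remember the largest field count seen;
# pass 2: print lines whose key is unique, or which have that largest count
awk '
NR == FNR {
    k = $2 FS $3 FS $4 FS $5 FS $6 FS $7
    count[k]++
    if (NF > max[k]) max[k] = NF
    next
}
{
    k = $2 FS $3 FS $4 FS $5 FS $6 FS $7
    if (count[k] == 1 || NF == max[k]) print
}' infile infile
```

Because infile is named twice, the file is read in two passes, so this works on regular files but not on a pipe. As with the perl version, if two duplicates tie for the maximum field count, both are printed.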
 
