How to delete or remove duplicate lines in a file


 
Thread Tools Search this Thread
Top Forums UNIX for Dummies Questions & Answers How to delete or remove duplicate lines in a file
# 1  
Old 07-20-2009
Question How to delete or remove duplicate lines in a file

Hi please help me how to remove duplicate lines in any file.
I have a file having huge number of lines.
i want to remove selected lines in it.
And also if there exists duplicate lines, I want to delete the rest & just keep one of them.
Please help me with any unix commands or even fortran program
for example
Code:
 SIG   50   12   0   34   87   3.00  37.0000N  100.0000E
 SIG   50   12   0   34   87   3.00  37.0000N  100.0000E  
SIG   18     7   9     0     0    0.00  36.0000N   60.0000E
SSR   40    7    0    0     0    0.00  35.2000N   60.4000E

Here i want the output to look like
Code:
SIG   50   12   0   34   87   3.00  37.0000N  100.0000E
SIG   18     7   9     0     0    0.00  36.0000N   60.0000E
SSR  40    7    0    0     0    0.00  35.2000N   60.4000E


Last edited by Yogesh Sawant; 07-20-2009 at 07:36 AM.. Reason: added code tags
# 2  
Old 07-20-2009
If the order of the lines isn't important, sort -u. If the duplicates always appear grouped (as in your example), a simple uniq should suffice.

If there's no grouping and you want to keep the order:
Code:
$ perl -ne 'print if !$seen{$_}; $seen{$_}++' file

# 3  
Old 07-20-2009
Question

hi uniq is working but perl command is giving an error. I hav a dhought if i need to remove duplicates & keep just one of it in file then i can use uniq. but if i want to remove duplicate lines based on a criteria
for example
Code:
SIG    765   0   0   0   0   0.00  35.2000N   60.4000E   25      39 
SSR   765   0   0   0   0   0.00  34.5600N   65.4000E   25      67       89    
SSR 1390   5   0   0   0   0.00  39.8000N   64.4000E   20      56
LEE  1458   8   0   0   0   0.00  25.1000N   99.2000E    9                 56

now i want my output file to look lik
Code:
SSR   765   0   0   0   0   0.00  34.5600N   65.4000E   25      67       89    
SSR 1390   5   0   0   0   0.00  39.8000N   64.4000E   20      56
LEE  1458   8   0   0   0   0.00  25.1000N   99.2000E    9                 56

i mean to say only few specific colums should be checked if its same for example in the first file 2nd,3rd,4th,5th,6th,7th colums were same for 1st & 2nd row so i must remove the duplicate lines & retain a line which has maximum fields or colums in it.
Help me out if thier any command to check

Last edited by Yogesh Sawant; 07-20-2009 at 07:37 AM.. Reason: added code tags
# 4  
Old 07-20-2009
Code:
perl -lane'  
    $k = join " ", @F[ 1 .. 6 ];
    $m{$k} = @F if $u{$k}++ and @F > $m{$k};
    push @r, $_;

    END {
        for (@r) {
            $k = join " ", (split)[ 1 .. 6 ];
            print if $u{$k} == 1 or split == $m{$k};
        }
    }' infile

# 5  
Old 07-20-2009
Question

hii cant we use any other command like awk or any unix command for getting such output as shown in my privous post
# 6  
Old 07-20-2009
Of course, but why? What's the problem with Perl?
# 7  
Old 07-20-2009
I dont know to use perl and i am not understanding the code which you have given..so if u tell me any other simple command like awk or anything else it will be helpful.I am new to linux .
 
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Remove duplicate lines, sort it and save it as file itself

Hi, all I have a csv file that I would like to remove duplicate lines based on 1st field and sort them by the 1st field. If there are more than 1 line which is same on the 1st field, I want to keep the first line of them and remove the rest. I think I have to use uniq or something, but I still... (8 Replies)
Discussion started by: refrain
8 Replies

2. Shell Programming and Scripting

Remove duplicate lines from a file

Hi, I have a csv file which contains some millions of lines in it. The first line(Header) repeats at every 50000th line. I want to remove all the duplicate headers from the second occurance(should not remove the first line). I don't want to use any pattern from the Header as I have some... (7 Replies)
Discussion started by: sudhakar T
7 Replies

3. Shell Programming and Scripting

Remove duplicate lines from a 50 MB file size

hi, Please help me to write a command to delete duplicate lines from a file. And the size of file is 50 MB. How to remove duplicate lins from such a big file. (6 Replies)
Discussion started by: vsachan
6 Replies

4. Shell Programming and Scripting

How do I remove the duplicate lines in this file?

Hey guys, need some help to fix this script. I am trying to remove all the duplicate lines in this file. I wrote the following script, but does not work. What is the problem? The output file should only contain five lines: Later! (5 Replies)
Discussion started by: Ernst
5 Replies

5. Shell Programming and Scripting

remove duplicate lines from file linux/sh

greetings, i'm hoping there is a way to cat a file, remove duplicate lines and send that output to a new file. the file will always vary but be something similar to this: please keep in mind that the above could be eight occurrences of each hostname or it might simply have another four of an... (2 Replies)
Discussion started by: crimso
2 Replies

6. UNIX for Advanced & Expert Users

In a huge file, Delete duplicate lines leaving unique lines

Hi All, I have a very huge file (4GB) which has duplicate lines. I want to delete duplicate lines leaving unique lines. Sort, uniq, awk '!x++' are not working as its running out of buffer space. I dont know if this works : I want to read each line of the File in a For Loop, and want to... (16 Replies)
Discussion started by: krishnix
16 Replies

7. UNIX for Dummies Questions & Answers

Delete duplicate lines and print to file

OK, I have read several things on how to do this, but can't make it work. I am writing this to a vi file then calling it as an awk script. So I need to search a file for duplicate lines, delete duplicate lines, then write the result to another file, say /home/accountant/files/docs/nodup ... (2 Replies)
Discussion started by: bfurlong
2 Replies

8. Shell Programming and Scripting

delete semi-duplicate lines from file?

Ok here's what I'm trying to do. I need to get a listing of all the mountpoints on a system into a file, which is easy enough, just using something like "mount | awk '{print $1}'" However, on a couple of systems, they have some mount points looking like this: /stage /stand /usr /MFPIS... (2 Replies)
Discussion started by: paqman
2 Replies

9. UNIX for Dummies Questions & Answers

Remove Duplicate lines from File

I have a log file "logreport" that contains several lines as seen below: 04:20:00 /usr/lib/snmp/snmpdx: Agent snmpd appeared dead but responded to ping 06:38:08 /usr/lib/snmp/snmpdx: Agent snmpd appeared dead but responded to ping 07:11:05 /usr/lib/snmp/snmpdx: Agent snmpd appeared dead but... (18 Replies)
Discussion started by: Nysif Steve
18 Replies

10. Shell Programming and Scripting

Remove Duplicate Lines in File

I am doing KSH script to remove duplicate lines in a file. Let say the file has format below. FileA 1253-6856 3101-4011 1827-1356 1822-1157 1822-1157 1000-1410 1000-1410 1822-1231 1822-1231 3101-4011 1822-1157 1822-1231 and I want to simply it with no duplicate line as file... (5 Replies)
Discussion started by: Teh Tiack Ein
5 Replies
Login or Register to Ask a Question