How to delete or remove duplicate lines in a file


 
Post #8, 07-20-2009
OK,
the same written in awk:

Code:
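awk 'END {
    # Second pass over the stored lines, in input order.
    while (++c <= NR) {
        n = split(r[c], t); k = ""
        for (i=2; i<=7; i++) k = k ? k SUBSEP t[i] : t[i]
        # Print a line if its key occurs once, or if its field count
        # equals the count recorded for that key.
        if (u[k] == 1 || n == m[k]) print r[c]
    }
}
{
    # First pass: build the key from fields 2 through 7.
    k = ""; for (i=2; i<=7; i++) k = k ? k SUBSEP $i : $i
    # From the second occurrence of a key onward, record the largest NF.
    if (u[k]++ && NF > m[k]) m[k] = NF
    r[NR] = $0
}' infile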

On Solaris, use gawk, nawk or /usr/xpg4/bin/awk rather than the default /usr/bin/awk.
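
To see what the script does, here is a small sketch run against invented sample data. It assumes whitespace-separated records in which fields 2 to 7 form the key; the three sample lines, the temporary file and the variable names are made up for illustration and are not taken from the original poster's data.

```shell
# Hypothetical sample: field 1 is an id, fields 2-7 are the key.
# Lines 1 and 2 share a key; line 2 has one extra field.
infile=$(mktemp)
cat > "$infile" <<'EOF'
1 a b c d e f
2 a b c d e f g
3 p q r s t u
EOF

# Same awk program as in the post, reading the sample file.
out=$(awk 'END { while (++c <= NR) {
    n = split(r[c], t); k = ""
    for (i=2; i<=7; i++) k = k ? k SUBSEP t[i] : t[i]
    if (u[k] == 1 || n == m[k]) print r[c] }
    }
{ k = ""; for (i=2; i<=7; i++) k = k ? k SUBSEP $i : $i
  if (u[k]++ && NF > m[k]) m[k] = NF; r[NR] = $0 }
  ' "$infile")

printf '%s\n' "$out"
rm -f "$infile"
```

With this input, line 3 is printed because its key is unique, and of the two lines sharing a key only line 2 (the one with more fields) is printed. Note that the field count is only recorded from the second occurrence of a key onward, so the result depends on the order of the duplicates in the input.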
 
PRUNEHISTORY(8)               System Manager's Manual               PRUNEHISTORY(8)

NAME
       prunehistory - remove file names from Usenet history file

SYNOPSIS
       prunehistory [ -f filename ] [ -p ] [ input ]

DESCRIPTION
       Prunehistory modifies the history(5) text file to ``remove'' a set of
       filenames from it. The filenames are removed by overwriting them with
       spaces, so that the size and position of any following entries does
       not change.

       Prunehistory reads the named input file, or standard input if no file
       is given. The input is taken as a set of lines. Blank lines and lines
       starting with a number sign (``#'') are ignored. All other lines
       should consist of a Message-ID followed by zero or more filenames.

       The Message-ID is used as the dbz(3) key to get an offset into the
       text file. If no filenames are mentioned on the input line, then all
       filenames in the text are ``removed.'' If any filenames are mentioned,
       they are converted into the history file notation. If they appear in
       the line for the specified Message-ID then they are removed.

       Since innd(8) only appends to the text file, prunehistory does not
       need to have any interaction with it.

OPTIONS
       -p     Prunehistory will normally complain about lines that do not
              follow the correct format. If the ``-p'' flag is used, then the
              program will silently print any invalid lines on its standard
              output. (Blank lines and comment lines are also passed
              through.) This can be useful when prunehistory is used as a
              filter for other programs such as reap.

       -f     The default name of the history file is /var/lib/news/history;
              to specify a different name, use the ``-f'' flag.

EXAMPLES
       It is a good idea to delete purged entries and rebuild the dbz
       database every so often by using a script like the following:

              ctlinnd throttle "Rebuilding history database"
              cd /var/lib/news
              awk 'NF > 2 {
                      printf "%s %s %s", $1, $2, $3;
                      for (i = 4; i <= NF; i++)
                              printf " %s", $i;
                      print " ";
              }' <history >history.n
              if makehistory -r -f history.n ; then
                  mv history.n history
                  mv history.n.pag history.pag
                  mv history.n.dir history.dir
              else
                  echo 'Problem rebuilding history; old file not replaced'
              fi
              ctlinnd go "Rebuilding history database"

       Note that this keeps no record of expired articles.

HISTORY
       Written by Rich $alz <rsalz@uunet.uu.net> for InterNetNews. This is
       revision 1.9, dated 1996/10/29.

SEE ALSO
       dbz(3), history(5), innd(8).