Removing duplicates from a file
Post 302836269 by Don Cragun on Tuesday 23rd of July 2013 11:59:06 PM
Old 07-24-2013
There are lots of ways to do this. If you want the output sorted by the key you're using to determine duplication, sort -u is the most logical choice. If you want to give preference to the 1st entry found with a given key and have output order match input order, awk provides a simple way to do it:
Code:
awk -F, '!a[$2,$3,$4]++' input1 input2 > mergeNoDup
awk -F, '!a[$2,$3,$4]++' mergeWithDups > mergeNoDup

Use the 1st line if you have separate files from server 1 and server 2; use the 2nd line if you have already created a merged file and want to remove duplicates from it. The 1st form works with any number of input files as long as the command line stays within the system's ARG_MAX limit, which caps the total size of the arguments (and environment) passed to awk rather than the number of files.
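
To see why the one-liner keeps the 1st entry for each key, here is a small sketch you can run in a scratch directory. The sample rows and the mergeSorted file name are made up for illustration; only input1, input2, and mergeNoDup come from the commands above. It also runs sort -u on the same key for comparison.

```shell
# Hypothetical sample data: field 1 is the server, fields 2-4 form the key.
printf '%s\n' 'srv1,alice,10,x' 'srv1,bob,20,y' > input1
printf '%s\n' 'srv2,bob,20,y' 'srv2,carol,30,z' 'srv2,alice,10,x' > input2

# a[$2,$3,$4]++ yields how many times this key was seen before the current
# line, so !a[...]++ is true only for the first line with a given key.
awk -F, '!a[$2,$3,$4]++' input1 input2 > mergeNoDup
cat mergeNoDup
# srv1,alice,10,x
# srv1,bob,20,y
# srv2,carol,30,z

# sort -u on the same key: output is ordered by fields 2-4 instead of by
# input order, and which of the duplicate lines survives is not something
# to rely on.
LC_ALL=C sort -t, -k2,4 -u input1 input2 > mergeSorted
wc -l < mergeSorted   # one line per distinct key
```

The awk version holds one array element per distinct key in memory, which is usually fine; sort -u is the better fit when you want the result ordered by key anyway.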

As always, if you want to try this on a Solaris/SunOS system, use /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk rather than /bin/awk or /usr/bin/awk.
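
If the same script has to run on both Solaris and other systems, one way to handle this is to pick the awk at run time. This is only a sketch: the AWK and candidate variable names are my own, and /usr/bin/nawk is an assumed location for nawk.

```shell
# Default to whatever awk is first in PATH, then prefer a standards-conforming
# awk if one of the Solaris/SunOS locations exists.
AWK=awk
for candidate in /usr/xpg4/bin/awk /usr/xpg6/bin/awk /usr/bin/nawk; do
    if [ -x "$candidate" ]; then
        AWK=$candidate
        break
    fi
done

# Same dedup expression as above, run through the selected awk.
printf '%s\n' 's1,k,1,2' 's2,k,1,2' | "$AWK" -F, '!a[$2,$3,$4]++'
# s1,k,1,2
```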

Other ways to do this include writing a program in C (or another high-level language), using perl, using the associative arrays that are available in some shells, and an endless number of much less efficient combinations using read (to get a list of keys), grep (to get a count of lines containing a key), and an editor (to remove all but one occurrence of duplicated keys).
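
As a rough illustration of the shell associative array approach, here is a sketch that assumes bash 4 or later (declare -A is not available in older bash or in plain sh). The sample rows and the variable names are made up; it is noticeably slower than awk on large files.

```shell
# Hypothetical merged file; fields 2-4 form the key.
printf '%s\n' 'srv1,alice,10,x' 'srv2,alice,10,x' 'srv2,carol,30,z' > mergeWithDups

# Run the loop explicitly under bash, since it needs associative arrays.
bash -s <<'EOF' > mergeNoDup
declare -A seen
while IFS= read -r line; do
    # Split a copy of the line on commas to build the key.
    IFS=, read -r f1 f2 f3 f4 rest <<< "$line"
    key="$f2,$f3,$f4"
    # Print the line only the first time its key shows up.
    if [[ -z ${seen[$key]+x} ]]; then
        seen[$key]=1
        printf '%s\n' "$line"
    fi
done < mergeWithDups
EOF

cat mergeNoDup
# srv1,alice,10,x
# srv2,carol,30,z
```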
 
