UNIX for Dummies Questions & Answers: Post 302835715 by Sri3001 on Tuesday 23rd of July 2013 01:06:47 AM
Removing duplicates from a file

Hi All,

I am merging files that come from two different systems, and while doing that I am getting duplicate entries in the merged file:

Code:
I,01,000131,764,2,4.00
I,01,000131,765,2,4.00
I,01,000131,772,2,4.00
I,01,000131,773,2,4.00
I,01,000168,762,2,2.00
I,01,000168,763,2,2.00
I,01,000622,761,6,14.64
I,01,000622,762,6,14.64
I,01,000622,763,6,14.64
I,01,000684,767,2,10.00
I,01,000131,764,2,5.00
I,01,000131,765,2,5.00
I,01,000131,772,2,6.00
I,01,000131,773,2,4.00
I,01,000168,762,2,2.00
I,01,000168,763,2,2.00
I,01,000622,761,6,14.64
I,01,000622,762,6,14.64
I,01,000622,763,6,14.64
I,01,000684,767,2,10.00


I tried to sort it uniquely with sort -u, using

Code:
sort -k 2,2 -k 3,3 -k 4,4 sample.txt | sort -u

but it is not returning the correct result. It gives output like this:

Code:
I,01,000131,764,2,4.00
I,01,000131,764,2,5.00
I,01,000131,765,2,4.00
I,01,000131,765,2,5.00
I,01,000131,772,2,4.00
I,01,000131,772,2,6.00
I,01,000131,773,2,4.00
I,01,000168,762,2,2.00
I,01,000168,763,2,2.00
I,01,000622,761,6,14.64
I,01,000622,762,6,14.64
I,01,000622,763,6,14.64
I,01,000684,767,2,10.00

Is there a way to sort the file uniquely, or remove duplicates, using the 2nd, 3rd and 4th columns as key fields?

Thanks
Sri

Last edited by Scrutinizer; 07-23-2013 at 02:15 AM.. Reason: code tags
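
A note on why the attempt above keeps extra rows, with one possible fix. Without -t, sort splits fields at blanks, so each comma-separated line counts as a single field and the -k 2,2 -k 3,3 -k 4,4 keys are empty; the final sort -u then compares whole lines, which differ in the last column. The sketch below assumes "duplicate" means "same 2nd, 3rd and 4th fields" and that keeping the first occurrence is acceptable. The output file names dedup_awk.txt and dedup_sort.txt are made up for illustration.

```shell
# Recreate the sample data from the post.
cat > sample.txt <<'EOF'
I,01,000131,764,2,4.00
I,01,000131,765,2,4.00
I,01,000131,772,2,4.00
I,01,000131,773,2,4.00
I,01,000168,762,2,2.00
I,01,000168,763,2,2.00
I,01,000622,761,6,14.64
I,01,000622,762,6,14.64
I,01,000622,763,6,14.64
I,01,000684,767,2,10.00
I,01,000131,764,2,5.00
I,01,000131,765,2,5.00
I,01,000131,772,2,6.00
I,01,000131,773,2,4.00
I,01,000168,762,2,2.00
I,01,000168,763,2,2.00
I,01,000622,761,6,14.64
I,01,000622,762,6,14.64
I,01,000622,763,6,14.64
I,01,000684,767,2,10.00
EOF

# Option 1: awk keeps the first line seen for each (field 2, field 3, field 4)
# combination and preserves the original input order.
awk -F, '!seen[$2,$3,$4]++' sample.txt > dedup_awk.txt

# Option 2: sort with an explicit comma delimiter. With -u and -k, uniqueness
# is decided on the key fields only. Output is sorted by those keys; which of
# several lines sharing a key survives should not be relied upon.
sort -t, -k2,2 -k3,3 -k4,4 -u sample.txt > dedup_sort.txt
```

Both commands leave one line per key on this sample. The awk form is the one to prefer if it matters that the first system's row wins, since it makes that choice explicit.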
 
