In a huge file, delete duplicate lines leaving unique lines
Post 302543885 by radoulov on Tuesday 2nd of August 2011 10:43:02 AM
Try the solution suggested by yazu:

Code:
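# Split the input into 1,000,000-line chunks (xaa, xab, ...)
split -l 1000000 infile

# Sort each chunk and drop the duplicates inside it
for f in x*; do
  sort -u "$f" > "$f"_sorted
done

# Sort the chunks together, dropping duplicates across chunks
sort -u x*_sorted > final.out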

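I believe the final sort needs -u (not just -m): -m on its own only merges the already sorted chunks, so a line that occurs in more than one chunk would still appear more than once in final.out.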
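
If you want the same steps as something reusable, here is a rough sketch as a shell function. The function name dedup_large, the chunk_lines parameter and the temporary directory layout are my own assumptions, not part of yazu's suggestion. It also assumes mktemp -d is available and that the input is not empty. Since every chunk is already sorted, the last step combines -m with -u so the chunks are merged rather than sorted again; treat that as a variation to try, not as the posted solution.

```shell
#!/bin/sh
# Sketch only: dedup_large and chunk_lines are hypothetical names.
# Usage: dedup_large INFILE OUTFILE [LINES_PER_CHUNK]
dedup_large() {
  infile=$1
  outfile=$2
  chunk_lines=${3:-1000000}

  # Keep the chunks in a private temp dir so they cannot collide
  # with other files in the current directory (assumes mktemp -d).
  tmpdir=$(mktemp -d) || return 1

  # Split into chunks named chunk.aa, chunk.ab, ...
  split -l "$chunk_lines" "$infile" "$tmpdir/chunk." || return 1

  # Sort each chunk in place and drop duplicates inside the chunk.
  for f in "$tmpdir"/chunk.*; do
    sort -u -o "$f" "$f" || return 1
  done

  # Merge the sorted chunks (-m) and drop duplicates across chunks (-u).
  sort -m -u -o "$outfile" "$tmpdir"/chunk.*
  status=$?

  rm -rf "$tmpdir"
  return "$status"
}
```

Call it as dedup_large infile final.out, or pass a smaller third argument to test it on a small file first. Note that the output is sorted, so the original line order is not preserved, same as with the commands above.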
 
