Sponsored Content
Top Forums Shell Programming and Scripting Removing Dupes from huge file- awk/perl/uniq Post 302623081 by Scrutinizer on Friday 13th of April 2012 05:56:55 AM
Old 04-13-2012
Can't you use unique sort?
Code:
sort -ut, -k1,3 infile > outfile

This User Gave Thanks to Scrutinizer For This Post:
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Removing duplicates [sort , uniq]

Hey Guys, I have file which looks like this, Contig201#numbPA Contig1452#nmdynD6PA dm022p15.r#CG6461PA dm005e16.f#SpatPA IGU001_0015_A06.f#CG17593PA I need to remove duplicates based on the chracter matching upto '#'. for example if we consider this.. Contig201#numbPA... (4 Replies)
Discussion started by: sharatz83
4 Replies

2. Shell Programming and Scripting

Using an awk script to identify dupes in two files

Hello, I have two files. File1 or the master file contains two columns separated by a delimiter: a=b b=d e=f g=h File 2 which is the file to be processed has only a single column a h c b What I need is an awk script to identify unique names from file 2 which are not found in the... (6 Replies)
Discussion started by: gimley
6 Replies

3. Shell Programming and Scripting

Help in modifying existing Perl Script to produce report of dupes

Hello, I have a large amount of data with the following structure: Word=Transliterated word I have written a Perl Script (reproduced below) which goes through the full file and identifies all dupes on the right hand side. It creates successfully a new file with two headers: Singletons and Dupes.... (5 Replies)
Discussion started by: gimley
5 Replies

4. Shell Programming and Scripting

Awk to Count Multiple patterns in a huge file

Hi, I have a file that is 430K lines long. It has records like below |site1|MAP |site2|MAP |site1|MODAL |site2|MAP |site2|MODAL |site2|LINK |site1|LINK My task is to count the number of time MAP, MODAL, LINK occurs for a single site and write new records like below to a new file ... (5 Replies)
Discussion started by: reach.sree@gmai
5 Replies

5. Shell Programming and Scripting

Fetching record based on Uniq Key from huge file.

Hi i want to fetch 100k record from a file which is looking like as below. XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX ... (17 Replies)
Discussion started by: lathigara
17 Replies

6. Shell Programming and Scripting

Help with removing duplicate entries with awk or Perl

Hi, I have a file which looks like:ke this : chr1 11127067 11132181 89 chr1 11128023 11128311 chr1 11130990 11131025 chr1 11127067 11132181 89 chr1 11128023 11128311 chr1 11131583... (22 Replies)
Discussion started by: Amit Pande
22 Replies

7. Shell Programming and Scripting

Removing dupes within 2 delimited areas in a large dictionary file

Hello, I have a very large dictionary file which is in text format and which contains a large number of sub-sections. Each sub-section starts with the following header : #DATA #VALID 1 and ends with a footer as shown below #END The data between the Header and the Footer consists of... (6 Replies)
Discussion started by: gimley
6 Replies

8. UNIX for Advanced & Expert Users

Performance problem with removing duplicates in a huge file (50+ GB)

I'm trying to remove duplicate data from an input file with unsorted data which is of size >50GB and write the unique records to a new file. I'm trying and already tried out a variety of options posted in similar threads/forums. But no luck so far.. Any suggestions please ? Thanks !! (9 Replies)
Discussion started by: Kannan K
9 Replies

9. Shell Programming and Scripting

Help with Perl script for identifying dupes in column1

Dear all, I have a large dictionary database which has the following structure source word=target word e.g. book=livre Since the database is very large in spite of all the care taken, it so happens that at times the source word is repeated e.g. book=livre book=tome Since I want to... (7 Replies)
Discussion started by: gimley
7 Replies

10. Shell Programming and Scripting

Removing White spaces from a huge file

I am trying to remove whitespaces from a file containing sample data as: 457 <EOFD> Mar 1 2007 12:00:00:000AM <EOFD> Mar 31 2007 12:00:00:000AM <EOFD> system <EORD> 458 <EOFD> Mar 1 2007 12:00:00:000AM<EOFD>agf <EOFD> Apr 20 2007 9:10:56:036PM <EOFD> prodiws<EORD> . Basically these... (11 Replies)
Discussion started by: amvip
11 Replies
Tree::Simple::Visitor::Sort(3pm)			User Contributed Perl Documentation			  Tree::Simple::Visitor::Sort(3pm)

NAME
Tree::Simple::Visitor::Sort - A Visitor for sorting a Tree::Simple object heirarchy SYNOPSIS
use Tree::Simple::Visitor::Sort; # create a visitor object my $visitor = Tree::Simple::Visitor::Sort->new(); $tree->accept($visitor); # the tree is now sorted ascii-betically # set the sort function to # use a numeric comparison $visitor->setSortFunction($visitor->NUMERIC); $tree->accept($visitor); # the tree is now sorted numerically # set a custom sort function $visitor->setSortFunction(sub { my ($left, $right) = @_; lc($left->getNodeValue()->{name}) cmp lc($right->getNodeValue()->{name}); }); $tree->accept($visitor); # the tree's node are now sorted appropriately DESCRIPTION
This implements a recursive multi-level sort of a Tree::Simple heirarchy. I think this deserves some more explaination, and the best way to do that is visually. Given the tree: 1 1.3 1.2 1.2.2 1.2.1 1.1 4 4.1 2 2.1 3 3.3 3.2 3.1 A normal sort would produce the following tree: 1 1.1 1.2 1.2.1 1.2.2 1.3 2 2.1 3 3.1 3.2 3.3 4 4.1 A sort using the built-in REVERSE sort function would produce the following tree: 4 4.1 3 3.3 3.2 3.1 2 2.1 1 1.3 1.2 1.2.2 1.2.1 1.1 As you can see, no node is moved up or down from it's current depth, but sorted with it's siblings. Flexible customized sorting is possible within this framework, however, this cannot be used for tree-balancing or anything as complex as that. METHODS
new There are no arguments to the constructor the object will be in its default state. You can use the "setNodeFilter" and "setSortFunction" methods to customize its behavior. includeTrunk ($boolean) Based upon the value of $boolean, this will tell the visitor to include the trunk of the tree in the sort as well. setNodeFilter ($filter_function) This method accepts a CODE reference as it's $filter_function argument and throws an exception if it is not a code reference. This code reference is used to filter the tree nodes as they are sorted. This can be used to gather specific information from a more complex tree node. The filter function should accept a single argument, which is the current Tree::Simple object. setSortFunction ($sort_function) This method accepts a CODE reference as it's $sort_function argument and throws an exception if it is not a code reference. The $sort_function is used by perl's builtin "sort" routine to sort each level of the tree. The $sort_function is passed two Tree::Simple objects, and must return 1 (greater than), 0 (equal to) or -1 (less than). The sort function will override and bypass any node filters which have been applied (see "setNodeFilter" method above), they cannot be used together. Several pre-built sort functions are provided. All of these functions assume that calling "getNodeValue" on the Tree::Simple object will return a suitable sortable value. REVERSE This is the reverse of the normal sort using "cmp". NUMERIC This uses the numeric comparison operator "<=>" to sort. REVERSE_NUMERIC The reverse of the above. ALPHABETICAL This lowercases the node value before using "cmp" to sort. This results in a true alphabetical sorting. REVERSE_ALPHABETICAL The reverse of the above. If you need to implement one of these sorting routines, but need special handling of your Tree::Simple objects (such as would be done with a node filter), I suggest you read the source code and copy and modify your own sort routine. If it is requested enough I will provide this feature in future versions, but for now I am not sure there is a large need. visit ($tree) This is the method that is used by Tree::Simple's "accept" method. It can also be used on its own, it requires the $tree argument to be a Tree::Simple object (or derived from a Tree::Simple object), and will throw and exception otherwise. It should be noted that this is a destructive action, since the sort happens in place and does not produce a copy of the tree. BUGS
None that I am aware of. Of course, if you find a bug, let me know, and I will be sure to fix it. CODE COVERAGE
See the CODE COVERAGE section in Tree::Simple::VisitorFactory for more inforamtion. SEE ALSO
These Visitor classes are all subclasses of Tree::Simple::Visitor, which can be found in the Tree::Simple module, you should refer to that module for more information. ACKNOWLEDGEMENTS
Thanks to Vitor Mori for the idea and much of the code for this Visitor. AUTHORS
Vitor Mori, <vvvv767@hotmail.com> stevan little, <stevan@iinteractive.com> COPYRIGHT AND LICENSE
Copyright 2004, 2005 by Vitor Mori & Infinity Interactive, Inc. <http://www.iinteractive.com> This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. perl v5.10.1 2005-07-14 Tree::Simple::Visitor::Sort(3pm)
All times are GMT -4. The time now is 06:46 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy