Full Discussion: Filtering duplicate lines
Post 15944 by auswipe on Thursday 21st of February 2002 06:51:34 PM
This is a little late, but here is some Perl code that will also do what you want for most any file. It keeps the first occurrence of each line and prints the unique lines in their original order:

Code:
#!/usr/bin/perl

# RemoveDupes.pl
# Auswipe 21 Feb 2002
# Auswipe sez: "Hey, no guarantees!"
# Usage:
#
#	RemoveDupes.pl -file someTextFile

use strict;
use warnings;
use Getopt::Long;

my $file;
GetOptions("file=s" => \$file) or die "Usage: $0 -file someTextFile\n";

my %dataHash    = ();   # line text => line number of its first occurrence
my $currentLine = 0;

if ($file) {
  open(my $in, '<', $file) or die "Error: $!";

  while (my $logEntry = <$in>) {
    chomp($logEntry);

    # Record a line only the first time it is seen.
    if (!exists($dataHash{$logEntry})) {
      $dataHash{$logEntry} = $currentLine;
    }

    $currentLine++;
  }

  close($in);

} else {
  print STDOUT "You didn't select a file!\n";
}

# Print the unique lines in the order they first appeared.
foreach my $logOutput (sort { $dataHash{$a} <=> $dataHash{$b} } keys %dataHash) {
  print STDOUT "$logOutput\n";
}
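
For what it's worth, the same "keep the first occurrence" idea can be written without the final sort. This is only a minimal sketch, not a replacement for the script above: unique_lines is a made-up helper name and the sample lines are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines with later duplicates dropped,
# keeping the first occurrence of each line in its original position.
sub unique_lines {
    my %seen;
    # $seen{$_}++ is 0 (false) the first time a line is seen,
    # so grep keeps exactly the first occurrence of each line.
    return grep { !$seen{$_}++ } @_;
}

# Invented sample data, just to show the effect.
my @lines  = ("apple", "pear", "apple", "fig", "pear");
my @unique = unique_lines(@lines);
print "$_\n" for @unique;
```

On a real file the same idiom fits on the command line as perl -ne 'print unless $seen{$_}++' someTextFile. It prints as it reads, but it still holds every distinct line in a hash, so memory use grows with the number of unique lines just as in the script above.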

 

Pod::Usage(3pm) 					 Perl Programmers Reference Guide					   Pod::Usage(3pm)

NAME
    Pod::Usage, pod2usage() - print a usage message from embedded pod documentation

SYNOPSIS
        use Pod::Usage;

        my $message_text  = "This text precedes the usage message.";
        my $exit_status   = 2;          ## The exit status to use
        my $verbose_level = 0;          ## The verbose level to use
        my $filehandle    = \*STDERR;   ## The filehandle to write to

        pod2usage($message_text);

        pod2usage($exit_status);

        pod2usage( { -message => $message_text ,
                     -exitval => $exit_status  ,
                     -verbose => $verbose_level,
                     -output  => $filehandle } );

        pod2usage(   -msg     => $message_text ,
                     -exitval => $exit_status  ,
                     -verbose => $verbose_level,
                     -output  => $filehandle   );

ARGUMENTS
    pod2usage should be given either a single argument, or a list of
    arguments corresponding to an associative array (a "hash"). When a single
    argument is given, it should correspond to exactly one of the following:

    o   A string containing the text of a message to print before printing
        the usage message

    o   A numeric value corresponding to the desired exit status

    o   A reference to a hash

    If more than one argument is given then the entire argument list is
    assumed to be a hash. If a hash is supplied (either as a reference or as
    a list) it should contain one or more elements with the following keys:

    "-message"
    "-msg"
        The text of a message to print immediately prior to printing the
        program's usage message.

    "-exitval"
        The desired exit status to pass to the exit() function. This should
        be an integer, or else the string "NOEXIT" to indicate that control
        should simply be returned without terminating the invoking process.

    "-verbose"
        The desired level of "verboseness" to use when printing the usage
        message. If the corresponding value is 0, then only the "SYNOPSIS"
        section of the pod documentation is printed. If the corresponding
        value is 1, then the "SYNOPSIS" section, along with any section
        entitled "OPTIONS", "ARGUMENTS", or "OPTIONS AND ARGUMENTS" is
        printed. If the corresponding value is 2 or more then the entire
        manpage is printed.

    "-output"
        A reference to a filehandle, or the pathname of a file to which the
        usage message should be written. The default is "\*STDERR" unless the
        exit value is less than 2 (in which case the default is "\*STDOUT").

    "-input"
        A reference to a filehandle, or the pathname of a file from which the
        invoking script's pod documentation should be read. It defaults to
        the file indicated by $0 ($PROGRAM_NAME for users of English.pm).

    "-pathlist"
        A list of directory paths. If the input file does not exist, then it
        will be searched for in the given directory list (in the order the
        directories appear in the list). It defaults to the list of
        directories implied by $ENV{PATH}. The list may be specified either
        by a reference to an array, or by a string of directory paths which
        use the same path separator as $ENV{PATH} on your system (e.g., ":"
        for Unix, ";" for MSWin32 and DOS).

DESCRIPTION
    pod2usage will print a usage message for the invoking script (using its
    embedded pod documentation) and then exit the script with the desired
    exit status. The usage message printed may have any one of three levels
    of "verboseness": If the verbose level is 0, then only a synopsis is
    printed. If the verbose level is 1, then the synopsis is printed along
    with a description (if present) of the command line options and
    arguments. If the verbose level is 2, then the entire manual page is
    printed.

    Unless they are explicitly specified, the default values for the exit
    status, verbose level, and output stream to use are determined as
    follows:

    o   If neither the exit status nor the verbose level is specified, then
        the default is to use an exit status of 2 with a verbose level of 0.

    o   If an exit status is specified but the verbose level is not, then the
        verbose level will default to 1 if the exit status is less than 2 and
        will default to 0 otherwise.

    o   If an exit status is not specified but verbose level is given, then
        the exit status will default to 2 if the verbose level is 0 and will
        default to 1 otherwise.

    o   If the exit status used is less than 2, then output is printed on
        "STDOUT". Otherwise output is printed on "STDERR".

    Although the above may seem a bit confusing at first, it generally does
    "the right thing" in most situations. This determination of the default
    values to use is based upon the following typical Unix conventions:

    o   An exit status of 0 implies "success". For example, diff(1) exits
        with a status of 0 if the two files have the same contents.

    o   An exit status of 1 implies possibly abnormal, but non-defective,
        program termination. For example, grep(1) exits with a status of 1 if
        it did not find a matching line for the given regular expression.

    o   An exit status of 2 or more implies a fatal error. For example, ls(1)
        exits with a status of 2 if you specify an illegal (unknown) option
        on the command line.

    o   Usage messages issued as a result of bad command-line syntax should
        go to "STDERR". However, usage messages issued due to an explicit
        request to print usage (like specifying -help on the command line)
        should go to "STDOUT", just in case the user wants to pipe the output
        to a pager (such as more(1)).

    o   If program usage has been explicitly requested by the user, it is
        often desirable to exit with a status of 1 (as opposed to 0) after
        issuing the user-requested usage message. It is also desirable to
        give a more verbose description of program usage in this case.

    pod2usage doesn't force the above conventions upon you, but it will use
    them by default if you don't expressly tell it to do otherwise. The
    ability of pod2usage() to accept a single number or a string makes it
    convenient to use as an innocent-looking error message handling function:

        use Pod::Usage;
        use Getopt::Long;

        ## Parse options
        GetOptions("help", "man", "flag1")  ||  pod2usage(2);
        pod2usage(1)  if ($opt_help);
        pod2usage(-verbose => 2)  if ($opt_man);

        ## Check for too many filenames
        pod2usage("$0: Too many files given.\n")  if (@ARGV > 1);

    Some users, however, may feel that the above "economy of expression" is
    neither particularly readable nor consistent and may instead choose to do
    something more like the following:

        use Pod::Usage;
        use Getopt::Long;

        ## Parse options
        GetOptions("help", "man", "flag1")  ||  pod2usage(-verbose => 0);
        pod2usage(-verbose => 1)  if ($opt_help);
        pod2usage(-verbose => 2)  if ($opt_man);

        ## Check for too many filenames
        pod2usage(-verbose => 2, -message => "$0: Too many files given.\n")
            if (@ARGV > 1);

    As with all things in Perl, there's more than one way to do it, and
    pod2usage() adheres to this philosophy. If you are interested in seeing a
    number of different ways to invoke pod2usage (although by no means
    exhaustive), please refer to "EXAMPLES".

EXAMPLES
    Each of the following invocations of "pod2usage()" will print just the
    "SYNOPSIS" section to "STDERR" and will exit with a status of 2:

        pod2usage();

        pod2usage(2);

        pod2usage(-verbose => 0);

        pod2usage(-exitval => 2);

        pod2usage({-exitval => 2, -output => \*STDERR});

        pod2usage({-verbose => 0, -output => \*STDERR});

        pod2usage(-exitval => 2, -verbose => 0);

        pod2usage(-exitval => 2, -verbose => 0, -output => \*STDERR);

    Each of the following invocations of "pod2usage()" will print a message
    of "Syntax error." (followed by a newline) to "STDERR", immediately
    followed by just the "SYNOPSIS" section (also printed to "STDERR") and
    will exit with a status of 2:

        pod2usage("Syntax error.");

        pod2usage(-message => "Syntax error.", -verbose => 0);

        pod2usage(-msg => "Syntax error.", -exitval => 2);

        pod2usage({-msg => "Syntax error.", -exitval => 2, -output => \*STDERR});

        pod2usage({-msg => "Syntax error.", -verbose => 0, -output => \*STDERR});

        pod2usage(-msg => "Syntax error.", -exitval => 2, -verbose => 0);

        pod2usage(-message => "Syntax error.",
                  -exitval => 2,
                  -verbose => 0,
                  -output  => \*STDERR);

    Each of the following invocations of "pod2usage()" will print the
    "SYNOPSIS" section and any "OPTIONS" and/or "ARGUMENTS" sections to
    "STDOUT" and will exit with a status of 1:

        pod2usage(1);

        pod2usage(-verbose => 1);

        pod2usage(-exitval => 1);

        pod2usage({-exitval => 1, -output => \*STDOUT});

        pod2usage({-verbose => 1, -output => \*STDOUT});

        pod2usage(-exitval => 1, -verbose => 1);

        pod2usage(-exitval => 1, -verbose => 1, -output => \*STDOUT);

    Each of the following invocations of "pod2usage()" will print the entire
    manual page to "STDOUT" and will exit with a status of 1:

        pod2usage(-verbose => 2);

        pod2usage({-verbose => 2, -output => \*STDOUT});

        pod2usage(-exitval => 1, -verbose => 2);

        pod2usage({-exitval => 1, -verbose => 2, -output => \*STDOUT});

  Recommended Use
    Most scripts should print some type of usage message to "STDERR" when a
    command line syntax error is detected. They should also provide an option
    (usually "-H" or "-help") to print a (possibly more verbose) usage
    message to "STDOUT". Some scripts may even wish to go so far as to
    provide a means of printing their complete documentation to "STDOUT"
    (perhaps by allowing a "-man" option). The following complete example
    uses Pod::Usage in combination with Getopt::Long to do all of these
    things:

        use Getopt::Long;
        use Pod::Usage;

        my $man  = 0;
        my $help = 0;
        ## Parse options and print usage if there is a syntax error,
        ## or if usage was explicitly requested.
        GetOptions('help|?' => \$help, man => \$man) or pod2usage(2);
        pod2usage(1) if $help;
        pod2usage(-verbose => 2) if $man;

        ## If no arguments were given, then allow STDIN to be used only
        ## if it's not connected to a terminal (otherwise print usage)
        pod2usage("$0: No files given.")  if ((@ARGV == 0) && (-t STDIN));
        __END__

        =head1 NAME

        sample - Using Getopt::Long and Pod::Usage

        =head1 SYNOPSIS

        sample [options] [file ...]

         Options:
           -help            brief help message
           -man             full documentation

        =head1 OPTIONS

        =over 8

        =item B<-help>

        Prints a brief help message and exits.

        =item B<-man>

        Prints the manual page and exits.

        =back

        =head1 DESCRIPTION

        B<This program> will read the given input file(s) and do something
        useful with the contents thereof.

        =cut

CAVEATS
    By default, pod2usage() will use $0 as the path to the pod input file.
    Unfortunately, not all systems on which Perl runs will set $0 properly
    (although if $0 isn't found, pod2usage() will search $ENV{PATH} or else
    the list specified by the "-pathlist" option). If this is the case for
    your system, you may need to explicitly specify the path to the pod docs
    for the invoking script using something similar to the following:

        pod2usage(-exitval => 2, -input => "/path/to/your/pod/docs");

AUTHOR
    Brad Appleton <bradapp@enteract.com>

    Based on code for Pod::Text::pod2text() written by Tom Christiansen
    <tchrist@mox.perl.com>

ACKNOWLEDGEMENTS
    Steven McDougall <swmcd@world.std.com> for his help and patience with
    re-writing this manpage.

perl v5.8.0                       2002-06-01                   Pod::Usage(3pm)
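
The defaulting rules in the Pod::Usage DESCRIPTION above are easy to misread, so here is a rough sketch of them in plain Perl. To be clear, usage_defaults is a made-up helper written for this illustration and is not part of Pod::Usage, nor is it how the module is implemented. It only restates the four documented rules and assumes a numeric exit status (it does not handle "NOEXIT").

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of Pod::Usage): apply the documented
# defaulting rules to an exit status and a verbose level, either of
# which may be undef when the caller did not specify it.
sub usage_defaults {
    my ($exitval, $verbose) = @_;

    if (!defined $exitval && !defined $verbose) {
        # Neither given: exit status 2, verbose level 0.
        ($exitval, $verbose) = (2, 0);
    }
    elsif (!defined $verbose) {
        # Only exit status given: verbose 1 if status < 2, else 0.
        $verbose = $exitval < 2 ? 1 : 0;
    }
    elsif (!defined $exitval) {
        # Only verbose level given: status 2 if verbose is 0, else 1.
        $exitval = $verbose == 0 ? 2 : 1;
    }

    # Output stream follows the exit status that ends up being used.
    my $output = $exitval < 2 ? 'STDOUT' : 'STDERR';
    return ($exitval, $verbose, $output);
}

# Three of the cases listed under EXAMPLES: pod2usage(),
# pod2usage(1) and pod2usage(-verbose => 2).
for my $case ([undef, undef], [1, undef], [undef, 2]) {
    my ($e, $v, $out) = usage_defaults(@$case);
    print "exitval=$e verbose=$v output=$out\n";
}
```

The three cases in the loop line up with the groups under EXAMPLES: no arguments gives the synopsis on STDERR with status 2, an exit status of 1 gives verbose level 1 on STDOUT, and a verbose level of 2 gives the full manual page on STDOUT with status 1.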