Removing dupes within 2 delimited areas in a large dictionary file Post: 302740867

10 More Discussions You Might Find Interesting

1. UNIX for Advanced & Expert Users

Issue with Removing Carriage Return (^M) in delimited file

Hi - I tried to remove ^M from a delimited file using tr -d "\r" and sed 's/^M//g', but neither works quite right. While the ^M is removed, each record is still cut in half, like a,b, c c,d,e The delimited file is generated using a sh script by outputting a SQL query result to...

2. Shell Programming and Scripting

Removing blanks in a text tab delimited file

Hi Experts, I am very new to Perl and need to write a script in Perl. I would like to remove blanks in a tab-delimited text file in a specific column range (column 21 to column 43). Sample input and output shown below: Input: 117 102 650 652 654 656 117 93 95...

3. Shell Programming and Scripting

Removing Embedded Newline from Delimited File

Hey there - a bit of background on what I'm trying to accomplish, first off. I am trying to load the data from a pipe delimited file into a database. The loading tool that I use cannot handle embedded newline characters within a field, so I need to scrub them out. Solutions that I have tried...

4. Shell Programming and Scripting

Large pipe delimited file that I need to add CR/LF every n fields

I have a large flat file with variable length fields that are pipe delimited. The file has no new line or CR/LF characters to indicate a new record. I need to parse the file and after some number of fields, I need to insert a CR/LF to start the next record. Input file ...

5. Shell Programming and Scripting

Extracting a portion of data from a very large tab delimited text file

Hi All, I wanted to know how to effectively delete some columns in a large tab-delimited file. I have a file that contains 5 columns and almost 100,000 rows: 3456 f g t t 3456 g h 456 f h 4567 f g h z 345 f g 567 h j k l This is a very large data file and tab delimited. I need...

6. Shell Programming and Scripting

Script Optimization - large delimited file, for loop with many greps

Since there are approximately 75K gsfiles and hundreds of stfiles per gsfile, this script can take hours. How can I rewrite this script, so that it's much faster? I'm not as familiar with perl but I'm open to all suggestions. ls file.list>$split for gsfile in `cat $split`; do csplit...

7. Shell Programming and Scripting

Removing Dupes from huge file- awk/perl/uniq

Hi, I have the following command in place nawk -F, '!a++' file > file.uniq It has been working perfectly as per requirements, by removing duplicates by taking into consideration only first 3 fields. Recently it has started giving below error: bash-3.2$ nawk -F, '!a++'...

8. Shell Programming and Scripting

Merging dupes on different lines in a dictionary

I am working on a homonym dictionary of names, i.e. names which are clustered together according to their "sound-alike" pronunciation: An example will make this clear: Since the dictionary is manually constructed it often happens that inadvertently two sets of "homonyms" which should be grouped...

9. UNIX for Advanced & Expert Users

Need optimized awk/perl/shell to give the statistics for the Large delimited file

I have a file of around 24 GB with 14 columns, delimited by "|". My requirement: can anyone provide the fastest and best way to get the results below? Number of records in the file; unique counts of the first column and second column. Thanks for your time, Karti ------ Post updated at...

10. Shell Programming and Scripting

Remove dupes in a large file

I have a large file (1.5 GB) and want to sort the file. I used the following AWK script to do the job: !x++ The script works but it is very slow and takes over an hour to do the job. I suspect this is because the file is not sorted. Any solution to speed up the AWK script or a Perl script would...

TAP::Parser::SourceHandler::Perl(3pm)			 Perl Programmers Reference Guide		     TAP::Parser::SourceHandler::Perl(3pm)

NAME

       TAP::Parser::SourceHandler::Perl - Stream TAP from a Perl executable

VERSION

       Version 3.26

SYNOPSIS

	 use TAP::Parser::Source;
	 use TAP::Parser::SourceHandler::Perl;

	 my $source = TAP::Parser::Source->new->raw( \'script.pl' );
	 $source->assemble_meta;

	 my $class = 'TAP::Parser::SourceHandler::Perl';
	 my $vote  = $class->can_handle( $source );
	 my $iter  = $class->make_iterator( $source );

DESCRIPTION

       This is a Perl TAP::Parser::SourceHandler - it has 2 jobs:

       1. Figure out if the TAP::Parser::Source it's given is actually a Perl script ("can_handle").

       2. Create an iterator for Perl sources ("make_iterator").

       Unless you're writing a plugin or subclassing TAP::Parser, you probably won't need to use this module directly.

METHODS

   Class Methods
       "can_handle"

	 my $vote = $class->can_handle( $source );

       Only votes if $source looks like a file.  Casts the following votes:

	 0.9  if it has a shebang ala "#!...perl"
	 0.75 if it has any shebang
	 0.8  if it's a .t file
	 0.9  if it's a .pl file
	 0.75 if it's in a 't' directory
	 0.25 by default (backwards compat)
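
       The voting above can be tried out with a minimal sketch like the following. The temporary file name and its contents are assumptions made up for the demonstration; only "raw", "assemble_meta" and "can_handle" come from the modules themselves. Note that "raw" is given a reference to the file name.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;
use TAP::Parser::Source;
use TAP::Parser::SourceHandler::Perl;

# Hypothetical throwaway script, written to a temp dir for this demo only.
my $dir  = tempdir( CLEANUP => 1 );
my $file = File::Spec->catfile( $dir, 'example.pl' );
open my $fh, '>', $file or die "Cannot write $file: $!";
print {$fh} "#!/usr/bin/perl\nprint qq{1..1\\nok 1\\n};\n";
close $fh or die "Cannot close $file: $!";

# raw() takes a reference to the file name, not the name itself.
my $source = TAP::Parser::Source->new->raw( \$file );

# Metadata (file exists, extension, shebang, ...) must be assembled
# before the handler can vote on the source.
$source->assemble_meta;

my $vote = TAP::Parser::SourceHandler::Perl->can_handle($source);
print "vote for $file: $vote\n";
```

       Because the demo file both carries a Perl shebang and ends in .pl, the vote should land at the high end of the list above; the exact numbers may differ between versions of the module.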

       "make_iterator"

	 my $iterator = $class->make_iterator( $source );

       Constructs & returns a new TAP::Parser::Iterator::Process for the source.  Assumes "$source->raw" contains a reference to the perl script.
       "croak"s if the file could not be found.

       The command to run is built as follows:

	 $perl @switches $perl_script @test_args

       The perl command to use is determined by "get_perl".  The command generated is guaranteed to preserve:

	 PERL5LIB
	 PERL5OPT
	 Taint Mode, if set in the script's shebang

       Note: the command generated will not respect any shebang line defined in your Perl script.  This is only a problem if you have compiled a
       custom version of Perl or if you want to use a specific version of Perl for one test and a different version for another, for example:

	 #!/path/to/a/custom_perl --some --args
	 #!/usr/local/perl-5.6/bin/perl -w

       Currently you need to write a plugin to get around this.
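
       As a rough illustration of "make_iterator", the sketch below builds an iterator for a small script and reads the TAP lines it prints. The script name and contents are hypothetical, and the sketch assumes that a freshly created source (no switches or test arguments set) is acceptable to the handler, as the SYNOPSIS suggests.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;
use TAP::Parser::Source;
use TAP::Parser::SourceHandler::Perl;

# Hypothetical test script, created only for this demonstration.
my $dir  = tempdir( CLEANUP => 1 );
my $file = File::Spec->catfile( $dir, 'example.t' );
open my $fh, '>', $file or die "Cannot write $file: $!";
print {$fh} "#!/usr/bin/perl\nprint qq{1..1\\nok 1 - demo\\n};\n";
close $fh or die "Cannot close $file: $!";

my $source = TAP::Parser::Source->new->raw( \$file );
$source->assemble_meta;

# Runs the script with the perl chosen by get_perl(), not with the
# interpreter named on the script's shebang line.
my $iter = TAP::Parser::SourceHandler::Perl->make_iterator($source);

# next() hands back one line of output at a time, undef when done.
my @lines;
while ( defined( my $line = $iter->next ) ) {
    push @lines, $line;
}
print "$_\n" for @lines;
```

       In normal use TAP::Parser drives the iterator itself; reading it by hand like this is mainly useful when debugging a handler.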

       "get_taint"

       Decode any taint switches from a Perl shebang line.

	 # $taint will be 't'
	 my $taint = TAP::Parser::SourceHandler::Perl->get_taint( '#!/usr/bin/perl -t' );

	 # $untaint will be undefined
	 my $untaint = TAP::Parser::SourceHandler::Perl->get_taint( '#!/usr/bin/perl' );

       "get_perl"

       Gets the version of Perl currently running the test suite.

SUBCLASSING

       Please see "SUBCLASSING" in TAP::Parser for a subclassing overview.

   Example
	 package MyPerlSourceHandler;

	 use strict;
	 use vars '@ISA';

	 use TAP::Parser::SourceHandler::Perl;

	 @ISA = qw( TAP::Parser::SourceHandler::Perl );

	 # use the version of perl from the shebang line in the test file
	 sub get_perl {
	     my $self = shift;
	     if (my $shebang = $self->shebang( $self->{file} )) {
		 $shebang =~ /^#!(.*perl.*?)(?:(?:\s)|(?:$))/;
		 return $1 if $1;
	     }
	     return $self->SUPER::get_perl(@_);
	 }
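
       The shebang parsing in the example can be tried in isolation. The helper below is a hypothetical stand-alone variant, not part of TAP::Parser: it uses a slightly stricter pattern than the example and returns undef when the first word of the shebang does not name a perl binary (so a line such as "#!/usr/bin/env perl" is deliberately not handled).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TAP::Parser): return the interpreter
# path from a shebang line, or undef if its first word does not
# contain "perl".
sub perl_from_shebang {
    my ($shebang) = @_;
    return undef unless defined $shebang;
    return $shebang =~ /^#!\s*(\S*perl\S*)/ ? $1 : undef;
}

for my $line (
    '#!/usr/local/perl-5.6/bin/perl -w',
    '#!/path/to/a/custom_perl --some --args',
    '#!/bin/sh',
) {
    my $perl = perl_from_shebang($line);
    printf "%-42s => %s\n", $line,
        defined $perl ? $perl : '(no perl found)';
}
```

       Returning undef for a non-Perl shebang lets a "get_perl" override fall back to the inherited behaviour, as the example does with SUPER::get_perl.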

SEE ALSO

       TAP::Object, TAP::Parser, TAP::Parser::IteratorFactory, TAP::Parser::SourceHandler, TAP::Parser::SourceHandler::Executable,
       TAP::Parser::SourceHandler::File, TAP::Parser::SourceHandler::Handle, TAP::Parser::SourceHandler::RawTAP

perl v5.18.2							    2014-01-06				     TAP::Parser::SourceHandler::Perl(3pm)