Hello,
Sorry, my broadband was down and I could not try out the Perl script. It works beautifully on 8-bit ASCII data. As soon as it is given UTF-8 or UTF-16 data, no output is produced.
Does Perl have problems with Unicode?
Since my data is in Perso-Arabic, the script does not work.
Is there any roundabout way to solve the problem? I am using the latest version of ActiveState Perl, and in despair I even downloaded Strawberry Perl, but the script still does not work on this data.
I am attaching a zip file containing data in UTF-8 format, with Hindi as an example. There are two files: testdic and testdic.out.
Many thanks for the beautifully commented script. I modified it slightly, as shown below, to take the input and output files from the command line:
The rest of the code remains the same.
I do not think this would affect accessing a UTF8 file.
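One thing worth checking is how the files are opened. By default Perl reads and writes file handles as bytes, so UTF-8 text has to be decoded on input and encoded on output by an explicit I/O layer. The following is only a minimal sketch, not the original script: the helper name `copy_utf8` is hypothetical, and it assumes the input and output paths arrive as the two command-line arguments.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): copy a UTF-8
# text file line by line, decoding on input and encoding on output.
sub copy_utf8 {
    my ($in_path, $out_path) = @_;

    # The :encoding(UTF-8) layer turns bytes into characters and back.
    open my $in,  '<:encoding(UTF-8)', $in_path
        or die "Cannot read $in_path: $!";
    open my $out, '>:encoding(UTF-8)', $out_path
        or die "Cannot write $out_path: $!";

    my $lines = 0;
    while (my $line = <$in>) {
        chomp $line;
        # $line now holds characters, so length() and regexes treat
        # one Devanagari or Arabic letter as one unit.
        print {$out} "$line\n";
        $lines++;
    }

    close $in;
    close $out or die "Cannot close $out_path: $!";
    return $lines;
}

# Assumed command-line use: perl script.pl testdic testdic.out
if (@ARGV == 2) {
    my $count = copy_utf8(@ARGV);
    print "Copied $count lines\n";
}
```

For a UTF-16 file the same pattern should apply with `:encoding(UTF-16)`, which expects the file to start with a byte order mark. If the script prints non-ASCII text to the console, adding `binmode STDOUT, ':encoding(UTF-8)';` is probably needed as well.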
Many thanks once again
Discussion started by: gimley
TAP::Parser::SourceHandler::Perl(3pm)    Perl Programmers Reference Guide    TAP::Parser::SourceHandler::Perl(3pm)
NAME
TAP::Parser::SourceHandler::Perl - Stream TAP from a Perl executable
VERSION
Version 3.26
SYNOPSIS
use TAP::Parser::Source;
use TAP::Parser::SourceHandler::Perl;
my $source = TAP::Parser::Source->new->raw( \'script.pl' );
$source->assemble_meta;
my $class = 'TAP::Parser::SourceHandler::Perl';
my $vote = $class->can_handle( $source );
my $iter = $class->make_iterator( $source );
DESCRIPTION
This is a Perl TAP::Parser::SourceHandler. It has two jobs:
1. Figure out whether the TAP::Parser::Source it is given is actually a Perl script ("can_handle").
2. Create an iterator for Perl sources ("make_iterator").
Unless you're writing a plugin or subclassing TAP::Parser, you probably won't need to use this module directly.
METHODS
Class Methods
"can_handle"
my $vote = $class->can_handle( $source );
Only votes if $source looks like a file. Casts the following votes:
0.9 if it has a shebang ala "#!...perl"
0.75 if it has any shebang
0.8 if it's a .t file
0.9 if it's a .pl file
0.75 if it's in a 't' directory
0.25 by default (backwards compat)
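
As an illustration of the voting described above, the sketch below writes a small script to a temporary file and asks the handler for its vote. The helper name `vote_for` is hypothetical, and the exact vote values may differ between Test::Harness releases, so treat the printed number as indicative only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use TAP::Parser::Source;
use TAP::Parser::SourceHandler::Perl;

# Hypothetical helper: write $content to a temporary file with the
# given suffix and return the vote that can_handle casts for it.
sub vote_for {
    my ($suffix, $content) = @_;
    my ($fh, $path) = tempfile(SUFFIX => $suffix, UNLINK => 1);
    print {$fh} $content;
    close $fh or die "Cannot close $path: $!";

    # raw() is given a reference to the file name, not the name itself.
    my $source = TAP::Parser::Source->new->raw(\$path);
    $source->assemble_meta;
    return TAP::Parser::SourceHandler::Perl->can_handle($source);
}

# A .pl file that also starts with a perl shebang.
my $vote = vote_for('.pl', "#!/usr/bin/perl\nprint qq{1..1\\nok 1\\n};\n");
print "vote: $vote\n";
```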
"make_iterator"
my $iterator = $class->make_iterator( $source );
Constructs & returns a new TAP::Parser::Iterator::Process for the source. Assumes "$source->raw" contains a reference to the perl script.
"croak"s if the file could not be found.
The command to run is built as follows:
$perl @switches $perl_script @test_args
The perl command to use is determined by "get_perl". The command generated is guaranteed to preserve:
PERL5LIB
PERL5OPT
Taint Mode, if set in the script's shebang
Note: the command generated will not respect any shebang line defined in your Perl script. This is only a problem if you have compiled a
custom version of Perl or if you want to use a specific version of Perl for one test and a different version for another, for example:
#!/path/to/a/custom_perl --some --args
#!/usr/local/perl-5.6/bin/perl -w
Currently you need to write a plugin to get around this.
"get_taint"
Decode any taint switches from a Perl shebang line.
# $taint will be 't'
my $taint = TAP::Parser::SourceHandler::Perl->get_taint( '#!/usr/bin/perl -t' );
# $untaint will be undefined
my $untaint = TAP::Parser::SourceHandler::Perl->get_taint( '#!/usr/bin/perl' );
"get_perl"
Gets the version of Perl currently running the test suite.
SUBCLASSING
Please see "SUBCLASSING" in TAP::Parser for a subclassing overview.
Example
package MyPerlSourceHandler;
use strict;
use vars '@ISA';
use TAP::Parser::SourceHandler::Perl;
@ISA = qw( TAP::Parser::SourceHandler::Perl );
# use the version of perl from the shebang line in the test file
sub get_perl {
    my $self = shift;
    if (my $shebang = $self->shebang( $self->{file} )) {
        # capture the interpreter path up to the first whitespace or end of line
        $shebang =~ /^#!(.*perl.*?)(?:(?:\s)|(?:$))/;
        return $1 if $1;
    }
    return $self->SUPER::get_perl(@_);
}
SEE ALSO
TAP::Object, TAP::Parser, TAP::Parser::IteratorFactory, TAP::Parser::SourceHandler, TAP::Parser::SourceHandler::Executable,
TAP::Parser::SourceHandler::File, TAP::Parser::SourceHandler::Handle, TAP::Parser::SourceHandler::RawTAP
perl v5.18.2 2014-01-06 TAP::Parser::SourceHandler::Perl(3pm)