RedHat 9 (Linux i386) - man page for pdl::badvalues (redhat section 1)

BADVALUES(1)		       User Contributed Perl Documentation		     BADVALUES(1)

       PDL::BadValues - Discussion of bad value support in PDL

       What are bad values and why should I bother with them?

       Sometimes it's useful to be able to specify a certain value is 'bad' or 'missing'; for
       example CCDs used in astronomy produce 2D images which are not perfect since certain areas
       contain invalid data due to imperfections in the detector.  Whilst PDL's powerful index
       routines and all the complicated business with dataflow, slices, etc etc mean that these
       regions can be ignored in processing, it's awkward to do. It would be much easier to be
       able to say "$c = $a + $b" and leave all the hassle to the computer.

       If you're not interested in this, then you may (rightly) be concerned with how this
       affects the speed of PDL, since the overhead of checking for a bad value at each operation
       can be large.  Because of this, the code has been written to be as fast as possible,
       particularly when operating on piddles which do not contain bad values.  In fact, you
       should notice essentially no speed difference when working with piddles which do not
       contain bad values.

       However, if you do not want bad values, then PDL's "WITH_BADVAL" configuration option
       comes to the rescue; if set to 0 or undef, the bad-value support is ignored.  About the
       only time I think you'll need to use this - I admit, I'm biased ;) - is if you have
       limited disk or memory space, since the size of the code is increased (see below).

       You may also ask 'well, my computer supports IEEE NaN, so I already have this'.  Well, yes
       and no - many routines, such as "y=sin(x)", will propagate NaNs without the user having
       to code differently, but routines such as "qsort", or finding the median of an array, need
       to be re-coded to handle bad values.  For floating-point datatypes, "NaN" and "Inf" are
       used to flag bad values IF the option "BADVAL_USENAN" is set to 1 in your config file.
       Otherwise special values are used (see Default bad values).  I do not have any benchmarks
       to show which option is faster.
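
       To illustrate why a routine such as a median has to be re-coded, here is a minimal C
       sketch (not PDL's own implementation) that drops non-finite elements before sorting.
       The names "cmp_double" and "median_skip_bad" are hypothetical, and the C99
       "isfinite()" macro stands in for the "finite()" call mentioned later in this document.

```c
#include <math.h>
#include <stdlib.h>

/* qsort comparator for doubles; only meaningful when no NaN is present */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of the finite elements of data[0..n-1].
 * Returns 0 and stores the result in *out, or -1 if every element is
 * bad (NaN/Inf) or memory could not be allocated. */
int median_skip_bad(const double *data, size_t n, double *out)
{
    double *good = malloc((n ? n : 1) * sizeof *good);
    size_t ngood = 0;

    if (good == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)      /* keep only the good values */
        if (isfinite(data[i]))
            good[ngood++] = data[i];
    if (ngood == 0) {
        free(good);
        return -1;
    }
    qsort(good, ngood, sizeof *good, cmp_double);
    *out = (ngood % 2) ? good[ngood / 2]
                       : 0.5 * (good[ngood / 2 - 1] + good[ngood / 2]);
    free(good);
    return 0;
}
```

       Sorting the raw array instead would hand NaN to the comparator, for which every
       comparison is false, so the resulting order (and hence the median) would be
       unreliable.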

       Code increase due to bad values

       On an i386 machine running linux and perl 5.005_03, I measured the following sizes (the
       Slatec code was compiled in, but none of the other options, eg Karma, FFTW, GSL, and 3d,
       were):

       WITH_BADVAL = 0
	   Size of blib directory after a successful make = 4963 kb: blib/arch = 2485 kb and
	   blib/lib = 1587 kb.

       WITH_BADVAL = 1
	   Size of blib directory after a successful make = 5723 kb: blib/arch = 3178 kb and
	   blib/lib = 1613 kb.

       So, the overall increase is only 15% - not much to pay for all the wonders that bad values
       provide ;)

       The source code used for this test had the vast majority of the core routines (eg those in
       Basic/) converted to use bad values, whilst very few of the 'external' routines (ie every-
       thing else in the PDL distribution) had been changed.

       A quick overview

	perldl> p $PDL::Bad::Status
	1
	perldl> $a = sequence(4,3);
	perldl> p $a
	 [ 0  1  2  3]
	 [ 4  5  6  7]
	 [ 8  9 10 11]
	perldl> $a = $a->setbadif( $a % 3 == 2 )
	perldl> p $a
	 [  0   1 BAD   3]
	 [  4 BAD   6   7]
	 [BAD   9  10 BAD]
	perldl> $a *= 3
	perldl> p $a
	 [  0   3 BAD   9]
	 [ 12 BAD  18  21]
	 [BAD  27  30 BAD]
	perldl> p $a->sum
	120

       "demo bad" and "demo bad2" within perldl give a demonstration of some of the things
       possible with bad values.  These are also available on PDL's web-site, at
       http://pdl.perl.org/demos/.  See PDL::Bad for useful routines for working with bad values
       and t/bad.t to see them in action.

       The intention is to:

       o   not significantly affect PDL for users who don't need bad value support

       o   be as fast as possible when bad value support is installed

       If you never want bad value support, then set "WITH_BADVAL" to 0 in perldl.conf; PDL
       then has no bad value support compiled in, so will be as fast as it used to be.

       However, in most cases, the bad value support has a negligible effect on speed, so you
       should set "WITH_BADVAL" to 1! One exception is if you are low on memory, since the amount
       of code produced is larger (but only by about 15% - see "Code increase due to bad
       values").

       To find out if PDL has been compiled with bad value support, look at the values of either
       $PDL::Config{WITH_BADVAL} or $PDL::Bad::Status - if true then it has been.

       To find out if a routine supports bad values, use the "badinfo" command in perldl or the
       "-b" option to pdldoc.  This facility is currently a 'proof of concept' (or, more
       realistically, a quick hack) so expect it to be rough around the edges.

       Each piddle contains a flag - accessible via "$pdl->badflag" - to say whether there's any
       bad data present:

       o   If false/0, which means there's no bad data here, the code supplied by the "Code"
	   option to "pp_def()" is executed. This means that the speed should be very close to
	   that obtained with "WITH_BADVAL=0", since the only overhead is several accesses to a
	   bit in the piddle's state variable.

       o   If true/1, then this says there MAY be bad data in the piddle, so use the code in the
	   "BadCode" option (assuming that the "pp_def()" for this routine has been updated to
	   have a BadCode key).  You get all the advantages of threading, as with the "Code"
	   option, but it will run slower since you are going to have to handle the presence of
	   bad values.

       If you create a piddle, it will have its bad-value flag set to 0. To change this, use
       "$pdl->badflag($new_bad_status)", where $new_bad_status can be 0 or 1.  When a routine
       creates a piddle, its bad-value flag will depend on the input piddles: unless overridden
       (see the "CopyBadStatusCode" option to "pp_def"), the bad-value flag will be set true if
       any of the input piddles contain bad values.  To check that a piddle really contains bad
       data, use the "check_badflag" method.

       NOTE: propagation of the badflag

       If you change the badflag of a piddle, this change is propagated to all the children of a
       piddle, so

	  perldl> $a = zeroes(20,30);
	  perldl> $b = $a->slice('0:10,0:10');
	  perldl> $c = $b->slice(',(2)');
	  perldl> print ">>c: ", $c->badflag, "\n";
	  >>c: 0
	  perldl> $a->badflag(1);
	  perldl> print ">>c: ", $c->badflag, "\n";
	  >>c: 1

       No change is made to the parents of a piddle, so

	  perldl> print ">>a: ", $a->badflag, "\n";
	  >>a: 1
	  perldl> $c->badflag(0);
	  perldl> print ">>a: ", $a->badflag, "\n";
	  >>a: 1
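
       The one-way propagation shown in the two transcripts above can be sketched in C as a
       recursive walk over children.  This is only an illustration under stated assumptions:
       "struct node", "add_child", "set_badflag" and "MAX_CHILDREN" are hypothetical names and
       do not correspond to PDL's internal structures.

```c
#include <stddef.h>

#define MAX_CHILDREN 4            /* hypothetical limit for this sketch */

struct node {
    int badflag;                          /* 1 = may contain bad values */
    struct node *parent;                  /* never touched by set_badflag */
    struct node *child[MAX_CHILDREN];
    size_t nchild;
};

/* Attach c as a child of p; the child starts with the parent's flag.
 * Returns 0 on success, -1 if p has no free child slot. */
int add_child(struct node *p, struct node *c)
{
    if (p->nchild >= MAX_CHILDREN)
        return -1;
    p->child[p->nchild++] = c;
    c->parent = p;
    c->badflag = p->badflag;
    return 0;
}

/* Set the flag on n and on every descendant; parents are left alone. */
void set_badflag(struct node *n, int flag)
{
    n->badflag = flag;
    for (size_t i = 0; i < n->nchild; i++)
        set_badflag(n->child[i], flag);
}
```

       Setting the flag on the top-level node reaches a grandchild, while clearing it on the
       grandchild leaves the top-level node unchanged, mirroring the $a/$b/$c session.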


       o   the badflag can ONLY be cleared IF a piddle has NO parents, and that this change will
	   propagate to all the children of that piddle. I am not so keen on this anymore (too
	   awkward to code, for one).

       o   "$a->badflag(1)" should propagate the badflag to BOTH parents and children.

       This shouldn't be hard to implement (although an initial attempt failed!).  Does it make
       sense though? There's also the issue of what happens if you change the badvalue of a
       piddle - should this propagate to children/parents (yes), or should you only be able
       to change the badvalue at the 'top' level - ie those piddles which do not have parents?

       The "orig_badvalue()" method returns the compile-time value for a given datatype. It works
       on piddles, PDL::Type objects, and numbers - eg

	 $pdl->orig_badvalue(), byte->orig_badvalue(), and orig_badvalue(4).

       It also has a horrible name...

       To get the current bad value, use the "badvalue()" method - it has the same syntax as
       "orig_badvalue()".

       To change the current bad value, supply the new number to badvalue - eg

	 $pdl->badvalue(2.3), byte->badvalue(2), badvalue(5,-3e34).

       Note: the value is silently converted to the correct C type, and returned - ie
       "byte->badvalue(-26)" returns 230 on my linux machine.  It is also a "nop" for
       floating-point types when "BADVAL_USENAN" is true.
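
       The 230 comes from ordinary C conversion to an unsigned 8-bit type: -26 reduced modulo
       256 is 230.  The sketch below assumes the requested value reaches the conversion as an
       integer; "to_byte_badvalue" is a hypothetical helper, not a PDL function.  Converting an
       out-of-range floating-point value directly to an unsigned type is undefined in C, which
       may be why the result is qualified as machine-specific.

```c
/* Convert a requested bad value to the unsigned 8-bit type used for
 * byte data.  Integer-to-unsigned conversion is defined by the C
 * standard as reduction modulo 2^N, i.e. modulo 256 when unsigned
 * char is 8 bits wide. */
unsigned char to_byte_badvalue(long requested)
{
    return (unsigned char)requested;
}
```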

       Note that changes to the bad value are NOT propagated to previously-created piddles - they
       will still have the bad flag set, but the elements that were bad will suddenly become
       'good' while still containing the old bad value.  See discussion below.  It's not a
       problem for floating-point types, since you can't change their badvalue.

       Bad values and boolean operators

       For those boolean operators in PDL::Ops, evaluation on a bad value returns the bad value.
       Whilst this means that

	$mask = $img > $thresh;

       correctly propagates bad values, it will cause problems for checks such as

	do_something() if any( $img > $thresh );

       which need to be re-written as something like

	do_something() if any( setbadtoval( ($img > $thresh), 0 ) );

       When using one of the 'projection' functions in PDL::Ufunc - such as orover - bad values
       are skipped over (see the documentation of these functions for the current (poor) handling
       of the case when all elements are bad).

       A bad value for each piddle, and related issues

       The following is relevant only for integer types, where there is a choice of value to use
       as the bad flag.

       Currently, there is one bad value for each datatype. The code is written so that we could
       have a separate bad value for each piddle (stored in the pdl structure) - this would then
       remove the current problem of:

	perldl> $a = byte( 1, 2, byte->badvalue, 4, 5 );
	perldl> p $a;
	[1 2 255 4 5]
	perldl> $a->badflag(1)
	perldl> p $a;
	[1 2 BAD 4 5]
	perldl> byte->badvalue(0);
	perldl> p $a;
	[1 2 255 4 5]

       ie the bad value in $a has lost its bad status using the current implementation.  It would
       almost certainly cause problems elsewhere though!
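
       The effect in the session above follows directly from keeping a single bad value per
       datatype.  A minimal C sketch, assuming one shared variable per type (the names
       "byte_badval", "count_bad" and "set_byte_badval" are hypothetical):

```c
#include <stddef.h>

/* One bad value shared by every byte array, as in the current scheme. */
static unsigned char byte_badval = 255;

/* Count the elements that are currently treated as bad. */
size_t count_bad(const unsigned char *data, size_t n)
{
    size_t nbad = 0;
    for (size_t i = 0; i < n; i++)
        if (data[i] == byte_badval)
            nbad++;
    return nbad;
}

/* Change the shared bad value; existing arrays are not rewritten, so
 * elements holding the old value silently become good. */
void set_byte_badval(unsigned char v)
{
    byte_badval = v;
}
```

       With the data 1, 2, 255, 4, 5 the count is 1 before "set_byte_badval(0)" and 0
       afterwards.  Storing the bad value alongside each array would avoid this, at the cost
       of the structure change discussed in this section.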

       During a "perl Makefile.PL", the file Basic/Core/badsupport.p is created; this file
       contains the values of the "WITH_BADVAL" and "BADVAL_USENAN" variables, and should be used
       by code that is executed before the PDL::Config file is created (e.g.
       Basic/Core/pdlcore.c.PL).  However, most PDL code will just need to access the
       %PDL::Config hash (e.g. Basic/Bad/bad.pd) to find out whether bad-value support is
       required.

       A new flag has been added to the state of a piddle - "PDL_BADVAL". If unset, then the
       piddle does not contain bad values, and so all the support code can be ignored. If set, it
       does not guarantee that bad values are present, just that they should be checked for.
       Thanks to Christian, "badflag()" - which sets/clears this flag (see Basic/Bad/bad.pd) -
       will update ALL the children/grandchildren/etc of a piddle if its state changes (see
       "badflag" in Basic/Bad/bad.pd and "propogate_badflag" in Basic/Core/Core.xs.PL).  It's not
       clear what to do with parents: I can see the reason for propagating a 'set badflag'
       request to parents, but I think a child should NOT be able to clear the badflag of a
       parent.  There's also the issue of what happens when you change the bad value for a piddle.

       The "pdl_trans" structure has been extended to include an integer value, "bvalflag", which
       acts as a switch to tell the code whether to handle bad values or not. This value is set
       if any of the input piddles have their "PDL_BADVAL" flag set (although this code can be
       replaced by setting "FindBadStatusCode" in pp_def).  The logic of the check is going to get
       a tad more complicated if I allow routines to fall back to using the "Code" section for
       floating-point types (ie those routines with "NoBadifNaN => 1" when "BADVAL_USENAN" is
       true).

       The bad values for the integer types are now stored in a structure within the Core PDL
       structure - "PDL.bvals" (eg Basic/Core/pdlcore.h.PL); see also "typedef badvals" in
       Basic/Core/pdl.h.PL and the BOOT code of Basic/Core/Core.xs.PL where the values are
       initialised to (hopefully) sensible values.  See Basic/Bad/bad.pd for read/write routines
       to the values.

       All this means that the internals of PDL are not binary compatible with PDL 2.1.1 and
       earlier; external modules will need to be recompiled.

       Why not make a PDL subclass?

       The support for bad values could have been done as a PDL sub-class.  The advantage of this
       approach would be that you only load in the code to handle bad values if you actually want
       to use them.  The downside is that the code then gets separated: any bug
       fixes/improvements have to be done to the code in two different files.  With the present
       approach the code is in the same "pp_def" function (although there is still the problem
       that both "Code" and "BadCode" sections need updating).

       Default bad values

       The default/original bad values are set to (taken from the Starlink distribution):

	 #include <limits.h>

	 PDL_Byte    ==  UCHAR_MAX
	 PDL_Short   ==   SHRT_MIN
	 PDL_Ushort  ==  USHRT_MAX
	 PDL_Long    ==    INT_MIN

       If "BADVAL_USENAN == 0", then we also have

	 PDL_Float   ==   -FLT_MAX
	 PDL_Double  ==   -DBL_MAX

       otherwise all of "NaN", "+Inf", and "-Inf" are taken to be bad for floating-point types.
       In this case, the bad value can't be changed, unlike the integer types.

       How do I change a routine to handle bad values?

       Examples can be found in most of the *.pd files in Basic/ (and hopefully many more places
       soon!).  Some of the logic might appear a bit unclear - that's probably because it is!
       Comments appreciated.

       All routines should automatically propagate the bad status flag to output piddles, unless
       you declare otherwise.

       If a routine explicitly deals with bad values, you must provide this option to pp_def:

	  HandleBad => 1

       This ensures that the correct variables are initialised for the $ISBAD etc macros. It is
       also used by the automatic document-creation routines to provide default information on
       the bad value support of a routine without the user having to type it themselves (this is
       in its early stages).

       To flag a routine as NOT handling bad values, use

	  HandleBad => 0

       This should cause the routine to print a warning if it's sent any piddles with the bad
       flag set. Primitive's "intover" has had this set - since it would be awkward to convert -
       but I've not tried it out to see if it works.

       If you want to handle bad values but not set the state of all the output piddles, or if
       it's only one input piddle that's important, then look at the PP rules
       "NewXSFindBadStatus" and "NewXSCopyBadStatus" and the corresponding "pp_def" options:

       FindBadStatusCode
	   By default, "FindBadStatusCode" creates code which sets "__privtrans->bvalflag"
	   depending on the state of the bad flag of the input piddles: see "findbadstatus" in

       CopyBadStatusCode
	   The default code here is a bit simpler than for "FindBadStatusCode": the bad flag of
	   the output piddles is set if "__privtrans->bvalflag" is true after the code has been
	   evaluated.  Sometimes "CopyBadStatusCode" is set to an empty string, with the
	   responsibility of setting the badflag of the output piddle left to the "BadCode"
	   section (e.g. the "xxxover" routines in Basic/Primitive/primitive.pd).

       If you have a routine that you want to be able to use as inplace, look at the routines in
       bad.pd (or ops.pd) which use the "Inplace" option to see how the bad flag is propagated to
       children using the "xxxBadStatusCode" options.  I decided not to automate this as rules
       would be a little complex, since not every inplace op will need to propagate the badflag
       (eg unary functions).

       If the option

	  HandleBad => 1

       is given, then many things happen.  For integer types, the readdata code automatically
       creates a variable called "<pdl name>_badval", which contains the bad value for that
       piddle (see "get_xsdatapdecl()" in Basic/Gen/PP/PdlParObjs.pm).  However, do not hard code
       this name into your code!  Instead use macros (thanks to Tuomas for the suggestion):

	 '$ISBAD(a(n=>1))'  expands to  '$a(n=>1) == a_badval'
	 '$ISGOOD(a())'                 '$a()     != a_badval'
	 '$SETBAD(bob())'               '$bob()    = bob_badval'

       well, the "$a(...)" is expanded as well. Also, you can use a "$" before the pdl name, if
       you so wish, but it begins to look like line noise - eg "$ISGOOD($a())".

       If you cache a piddle value in a variable -- eg "index" in slices.pd -- the following
       routines are useful:

	  '$ISBADVAR(c_var,pdl)'       'c_var == pdl_badval'
	  '$ISGOODVAR(c_var,pdl)'      'c_var != pdl_badval'
	  '$SETBADVAR(c_var,pdl)'      'c_var  = pdl_badval'

       The following have been introduced. They may need playing around with to improve their

	 '$PPISBAD(CHILD,[i])'         'CHILD_physdatap[i] == CHILD_badval'
	 '$PPISGOOD(CHILD,[i])'        'CHILD_physdatap[i] != CHILD_badval'
	 '$PPSETBAD(CHILD,[i])'        'CHILD_physdatap[i]  = CHILD_badval'

       If "BADVAL_USENAN" is set, then it's a bit different for "float" and "double", where we
       consider "NaN", "+Inf", and "-Inf" all to be bad. In this case:

	 ISBAD   becomes   finite(piddle) == 0
	 ISGOOD            finite(piddle) != 0
	 SETBAD            piddle          = NaN

       where the value for NaN is discussed below in Handling NaN values.
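
       The two styles of test can be compared in plain C.  This is a sketch rather than the
       code PDL::PP generates: the function names and the "DEFAULT_BAD_*" macros are
       hypothetical, the sentinel values are the defaults listed under Default bad values, and
       the C99 "isfinite()" macro is used in place of "finite()".

```c
#include <float.h>
#include <limits.h>
#include <math.h>

/* Default sentinels, as listed under "Default bad values". */
#define DEFAULT_BAD_LONG   INT_MIN
#define DEFAULT_BAD_DOUBLE (-DBL_MAX)

/* Sentinel-style tests: compare against the stored bad value. */
int isbad_long(int v)                { return v == DEFAULT_BAD_LONG; }
int isbad_double_sentinel(double v)  { return v == DEFAULT_BAD_DOUBLE; }

/* NaN-style test: NaN, +Inf and -Inf are all treated as bad. */
int isbad_double_nan(double v)       { return !isfinite(v); }

/* NaN never compares equal to anything, itself included, so an
 * equality test cannot detect it; this returns 0 under IEEE 754. */
int nan_equals_itself(void)
{
    volatile double n = NAN;
    return n == n;
}
```

       The last function shows why the NaN case needs "finite()" rather than the "=="
       comparison used for the integer types.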

       This all means that you can change

	  Code => '$a() = $b() + $c();'

       to

	  BadCode => 'if ( $ISBAD(b()) || $ISBAD(c()) ) {
			$SETBAD(a());
		      } else {
			$a() = $b() + $c();
		      }'

       leaving Code as it is. PP::PDLCode will then create a loop something like

	  if ( __trans->bvalflag ) {
	       threadloop over BadCode
	  } else {
	       threadloop over Code
	  }

       (it's probably easier to just look at the .xs file to see what goes on).
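
       As a rough picture of what that generated loop does, here is a hand-written C sketch
       for an integer addition.  It is not the output of PDL::PP: "add_long" and "BADVAL" are
       hypothetical names, threading is reduced to a single loop, and "INT_MIN" is used
       because it is the default bad value listed for PDL_Long.

```c
#include <limits.h>
#include <stddef.h>

#define BADVAL INT_MIN   /* default bad value for PDL_Long */

/* a = b + c over n elements.  bvalflag plays the role of
 * __trans->bvalflag: nonzero when an input may hold bad values. */
void add_long(int *a, const int *b, const int *c, size_t n, int bvalflag)
{
    if (bvalflag) {
        /* "BadCode" path: test every element before using it */
        for (size_t i = 0; i < n; i++) {
            if (b[i] == BADVAL || c[i] == BADVAL)
                a[i] = BADVAL;
            else
                a[i] = b[i] + c[i];
        }
    } else {
        /* "Code" path: no per-element checks at all */
        for (size_t i = 0; i < n; i++)
            a[i] = b[i] + c[i];
    }
}
```

       The flag is tested once per call rather than once per element, which is why arrays
       without bad values should see almost no slowdown.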

       Going beyond the Code section

       Similar to "BadCode", there's "BadBackCode", and "BadRedoDimsCode".

       Handling "EquivCPOffsCode" is a bit different: under the assumption that the only access
       to data is via the "$EQUIVCPOFFS(i,j)" macro, then we can automatically create the 'bad'
       version of it; see the "[EquivCPOffsCode]" and "[Code]" rules in PDL::PP.

       Macro access to the bad flag of a piddle

       Macros provide access to the bad-flag status of a pdl:

	 '$PDLSTATEISBAD(a)'    -> '($PDL(a)->state & PDL_BADVAL) > 0'
	 '$PDLSTATEISGOOD(a)'   -> '($PDL(a)->state & PDL_BADVAL) == 0'

	 '$PDLSTATESETBAD(a)'   -> '$PDL(a)->state |= PDL_BADVAL'
	 '$PDLSTATESETGOOD(a)'  -> '$PDL(a)->state &= ~PDL_BADVAL'

       For use in "xxxxBadStatusCode" (+ other stuff that goes into the INIT: section) there are:

	 '$SETPDLSTATEBAD(a)'	    -> 'a->state |= PDL_BADVAL'
	 '$SETPDLSTATEGOOD(a)'	    -> 'a->state &= ~PDL_BADVAL'

	 '$ISPDLSTATEBAD(a)'	    -> '((a->state & PDL_BADVAL) > 0)'
	 '$ISPDLSTATEGOOD(a)'	    -> '((a->state & PDL_BADVAL) == 0)'
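
       All of these expansions are ordinary single-bit operations on the state word.  The
       sketch below shows them as C functions; "struct pdl_like" is a stand-in for the real
       pdl structure, and the value given to "SKETCH_PDL_BADVAL" is an arbitrary bit chosen
       for illustration, not the value PDL uses.

```c
/* Arbitrary bit for this sketch only; the real PDL_BADVAL may differ. */
#define SKETCH_PDL_BADVAL 0x0400

struct pdl_like { int state; };   /* stand-in for the real pdl struct */

/* Set or clear the bad-value bit without disturbing other state bits. */
void state_set_bad(struct pdl_like *p)  { p->state |= SKETCH_PDL_BADVAL; }
void state_set_good(struct pdl_like *p) { p->state &= ~SKETCH_PDL_BADVAL; }

/* Query the bad-value bit. */
int state_is_bad(const struct pdl_like *p)
{
    return (p->state & SKETCH_PDL_BADVAL) > 0;
}

int state_is_good(const struct pdl_like *p)
{
    return (p->state & SKETCH_PDL_BADVAL) == 0;
}
```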

       Handling NaN values

       There are two issues:

       NaN as the bad value
	   which is done.  To select, set "BADVAL_USENAN" to 1 in perldl.conf; a value of 0 falls
	   back to treating the floating-point types the same as the integers.  I need to do some
	   benchmarks to see which is faster, and whether it's dependent on machines (Linux seems
	   to slow down much more than my sparc machine in some very simple tests I did).

       Ignoring BadCode sections
	   which is not.

       For simple routines processing floating-point numbers, we should let the computer process
       the bad values (ie "NaN" and "Inf" values) instead of using the code in the "BadCode"
       section.  Many such routines have been labelled using "NoBadifNaN => 1"; however this is
       currently ignored by PDL::PP.

       For these routines, we want to use the "Code" section if

	 the piddle does not have its bad flag set
	 the datatype is a float or double

       otherwise we use the "BadCode" section.  This is NOT IMPLEMENTED, as it will require
       reasonable hacking of PP::PDLCode!

       There's also the problem of how we handle 'exceptions' - since "$a = pdl(2) / pdl(0)" pro-
       duces a bad value but doesn't update the badflag value of the piddle.  Can we catch an
       exception, or do we have to trap for this (e.g. search for "exception" in

       Checking for "NaN" and "Inf" is done by using the "finite()" C library function.  If you
       want to set a value to the "NaN" value, the following bit of code can be used (this can be
       found in both Basic/Core/Core.xs.PL and Basic/Bad/bad.pd):

	 /* for big-endian machines */
	 static union { unsigned char __c[4]; float __d; }
	       __pdl_nan = { { 0x7f, 0xc0, 0, 0 } };

	 /* for little-endian machines */
	 static union { unsigned char __c[4]; float __d; }
	       __pdl_nan = { { 0, 0, 0xc0, 0x7f } };
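
       The two byte patterns above can also be selected at run time.  The following sketch
       assumes a 4-byte IEEE 754 "float"; "is_big_endian" and "make_nan" are hypothetical
       helpers written for this example, not the routine PDL itself provides.

```c
#include <math.h>
#include <string.h>

_Static_assert(sizeof(float) == 4, "sketch assumes a 4-byte float");

/* Return 1 on a big-endian machine, 0 on a little-endian one, by
 * looking at the first byte of the integer 1 in memory. */
int is_big_endian(void)
{
    const unsigned int one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 0;
}

/* Build a quiet NaN (bit pattern 0x7fc00000) from the byte sequences
 * quoted above, picking the layout that matches this machine. */
float make_nan(void)
{
    static const unsigned char be[4] = { 0x7f, 0xc0, 0, 0 };
    static const unsigned char le[4] = { 0, 0, 0xc0, 0x7f };
    float f;
    memcpy(&f, is_big_endian() ? be : le, sizeof f);
    return f;
}
```

       Copying the bytes with "memcpy" avoids relying on how a union is initialised, at the
       cost of a function call instead of a static constant.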

       To find out whether a particular machine is big endian, use the routine

       One of the strengths of PDL is its on-line documentation. The aim is to use this system
       to provide information on how/if a routine supports bad values: in many cases "pp_def()"
       contains all the information anyway, so the function-writer doesn't need to do anything at
       all! For the cases when this is not sufficient, there's the "BadDoc" option. For code
       written at the perl level - ie in a .pm file - use the "=for bad" pod directive.

       This information will be available via man/pod2man/html documentation. It's also accessible
       from the "perldl" shell - using the "badinfo" command - and the "pdldoc" shell command -
       using the "-b" option.

       This support is at a very early stage - ie not much thought has gone into it: comments are
       welcome; improvements to the code preferred ;) One awkward problem is for *.pm code: you
       have to write a *.pm.PL file which only inserts the "=for bad" directive (+ text) if bad
       value support is compiled in. In fact, this is a pain when handling bad values at the
       perl, rather than PDL::PP, level: perhaps I should just scrap the "WITH_BADVAL" option...

       There are a number of areas that need work, user input, or both!  They are mentioned
       elsewhere in this document, but this is just to make sure they don't get lost.

       Trapping invalid mathematical operations

       Should we add exceptions to the functions in "PDL::Ops" to set the output bad for out-of-
       range input values?

	perldl> p log10(pdl(10,100,-1))

       I would like the above to produce "[1 2 BAD]", but this would slow down operations on all
       piddles.  We could check for "NaN"/"Inf" values after the operation, but I doubt that
       would be any faster.
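
       The "check after the operation" option amounts to one extra pass over the output.  A
       minimal C sketch of that pass follows; the name "result_needs_badflag" is hypothetical,
       and the sketch relies on IEEE arithmetic, where the logarithm of a negative number
       yields NaN.

```c
#include <math.h>
#include <stddef.h>

/* Scan the result of an elementwise operation and report whether the
 * bad flag should be raised: returns 1 if any element is NaN or Inf,
 * 0 otherwise.  Stops at the first non-finite element. */
int result_needs_badflag(const double *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!isfinite(out[i]))
            return 1;
    return 0;
}
```

       When every element is finite the scan has to visit the whole array, so the cost is
       paid on exactly the data that has nothing wrong with it.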

       Integration with NaN

       When "BADVAL_USENAN" is true, the routines in "PDL::Ops" should just fall through to the
       "Code" section - ie don't use "BadCode" - for "float" and "double" data types.

       Global versus per-piddle bad values

       I think all that's needed is to change the routines in "Basic/Core/pdlconv.c.PL", although
       there are bound to be complications.  It would also mean that the pdl structure would need
       to have a variable to store its bad value, which would mean binary incompatibility with
       previous versions of PDL with bad value support.

       Dataflow of the badflag

       Currently changes to the bad flag are propagated to the children of a piddle, but perhaps
       they should also be passed on to the parents as well.

       The build process has been affected. The following files are now created during the build:

	 Basic/Core/pdlcore.h	   pdlcore.h.PL
		    pdlcore.c	   pdlcore.c.PL
		    pdlapi.c	   pdlapi.c.PL
		    Core.xs	   Core.xs.PL
		    Core.pm	   Core.pm.PL

       Several new files have been added:

	 Basic/Pod/Badvalues.pod (ie this file)





       o   Look at using per-piddle bad values.  Would mean a change to the pdl structure (ie
	   binary incompatibility) and the routines in "Basic/Core/pdlconv.c.PL" would need
	   changing to handle this.  Most other routines should not need to be changed ...

       o   what to do about "$b = pdl(-2); $a = log10($b)" - $a should be set bad, but it
	   currently isn't.

       o   Allow the operations in PDL::Ops to skip the check for bad values when using NaN as a
	   bad value and processing a floating-point piddle.  Needs a fair bit of work to

       o   "$pdl->baddata()" now updates all the children of this piddle as well. However, not
	   sure what to do with parents, since:

	     $b = $a->slice();

	   doesn't mean that $a shouldn't have its badvalue cleared.  However, after


	   it's sensible to assume that the parents now get flagged as containing bad values.

	   PERHAPS you can only clear the bad value flag if you are NOT a child of another
	   piddle, whereas if you set the flag then all children AND parents should be set as well?

	   Similarly, if you change the bad value in a piddle, should this be propagated to
	   parent & children? Or should you only be able to do this on the 'top-level' piddle?

       o   get some code set up to do benchmarks to see how much things are slowed down (and to
	   check that I haven't messed things up if "WITH_BADVAL" is 0/undef).

       o   some of the names aren't appealing - I'm thinking of "orig_badvalue()" in
	   Basic/Bad/bad.pd in particular. Any suggestions appreciated.

       Copyright (C) Doug Burke (burke@ifa.hawaii.edu), 2000.  Commercial reproduction of this
       documentation in a different format is forbidden.

perl v5.8.0				    2000-11-20				     BADVALUES(1)