CentOS 7.0 - man page for locale::po4a::xml (centos section 3)


Locale::Po4a::Xml(3)	       User Contributed Perl Documentation	     Locale::Po4a::Xml(3)

NAME
       Locale::Po4a::Xml - convert XML documents and derivatives from/to PO files

DESCRIPTION
       The goal of the po4a (PO for anything) project is to ease translations (and, more
       interestingly, the maintenance of translations) using gettext tools in areas where they
       were not expected, such as documentation.

       Locale::Po4a::Xml is a module to help the translation of XML documents into other [human]
       languages. It can also be used as a base to build modules for XML-based documents.

TRANSLATING WITH PO4A::XML
       This module can be used directly to handle generic XML documents.  By default, it extracts
       the content of all tags and no attributes, since that is where the text is written in most
       XML-based documents.

       Some options (described in the next section) can customize this behavior.  If this does
       not fit your document format, you are encouraged to write your own module derived from
       this one to describe your format's details.  See the section WRITING DERIVATE MODULES
       below for a description of the process.

OPTIONS ACCEPTED BY THIS MODULE
       The global debug option causes this module to show the excluded strings, in order to see
       if it skips something important.

       These are this module's particular options:

       nostrip
	   Prevents the module from stripping the spaces around the extracted strings.

       wrap
	   Canonicalizes the strings to translate, treating whitespace as insignificant, and
	   wraps the translated document. This option can be overridden by custom tag options.
	   See the "tags" option below.

       caseinsensitive
	   Makes the search for tags and attributes case-insensitive.  If it's defined,
	   <BooK>laNG and <BOOK>Lang are treated as <book>lang.

       includeexternal
	   When defined, external entities are included in the generated (translated) document,
	   and for the extraction of strings.  If it's not defined, you will have to translate
	   external entities separately as independent documents.

       ontagerror
	   Defines the behavior of the module when it encounters invalid XML syntax (a closing
	   tag which does not match the last opening tag, or a tag's attribute without a value).
	   It can take the following values:

	   fail
	       This is the default value.  The module will exit with an error.

	   warn
	       The module will continue, and will issue a warning.

	   silent
	       The module will continue without any warnings.

	   Be careful when using this option.  It is generally recommended to fix the input file.

       tagsonly
	   Extracts only the tags specified in the "tags" option.  Otherwise, it extracts all
	   the tags except the ones specified.

	   Note: This option is deprecated.

       doctype
	   String to match against the first line of the document's doctype (if defined).  If it
	   does not match, a warning indicates that the document might be of the wrong type.

       addlang
	   String indicating the path (e.g. <bbb><aaa>) of a tag where a lang="..." attribute
	   shall be added. The language will be defined as the basename of the PO file without
	   any .po extension.
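
	   As a rough illustration of the rule above, the core File::Basename module can derive
	   such a language code from a PO file name.  This is only a sketch: the helper name
	   lang_from_po and the sample paths are hypothetical, and the module's own
	   implementation may differ.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: drop the directory part and a trailing ".po" suffix.
sub lang_from_po {
    my ($po_file) = @_;
    return basename($po_file, '.po');
}

print lang_from_po('po/fr.po'), "\n";
print lang_from_po('pt_BR.po'), "\n";
```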

       tags
	   Space-separated list of tags you want to translate or skip.  By default, the specified
	   tags will be excluded, but if you use the "tagsonly" option, the specified tags will
	   be the only ones included.  The tags must be in the form <aaa>, but you can join some
	   (<bbb><aaa>) to say that the content of the tag <aaa> will only be translated when
	   it is inside a <bbb> tag.

	   You can also specify some tag options by putting some characters in front of the tag
	   hierarchy. For example, you can put 'w' (wrap) or 'W' (don't wrap) to override the
	   default behavior specified by the global "wrap" option.

	   Example: W<chapter><title>

	   Note: This option is deprecated.  You should use the translated and untranslated
	   options instead.

       attributes
	   Space-separated list of tag attributes you want to translate.  You can specify the
	   attributes by name (for example, "lang"), and you can prefix a name with a tag
	   hierarchy to specify that the attribute is only translated inside that hierarchy.
	   For example, <bbb><aaa>lang specifies that the lang attribute will only be translated
	   if it is in an <aaa> tag that is itself inside a <bbb> tag.
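
	   The matching rule described above can be sketched in a few lines of stand-alone Perl.
	   The helper attribute_selected is hypothetical and is not part of this module; it
	   assumes that a tag hierarchy matches when the path of the current tag ends with it.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Locale::Po4a::Xml.
# $path is the hierarchy of the tag holding the attribute, e.g. '<doc><bbb><aaa>'.
sub attribute_selected {
    my ($path, $attr, @specs) = @_;
    for my $spec (@specs) {
        # Split an entry into its optional tag hierarchy and the attribute name.
        my ($tags, $name) = $spec =~ /^((?:<[^<>]+>)*)([^<>\s]+)$/ or next;
        next unless $name eq $attr;
        # An empty hierarchy matches anywhere; otherwise the path must end with it.
        return 1 if $path =~ /\Q$tags\E\z/;
    }
    return 0;
}

my @specs = split ' ', '<bbb><aaa>lang id';
print attribute_selected('<doc><bbb><aaa>', 'lang', @specs), "\n";
print attribute_selected('<doc><aaa>',      'lang', @specs), "\n";
print attribute_selected('<doc><p>',        'id',   @specs), "\n";
```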

       foldattributes
	   Do not translate attributes in inline tags.  Instead, replace all attributes of a tag
	   by po4a-id=<id>.

	   This is useful when attributes shall not be translated, as this simplifies the strings
	   for translators, and avoids typos.

       customtag
	   Space-separated list of tags which should not be treated as tags.  These tags are
	   treated as inline, and do not need to be closed.

       break
	   Space-separated list of tags which should break the sequence.  By default, all tags
	   break the sequence.

	   The tags must be in the form <aaa>, but you can join some (<bbb><aaa>) if a tag
	   (<aaa>) should only be considered when it is inside another tag (<bbb>).

       inline
	   Space-separated list of tags which should be treated as inline.  By default, all tags
	   break the sequence.

	   The tags must be in the form <aaa>, but you can join some (<bbb><aaa>) if a tag
	   (<aaa>) should only be considered when it is inside another tag (<bbb>).

       placeholder
	   Space-separated list of tags which should be treated as placeholders.  Placeholders do
	   not break the sequence, but the content of placeholders is translated separately.

	   The location of the placeholder in its block will be marked with a string similar to:

	     <placeholder type=\"footnote\" id=\"0\"/>

	   The tags must be in the form <aaa>, but you can join some (<bbb><aaa>) if a tag
	   (<aaa>) should only be considered when it is inside another tag (<bbb>).

       nodefault
	   Space-separated list of tags that the module should not try to set by default in any
	   category.

       cpp Support C preprocessor directives.  When this option is set, po4a will consider
	   preprocessor directives as paragraph separators.  This is important if the XML file
	   must be preprocessed, because otherwise the directives may be inserted in the middle of
	   lines if po4a considers that they belong to the current paragraph, and they won't be
	   recognized by the preprocessor.  Note: the preprocessor directives must only appear
	   between tags (they must not break a tag).

       translated
	   Space-separated list of tags you want to translate.

	   The tags must be in the form <aaa>, but you can join some (<bbb><aaa>) if a tag
	   (<aaa>) should only be considered when it is inside another tag (<bbb>).

	   You can also specify some tag options by putting some characters in front of the tag
	   hierarchy. For example, you can put 'w' (wrap) or 'W' (don't wrap) to override the
	   default behavior specified by the global "wrap" option.

	   Example: W<chapter><title>

       untranslated
	   Space-separated list of tags you do not want to translate.

	   The tags must be in the form <aaa>, but you can join some (<bbb><aaa>) if a tag
	   (<aaa>) should only be considered when it is inside another tag (<bbb>).

       defaulttranslateoption
	   The default categories for tags that are not in any of the translated, untranslated,
	   break, inline, or placeholder lists.

	   This is a set of letters:

	   w   Tags should be translated and content can be re-wrapped.

	   W   Tags should be translated and content should not be re-wrapped.

	   i   Tags should be translated inline.

	   p   Tags should be translated as placeholders.

WRITING DERIVATE MODULES
   DEFINE WHAT TAGS AND ATTRIBUTES TO TRANSLATE
       The simplest customization is to define which tags and attributes you want the parser to
       translate.  This should be done in the initialize function.  First call the main
       initialize to get the command-line options, and then append your custom definitions to
       the options hash.  If you want to handle some new options from the command line, define
       them before calling the main initialize:

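	 sub initialize {
	   my ($self, %options) = @_;
	   # Declare new options before calling the main initialize.
	   $self->{options}{'new_option'}='';
	   $self->SUPER::initialize(%options);
	   # Append this module's defaults to the options hash.
	   $self->{options}{'_default_translated'}.=' <p> <head><title>';
	   $self->{options}{'attributes'}.=' <p>lang id';
	   $self->{options}{'_default_inline'}.=' <br>';
	   # Fill the internal structures from the options.
	   $self->treat_options;
	 }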

       You should use the _default_inline, _default_break, _default_placeholder,
       _default_translated, _default_untranslated, and _default_attributes options in derived
       modules.  This allows users to override the default behavior defined in your module with
       command-line options.

   OVERRIDING THE found_string FUNCTION
       Another simple step is to override the function "found_string", which receives the
       extracted strings from the parser, in order to translate them.  There you can control
       which strings you want to translate, and perform transformations on them before or after
       the translation itself.

       It receives the extracted text, a reference to where it was found, and a hash that
       contains extra information to control which strings to translate, how to translate them,
       and how to generate the comment.

       The content of these options depends on the kind of string it is (specified in an entry of
       this hash):

       type="tag"
	   The found string is the content of a translatable tag. The entry "tag_options"
	   contains the option characters in front of the tag hierarchy in the module "tags"
	   option.

       type="attribute"
	   Means that the found string is the value of a translatable attribute. The entry
	   "attribute" has the name of the attribute.

       It must return the text that will replace the original in the translated document. Here's
       a basic example of this function:

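	 sub found_string {
	   # $text: extracted string; $ref: where it was found; $options: extra information
	   my ($self,$text,$ref,$options)=@_;
	   # Translate the string, labeling it with its type.
	   $text = $self->translate($text,$ref,"type ".$options->{'type'},
	     'wrap'=>$self->{options}{'wrap'});
	   # The returned text replaces the original in the translated document.
	   return $text;
	 }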

       There's another simple example in the new Dia module, which only filters some strings.
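
       A filtering variant of found_string might look like the following sketch.  To keep it
       runnable on its own, it uses a stub parent class (My::StubParent, hypothetical) whose
       translate method only brackets the text; a real module would derive from
       Locale::Po4a::Xml instead.  The package names and the filtering rule (skip strings
       that contain no letter) are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Stand-in for the real parent class so that the sketch runs on its own.
package My::StubParent;
sub new { return bless { options => { wrap => 1 } }, shift }
sub translate {
    my ($self, $text, $ref, $type, %opts) = @_;
    return "[$text]";    # pretend translation
}

# Hypothetical derived module that filters some strings.
package My::FilteringXml;
our @ISA = ('My::StubParent');

sub found_string {
    my ($self, $text, $ref, $options) = @_;
    # Strings without any letter are returned untouched and never translated.
    return $text unless $text =~ /\p{L}/;
    return $self->translate($text, $ref, "type " . $options->{'type'},
        'wrap' => $self->{options}{'wrap'});
}

package main;

my $parser = My::FilteringXml->new;
print $parser->found_string('Hello', 'doc.xml:3', { type => 'tag' }), "\n";
print $parser->found_string('42',    'doc.xml:4', { type => 'tag' }), "\n";
```

       Returning the text unchanged keeps such strings in the translated document while
       leaving them out of the material handed to translate.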

   MODIFYING TAG TYPES (TODO)
       This is a more complex one, but it enables an (almost) total customization.  It is based
       on a list of hashes, each one defining a tag type's behavior.  The list should be sorted
       so that the most general tags come after the most concrete ones (sorted first by the
       beginning and then by the end keys).  To define a tag type you'll have to make a hash with
       the following keys:

       beginning
	   Specifies the beginning of the tag, after the "<".

       end Specifies the end of the tag, before the ">".

       breaking
	   Says whether this is a breaking tag class.  A non-breaking (inline) tag is one that can
	   be taken as part of the content of another tag.  It can take the values false (0),
	   true (1) or undefined.  If you leave this undefined, you'll have to define the
	   f_breaking function that will say whether a concrete tag of this class is a breaking
	   tag or not.

       f_breaking
	   It's a function that will tell if the next tag is a breaking one or not.  It should be
	   defined if the breaking option is not.

       f_extract
	   If you leave this key undefined, the generic extraction function will have to extract
	   the tag itself.  It's useful for tags that can have other tags or special structures
	   in them, so that the main parser doesn't get confused.  This function receives a
	   boolean that says if the tag should be removed from the input stream or not.

       f_translate
	   This function receives the tag (in the get_string_until() format) and returns the
	   translated tag (translated attributes or all needed transformations) as a single
	   string.
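
       To see why the order of the list matters, here is a minimal stand-alone sketch of a
       lookup over such hashes.  The list contents, the "breaking" values and the function
       name find_tag_type are hypothetical and are not the module's actual definitions; the
       sketch only assumes that the first entry whose beginning and end match wins.

```perl
use strict;
use warnings;

# Hypothetical tag type list, ordered from the most concrete to the most general.
# The "breaking" values are arbitrary and only show the shape of an entry.
my @tag_types = (
    { beginning => '!--', end => '--', breaking => 0 },      # comments
    { beginning => '?',   end => '?',  breaking => 1 },      # processing instructions
    { beginning => '/',   end => '',   breaking => undef },  # closing tags
    { beginning => '',    end => '/',  breaking => undef },  # empty tags
    { beginning => '',    end => '',   breaking => undef },  # catch-all
);

# Return the index of the first type whose keys match the tag, or -1.
sub find_tag_type {
    my ($tag) = @_;
    for my $i (0 .. $#tag_types) {
        my ($begin, $end) = @{ $tag_types[$i] }{qw(beginning end)};
        return $i if $tag =~ /^<\Q$begin\E/ && $tag =~ /\Q$end\E>\z/;
    }
    return -1;
}

print find_tag_type($_), "\n" for '<!-- note -->', '</p>', '<br/>', '<p>';
```

       If the catch-all entry came first, it would match every tag and the more concrete
       entries would never be reached.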

INTERNAL FUNCTIONS used to write derived parsers
   WORKING WITH TAGS
       get_path()
	   This function returns the path to the current tag from the document's root, in the
	   form <html><body><p>.

	   An additional array of tags (without brackets) can be passed as argument.  These path
	   elements are added to the end of the current path.
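
	   The path format can be illustrated with a small stand-alone sketch.  The function
	   path_string and the explicit list of open tags are hypothetical; the real method
	   works on the parser's internal state.

```perl
use strict;
use warnings;

# Hypothetical helper: build a path from the open tags plus optional extra ones.
sub path_string {
    my ($open_tags, @extra) = @_;
    return join '', map { "<$_>" } @$open_tags, @extra;
}

my @open = qw(html body);
print path_string(\@open), "\n";
print path_string(\@open, 'p'), "\n";
```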

       tag_type()
	   This function returns the index from the tag_types list that matches the next tag in
	   the input stream, or -1 if it's at the end of the input file.

       extract_tag($$)
	   This function returns the next tag from the input stream without the beginning and
	   end, in an array form, to maintain the references from the input file.  It has two
	   parameters: the type of the tag (as returned by tag_type) and a boolean, that
	   indicates if it should be removed from the input stream.

       get_tag_name(@)
	   This function returns the name of the tag passed as an argument, in the array form
	   returned by extract_tag.

       breaking_tag()
	   This function returns a boolean that says if the next tag in the input stream is a
	   breaking tag or not (inline tag).  It leaves the input stream intact.

       treat_tag()
	   This function translates the next tag from the input stream, using each tag type's
	   custom translation functions.

       tag_in_list($@)
	   This function returns a string value that says if the first argument (a tag hierarchy)
	   matches any of the tags from the second argument (a list of tags or tag hierarchies).
	   If it doesn't match, it returns 0. Else, it returns the matched tag's options (the
	   characters in front of the tag) or 1 (if that tag doesn't have options).
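
	   The return values described above can be reproduced with the following stand-alone
	   sketch.  It is not the module's code: the function name tag_matches is hypothetical,
	   and it assumes that a hierarchy matches when the path ends with it.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Locale::Po4a::Xml.
sub tag_matches {
    my ($path, @list) = @_;
    for my $entry (@list) {
        # Split an entry into its option characters and its tag hierarchy.
        my ($opts, $tags) = $entry =~ /^([^<]*)(<.+)$/ or next;
        next unless $path =~ /\Q$tags\E\z/;
        # Return the option characters if there are any, otherwise 1.
        return length($opts) ? $opts : 1;
    }
    return 0;
}

my @list = ('W<chapter><title>', '<para>');
print tag_matches('<book><chapter><title>', @list), "\n";
print tag_matches('<book><para>',           @list), "\n";
print tag_matches('<book><title>',          @list), "\n";
```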

   WORKING WITH ATTRIBUTES
       treat_attributes(@)
	   This function handles the translation of the tags' attributes. It receives the tag
	   without the beginning / end marks, and then it finds the attributes, and it translates
	   the translatable ones (specified by the module option "attributes").  This returns a
	   plain string with the translated tag.

   WORKING WITH THE MODULE OPTIONS
       treat_options()
	   This function fills the internal structures that contain the tags, attributes and
	   inline data with the options of the module (specified in the command-line or in the
	   initialize function).

   GETTING TEXT FROM THE INPUT DOCUMENT
       get_string_until($%)
	   This function returns an array with the lines (and references) from the input document
	   until it finds the first argument.  The second argument is an options hash. Value 0
	   means disabled (the default) and 1, enabled.

	   The valid options are:

	   include
	       This makes the returned array contain the searched text

	   remove
	       This removes the returned stream from the input

	   unquoted
	       This ensures that the searched text is outside any quotes

       skip_spaces(\@)
	   This function receives as argument a reference to a paragraph (in the format
	   returned by get_string_until), skips its leading spaces and returns them as a simple
	   string.

       join_lines(@)
	   This function returns a simple string with the text from the argument array
	   (discarding the references).
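
	   The following sketch mimics these two helpers on a plain array.  It assumes that the
	   paragraph format is a flat list of alternating text and reference entries, which is
	   not spelled out on this page; the function names join_text and strip_leading_spaces
	   and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Assumed layout: (text, reference, text, reference, ...).
sub join_text {
    my @lines = @_;
    my $text = '';
    while (@lines) {
        my ($line, $ref) = splice @lines, 0, 2;   # the reference is discarded
        $text .= $line;
    }
    return $text;
}

# Remove leading whitespace from the paragraph in place and return it.
sub strip_leading_spaces {
    my ($para) = @_;
    my $spaces = '';
    while (@$para) {
        $spaces .= $1 if $para->[0] =~ s/^(\s+)//;
        last if length($para->[0]);
        splice @$para, 0, 2;    # drop an emptied line together with its reference
    }
    return $spaces;
}

my @para = ("  \n", 'doc.xml:1', "  Hello\n", 'doc.xml:2', "world\n", 'doc.xml:3');
my $spaces = strip_leading_spaces(\@para);
print join_text(@para);
```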

STATUS OF THIS MODULE
       This module can translate tags and attributes.

TODO LIST
       DOCTYPE (ENTITIES)

       There is minimal support for the translation of entities.  They are translated as a
       whole, and tags are not taken into account.  Multiline entities are not supported and
       entities are always rewrapped during the translation.

       MODIFY TAG TYPES FROM INHERITED MODULES (move the tag_types structure inside the $self
       hash?)

SEE ALSO
       Locale::Po4a::TransTractor(3pm), po4a(7)

AUTHORS
	Jordi Vilalta <jvprat@gmail.com>
	Nicolas Francois <nicolas.francois@centraliens.net>

COPYRIGHT AND LICENSE
	Copyright (c) 2004 by Jordi Vilalta  <jvprat@gmail.com>
	Copyright (c) 2008-2009 by Nicolas Francois <nicolas.francois@centraliens.net>

       This program is free software; you may redistribute it and/or modify it under the terms of
       GPL (see the COPYING file).

perl v5.16.3				    2014-06-10			     Locale::Po4a::Xml(3)