konwert(1) [debian man page]

KONWERT(1)							Linux User's Manual							KONWERT(1)

NAME

       konwert - interface for various character encoding conversions

SYNOPSIS

       konwert FILTER [FILE]... [-o DEST | -O]

DESCRIPTION

       Konwert allows filtering multiple files through multiple filters.  It filters the specified FILEs, or stdin if none are given.

       A simple FILTER is the name of an executable file in the directory ~/.konwert/filters or in the system-wide one, normally
       /usr/share/konwert/filters.  Such a program itself filters stdin to stdout.

       The filtering rule can be more complex:

       konwert FILTER1+FILTER2 means konwert FILTER1 | konwert FILTER2.

       konwert FORMAT1-FORMAT2, if no such filter exists, tries to find an intermediate FORMAT3 such that both filters FORMAT1-FORMAT3 and
       FORMAT3-FORMAT2 exist, and chains them.
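       The fallback lookup for FORMAT1-FORMAT2 can be pictured as a search over the names of installed filters.  The Python sketch below is a
       simplified model, not konwert's own code: the hypothetical set available stands in for the file names found in the filter directories,
       and, as described above, only a single intermediate FORMAT3 is tried.

```python
from typing import Optional

def resolve(src: str, dst: str, available: set[str]) -> Optional[list[str]]:
    """Return the filter names that convert src to dst, or None if none do."""
    direct = f"{src}-{dst}"
    if direct in available:
        return [direct]
    # Try every format reachable from src by one installed filter as FORMAT3.
    prefix = f"{src}-"
    for name in sorted(available):
        if not name.startswith(prefix):
            continue
        middle = name[len(prefix):]
        if f"{middle}-{dst}" in available:
            return [name, f"{middle}-{dst}"]
    return None

filters = {"cp437-utf8", "utf8-iso2", "iso2-utf8"}
print(resolve("cp437", "iso2", filters))  # ['cp437-utf8', 'utf8-iso2']
```

       The returned chain corresponds to konwert cp437-utf8+utf8-iso2.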

       konwert FILTER/ARG/... passes arguments to the filter.  Arguments can also follow the first format, as in FORMAT1/ARGS-FORMAT2.  The meaning
       of arguments depends on the particular filter.

       konwert '(COMMAND ARGS...)' runs an arbitrary shell command as a filter.  This is useful with the -o or -O options.  The command cannot
       contain the string )+, because that string terminates the filter's specification.
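       The )+ restriction follows from how a specification is split into steps.  The Python sketch below is a hypothetical tokenizer for the
       rules stated above (the name split_spec and its return format are assumptions, not part of konwert): named filters are separated by +,
       any /ARG parts stay attached to their filter, and a parenthesized command runs until the first )+ or the final ).

```python
def split_spec(spec: str) -> list[tuple[str, str]]:
    """Split a filter specification into ("filter", text) and ("command", text) steps."""
    steps: list[tuple[str, str]] = []
    i = 0
    while i < len(spec):
        if spec[i] == "(":
            # A shell command ends at the first ")+" or at a final ")".
            end = spec.find(")+", i)
            if end == -1:
                if not spec.endswith(")"):
                    raise ValueError(f"unterminated command in {spec!r}")
                end = len(spec) - 1
            steps.append(("command", spec[i + 1:end]))
            i = end + 2  # skip past ")+"
        else:
            end = spec.find("+", i)
            if end == -1:
                end = len(spec)
            steps.append(("filter", spec[i:end]))
            i = end + 1  # skip past "+"
    return steps

print(split_spec("iso2-ascii/1+(sort -u)+crlf"))
# [('filter', 'iso2-ascii/1'), ('command', 'sort -u'), ('filter', 'crlf')]
```

       A command such as (echo a)+b) would be cut at its first )+, which is why the string is forbidden inside commands.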

   OPTIONS
       -o DEST	 output goes to this file/directory instead of stdout

       -O	 every input file is replaced with its translation

       --help	 display help and exit

       --version output version information and exit

       Redirecting output to one of the source files with either -o or > instead of -O will corrupt it! Option -O creates a temporary file in /tmp
       and later copies it back onto the source.
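       With >, the shell truncates the destination before konwert has read a single byte, so the source is lost.  The Python sketch below models
       the -O strategy described above: the converted data goes to a separate temporary file, which is then copied back over the source.  The
       function name replace_in_place and its bytes-to-bytes transform parameter are illustrative assumptions, not konwert internals.

```python
import os
import shutil
import tempfile
from typing import Callable

def replace_in_place(path: str, transform: Callable[[bytes], bytes]) -> None:
    """Replace the file at path with transform(its contents), in the spirit of -O."""
    with open(path, "rb") as src:
        converted = transform(src.read())
    # Write the finished result to a separate temporary file first.
    fd, tmp_path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(converted)
        # Then copy it back onto the source file.
        shutil.copyfile(tmp_path, path)
    finally:
        os.remove(tmp_path)
```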

CHARACTER ENCODING CONVERSIONS

       You can convert text between any two charsets, for example konwert cp437-iso2.

       Characters unavailable in the target charset are replaced with approximations made of available characters.  An approximation need not be
       a single character.
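       konwert uses its own approximation tables.  Purely to illustrate the idea for an ASCII target, the Python sketch below swaps in a
       different, simpler technique: Unicode compatibility decomposition (NFKD) followed by dropping non-ASCII combining marks, plus a small
       hypothetical table for letters such as ß or ł that do not decompose.

```python
import unicodedata

# Hypothetical extra table for characters that NFKD does not decompose.
EXTRA = {"ß": "ss", "Æ": "AE", "æ": "ae", "Ł": "L", "ł": "l", "→": "->"}

def approximate_ascii(text: str) -> str:
    out = []
    for ch in text:
        if ch.isascii():
            out.append(ch)
        elif ch in EXTRA:
            out.append(EXTRA[ch])
        else:
            # Decompose (e.g. ó -> o + combining acute) and keep ASCII parts.
            base = unicodedata.normalize("NFKD", ch)
            approx = "".join(c for c in base if c.isascii())
            out.append(approx if approx else "?")
    return "".join(out)

print(approximate_ascii("Łódź, Straße → Æsir"))  # Lodz, Strasse -> AEsir
```

       Note how ß and → become multi-character strings, which is exactly the behaviour the /1 option turns off.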

       The following character sets are currently supported:

       ascii  7bit ASCII

       utf8 = unicode  Unicode UTF-8

       iso1 = isolatin1
	      ISO-8859-1 aka ISO Latin 1 (Western European)
       iso2 = isolatin2
	      ISO-8859-2 aka ISO Latin 2 (Central European)
       iso3 = isolatin3
	      ISO-8859-3 aka ISO Latin 3 (Esperanto)
       iso4 = isolatin4
	      ISO-8859-4 aka ISO Latin 4 (Baltic)
       iso5 = isolatincyr
	      ISO-8859-5 (Cyrillic)
       iso6 = isolatinarabic
	      ISO-8859-6 (Arabic)
       iso7 = isolatingreek
	      ISO-8859-7 (Greek)
       iso8 = isolatinhebrew
	      ISO-8859-8 (Hebrew)
       iso9 = isolatin5 = isolatintur
	      ISO-8859-9 aka ISO Latin 5 (Turkish)
       iso10 = isolatin6 = isolatinnordic
	      ISO-8859-10 aka ISO Latin 6 (Nordic)
       iso12 = isolatin7 = isolatinceltic
	      ISO-8859-12 aka ISO Latin 7 (Celtic) - Draft
       iso13 = isolatin8 = isolatinbaltic
	      ISO-8859-13 aka ISO Latin 8 (Baltic) - Draft
       iso14 = isolatin9 = isolatinsami
	      ISO-8859-14 aka ISO Latin 9 (Sami) - Draft
       iso15  ISO-8859-15 - Draft

       koi8r	KOI8-R (Russian)
       koi8u	KOI8-U (Ukrainian, Byelorussian)
       koi8uni	KOI8-Uni (Cyrillic)

       cp1250 = wince = winlatin2    Windows CP-1250 aka Win Latin 2 (Central European)
       cp1251 = wincyr		     Windows CP-1251 (Cyrillic)
       cp1252 = winwest = winlatin1  Windows CP-1252 aka Win Latin 1 (Western European)
       cp1253 = wingr		     Windows CP-1253 (Greek)
       cp1254 = wintur		     Windows CP-1254 (Turkish)
       cp1255 = winhebrew	     Windows CP-1255 (Hebrew)
       cp1256 = winarabic	     Windows CP-1256 (Arabic)
       cp1257 = winbaltic	     Windows CP-1257 (Baltic)
       cp1258 = winviet 	     Windows CP-1258 (Vietnamese)

       cp437 = icmeng		    DOS CP-437 (English)
       cp737 = dosgreek 	    DOS CP-737 (Greek)
       cp775 = dosbaltic	    DOS CP-775 (Baltic)
       cp850 = doswest = doslatin1  DOS CP-850 aka DOS Latin 1 (Western European)
       cp852 = dosce = doslatin2    DOS CP-852 aka DOS Latin 2 (Central European)
       cp855 = doscyr		    DOS CP-855 (Cyrillic)
       cp857 = dostur		    DOS CP-857 (Turkish)
       cp860 = dosportugal	    DOS CP-860 (Portuguese)
       cp861 = dosiceland	    DOS CP-861 (Icelandic)
       cp862 = doshebrew	    DOS CP-862 (Hebrew)
       cp863 = doscanadfr	    DOS CP-863 (Canadian French)
       cp864 = dosarabic	    DOS CP-864 (Arabic)
       cp865 = dosnordic	    DOS CP-865 (Nordic)
       cp866 = dosrussian	    DOS CP-866 (Russian)
       cp869 = dosgreek2	    DOS CP-869 (Greek2)
       cp874 = dosthai		    DOS CP-874 (Thai)

       mac	   Macintosh Roman (Western European)
       macce	   Macintosh Central European
       maccyr	   Macintosh Cyrillic
       macgreek    Macintosh Greek
       maciceland  Macintosh Icelandic
       mactur	   Macintosh Turkish

       csk,
       cyfromat,
       dhn,
       fidomazovia,
       iea,
       logic,
       mazovia,
       microvex     DOS charsets for Polish

       amigapl,
       fat,
       xjp	Amiga charsets for Polish

       kamenicky  DOS charset for Czech and Slovak

       wingreek  WinGreek (Windows font-based encoding for ancient Greek)

       babelpl	TeX [polish]{babel}: "a"c"e"l"n"o"s"z"r
       ciachy	TeX prefixing: /a/c/e/l/n/o/s/x/z

       xmetodo	      Esperanto: cx gx hx jx sx ux (vx w)
       hmetodo	      Esperanto: ch gh hh jh sh u
       antauxcxap     Esperanto: ^c ^g ^h ^j ^s ^u (~u)
       postcxap       Esperanto: c^ g^ h^ j^ s^ u^ (u~)
       apostrofoj     Esperanto: c' g' h' j' s' u'
       malapostrofoj  Esperanto: c` g` h` j` s` u`

       viscii  VISCII (Vietnamese)
       viqri   Vietnamese Quoted Readable Implicit

       htmldec	SGML/HTML character references (decimal): &#198; &#283; &#8594;
       htmlhex	SGML/HTML character references (hexadecimal): &#xC6; &#x11B; &#x2192;
       htmlent	SGML/HTML character entities (names): &AElig; &ecaron; &rarr;
       html	All three of the above (input format only)
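              The three notations differ only in how the Unicode code point is spelled.  This Python sketch uses the standard html module to
              decode; htmldec and htmlhex are just illustrative helper names mirroring the format names, not konwert code.

```python
import html

def htmldec(ch: str) -> str:
    return f"&#{ord(ch)};"       # decimal code point

def htmlhex(ch: str) -> str:
    return f"&#x{ord(ch):X};"    # hexadecimal code point

sample = "Æě→"
print(" ".join(htmldec(c) for c in sample))  # &#198; &#283; &#8594;
print(" ".join(htmlhex(c) for c in sample))  # &#xC6; &#x11B; &#x2192;
# As input, all three notations decode to the same characters.
print(html.unescape("&AElig; &ecaron; &#x2192;"))  # Æ ě →
```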

       tex    TeX with some LaTeX or AMS-TeX extensions.  There is no distinction between normal and math mode, so you will probably have to insert
	      some $ signs manually.

       mnemonic   RFC 1345 mnemonics preceded by &
       mnemonic1  RFC 1345 mnemonics preceded by `

       any/LANGUAGE (e.g. any/pl-iso2)
	      This special input format detects the encoding automatically, based on the frequencies of characters found in the text.  Every
	      language is associated with a set of encodings commonly used for it and with the average frequencies of its letters (excluding
	      ASCII letters).  The best-fitting encoding is used for the conversion.  Currently supported languages are cs (Czech), de (German),
	      el (Greek), eo (Esperanto), es (Spanish), fr (French), he (Hebrew), it (Italian), pl (Polish), pt (Portuguese), ru (Russian), and
	      sv (Swedish).
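              A minimal Python sketch of frequency-based detection follows.  Everything in it is an assumption for illustration: the letter
              frequencies are made up, the candidate list is hypothetical, and the L1 distance between profiles stands in for whatever scoring
              konwert actually uses.  Each candidate decoding is scored, and sorting by score also models the /all ranking.

```python
from collections import Counter

# Made-up relative frequencies of non-ASCII Polish letters (they sum to 1).
POLISH_FREQ = {"ą": 0.13, "ę": 0.15, "ó": 0.11, "ł": 0.24, "ś": 0.09,
               "ć": 0.06, "ń": 0.04, "ż": 0.11, "ź": 0.07}
# Hypothetical candidates: Python codec names for iso2, cp1250, cp852.
CANDIDATES = ["iso8859_2", "cp1250", "cp852"]

def score(raw: bytes, encoding: str) -> float:
    """Distance between observed and expected letter profiles; lower is better."""
    text = raw.decode(encoding, errors="replace")
    counts = Counter(ch for ch in text.lower() if not ch.isascii())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    letters = set(POLISH_FREQ) | set(counts)
    return sum(abs(counts[ch] / total - POLISH_FREQ.get(ch, 0.0)) for ch in letters)

def detect(raw: bytes) -> list[str]:
    """Return the candidate encodings sorted from best to worst fit."""
    return sorted(CANDIDATES, key=lambda enc: score(raw, enc))

sample = "Zażółć gęślą jaźń".encode("cp1250")
print(detect(sample)[0])  # cp1250
```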

       varpl  Mixed Polish ISO-8859-2, CP-1250, and UTF-8 text.  If you read Polish newsgroups, consider installing it as a filter in your
	      newsreader (for speed, it is better to call it directly rather than through konwert).
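              How varpl tells the encodings apart is not described here.  One plausible approach, sketched in Python under that assumption:
              keep each line as UTF-8 if it decodes cleanly, otherwise pick CP-1250 or ISO-8859-2 by counting the byte values where Polish
              letters differ between the two (Ą ą Ś ś Ź ź).  The constant and function names are hypothetical.

```python
# Byte values where Polish letters differ between the two 8-bit charsets.
CP1250_ONLY = {0x8C, 0x9C, 0x8F, 0x9F, 0xA5, 0xB9}  # Ś ś Ź ź Ą ą in CP-1250
ISO2_ONLY = {0xA6, 0xB6, 0xAC, 0xBC, 0xA1, 0xB1}    # Ś ś Ź ź Ą ą in ISO-8859-2

def decode_mixed_polish(raw: bytes) -> str:
    """Decode each line as UTF-8 if valid, else as the likelier 8-bit charset."""
    out = []
    for line in raw.splitlines(keepends=True):
        try:
            out.append(line.decode("utf-8"))
            continue
        except UnicodeDecodeError:
            pass
        cp1250_votes = sum(b in CP1250_ONLY for b in line)
        iso2_votes = sum(b in ISO2_ONLY for b in line)
        encoding = "cp1250" if cp1250_votes > iso2_votes else "iso8859_2"
        out.append(line.decode(encoding, errors="replace"))
    return "".join(out)

mixed = ("źródło\n".encode("utf-8") + "ślęża\n".encode("cp1250")
         + "ąść\n".encode("iso8859_2"))
print(decode_mixed_polish(mixed), end="")
```

              A line containing none of the distinguishing letters decodes the same either way, so defaulting to ISO-8859-2 there is harmless.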

       vareo  Mixed various Esperanto encodings.

OPTIONS CONTROLLING THE ABOVE CONVERSIONS

       /1 (e.g. konwert iso2-ascii/1)
	      Each unavailable character is replaced with a single approximating character rather than a string, so the width of the text does
	      not change.  This is useful with the filterm program or with preformatted text.  This option is turned on automatically when a
	      filter is used as output for filterm.
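              The difference /1 makes can be shown with a hypothetical two-column approximation table, as in this Python sketch (the table
              entries and the single flag are assumptions; konwert's real tables are larger and chosen per target charset).

```python
# Hypothetical table: character -> (multi-character form, single-character form).
APPROX = {
    "ß": ("ss", "s"),
    "æ": ("ae", "a"),
    "→": ("->", ">"),
    "©": ("(c)", "c"),
}

def to_ascii(text: str, single: bool = False) -> str:
    """Replace characters missing from ASCII; single=True mimics /1."""
    out = []
    for ch in text:
        if ch.isascii():
            out.append(ch)
        else:
            multi, one = APPROX.get(ch, ("?", "?"))
            out.append(one if single else multi)
    return "".join(out)

line = "Straße → ©"
print(to_ascii(line))               # Strasse -> (c)
print(to_ascii(line, single=True))  # Strase > c
```

              Only the single-character output keeps the original line length, which is what column-aligned text needs.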

       /html  Text is assumed to be HTML.  The characters " & < > resulting from approximations of other characters are escaped as &quot; &amp;
	      &lt; &gt;.  If a <META http-equiv="content-type" content="text/html; charset=..."> header is present, its charset is corrected to match
	      the output.

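       /htmldec
	      Corrects the META header as with /html.  Unavailable characters are written as decimal character references, &#N;, where N is the
	      Unicode code point.

       /htmlhex
	      Corrects the META header as with /html.  Unavailable characters are written as hexadecimal character references, &#xN;.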
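              Python's codec error handlers give a compact model of these two options (META handling is not modelled).  The decimal form is the
              built-in "xmlcharrefreplace" handler; the hexadecimal one is registered here under the hypothetical name "htmlhexref".  ASCII is
              used as the target only to keep the output readable.

```python
import codecs

def _hex_charref(err: UnicodeEncodeError) -> tuple[str, int]:
    """Encode-error handler: write each unencodable character as &#xN;."""
    bad = err.object[err.start:err.end]
    return "".join(f"&#x{ord(ch):X};" for ch in bad), err.end

codecs.register_error("htmlhexref", _hex_charref)

text = "Zażółć →"
print(text.encode("ascii", errors="xmlcharrefreplace"))
# b'Za&#380;&#243;&#322;&#263; &#8594;'
print(text.encode("ascii", errors="htmlhexref"))
# b'Za&#x17C;&#xF3;&#x142;&#x107; &#x2192;'
```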

       /tex   Unavailable characters are written as TeX sequences.  Characters # $ % & \ ^ _ { | } ~ resulting from approximations of other
	      characters are escaped as \# \$ \% \& $\backslash$ \^{} \_ \{ $|$ \} \~{}.
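              The escaping step for approximation strings can be sketched in Python as a single lookup table, assuming the standard LaTeX
              escapes for these eleven characters; tex_escape is a hypothetical helper name.

```python
# Standard LaTeX escapes for the characters that are special in TeX.
TEX_ESCAPES = {
    "#": r"\#", "$": r"\$", "%": r"\%", "&": r"\&",
    "\\": r"$\backslash$", "^": r"\^{}", "_": r"\_",
    "{": r"\{", "|": r"$|$", "}": r"\}", "~": r"\~{}",
}

def tex_escape(s: str) -> str:
    """Escape TeX special characters one character at a time."""
    return "".join(TEX_ESCAPES.get(ch, ch) for ch in s)

print(tex_escape("50% & a_b"))  # 50\% \& a\_b
```

              Working per character avoids double escaping, e.g. the backslash inside \% is never itself turned into $\backslash$.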

       /asciichar
	      Recognizes some ASCII representations of characters, e.g. (c) ... 1/2 >=.

       /rosyjski
	      Russian text will be replaced with its Polish phonetic transcription.

       Some output filters can use the language information to choose better approximations of unavailable letters; for example, with /de
       (German), ä becomes ae instead of a.

OTHER FILTERS

       any/LANGUAGE-test
	      Detects the encoding but, instead of converting the text, only prints the name of the detected encoding.  The additional option
	      /all lists all candidate encodings, sorted from best to worst match.

       cr
       lf
       crlf   Force a specific end-of-line convention: cr = Macintosh, lf = Unix and Amiga, crlf = Windows and DOS.  The input convention is
	      detected automatically.
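              A minimal Python sketch of the conversion.  Instead of detecting one input convention, it simply normalizes every style it finds,
              which is a simplifying assumption; convert_eol is a hypothetical name.

```python
def convert_eol(data: bytes, eol: bytes) -> bytes:
    """Normalize any mix of CR LF, CR and LF line endings to eol."""
    # Replace CR LF first so that it is not turned into two line breaks.
    unix = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unix.replace(b"\n", eol)

sample = b"dos\r\nmac\runix\n"
print(convert_eol(sample, b"\r\n"))  # b'dos\r\nmac\r\nunix\r\n'
```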

       expand Expands tabs into spaces (uses the textutils program expand).

       unexpand
	      Compresses spaces into tabs (uses the textutils program unexpand).

       rmspacesateol
	      Removes spaces and tabs at end of line.

       qp-8bit
       8bit-qp
	      MIME Quoted Printable encoding: =A3=F3d=BC.
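              The example =A3=F3d=BC is the word Łódź in ISO-8859-2.  Python's standard quopri module performs both directions, as this sketch
              shows (it illustrates the encoding only, not konwert's implementation).

```python
import quopri

raw = "Łódź".encode("iso8859_2")        # b'\xa3\xf3d\xbc'
encoded = quopri.encodestring(raw)      # 8bit-qp direction
print(encoded)                          # b'=A3=F3d=BC'
decoded = quopri.decodestring(encoded)  # qp-8bit direction
print(decoded.decode("iso8859_2"))      # Łódź
```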

       rtf-8bit
       8bit-rtf
	      Rich Text Format: \'a3\'f3d\'9f.
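              The example escapes spell Łódź in CP-1250, each byte above 0x7F written as an RTF hex escape (backslash, apostrophe, two hex
              digits).  This Python sketch, with hypothetical function names, covers only those escapes; it does not handle RTF's escaping of
              \, { and }.

```python
import re

def to_rtf(raw: bytes) -> str:
    """8bit-rtf: write each byte above 0x7F as an RTF \\'hh escape."""
    return "".join(f"\\'{b:02x}" if b > 0x7F else chr(b) for b in raw)

def from_rtf(text: str) -> bytes:
    """rtf-8bit: turn \\'hh escapes back into raw bytes."""
    return re.sub(rb"\\'([0-9a-fA-F]{2})",
                  lambda m: bytes([int(m.group(1), 16)]),
                  text.encode("ascii"))

rtf = to_rtf("Łódź".encode("cp1250"))
print(rtf)                             # \'a3\'f3d\'9f
print(from_rtf(rtf).decode("cp1250"))  # Łódź
```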

       txt-htmlchar
	      Escapes " & < > into SGML/HTML entities &quot; &amp; &lt; &gt;.  Useful for including a text file inside HTML <PRE> </PRE> tags.

       htmlchar-txt
	      The reverse of txt-htmlchar: converts &quot; &amp; &lt; &gt; back into " & < >.
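              Both directions fit in a short Python sketch (hypothetical function names).  A hand-written table is used because Python's
              html.escape also escapes the apostrophe, which these filters do not mention; the order of replacements is what keeps the pair
              exact inverses.

```python
# Only the four characters listed for txt-htmlchar; "&" must be escaped first.
TO_HTML = [("&", "&amp;"), ('"', "&quot;"), ("<", "&lt;"), (">", "&gt;")]

def txt_htmlchar(text: str) -> str:
    for char, entity in TO_HTML:
        text = text.replace(char, entity)
    return text

def htmlchar_txt(text: str) -> str:
    # Undo in reverse order, so "&amp;lt;" becomes "&lt;" rather than "<".
    for char, entity in reversed(TO_HTML):
        text = text.replace(entity, char)
    return text

print(txt_htmlchar('<a href="x">&</a>'))
# &lt;a href=&quot;x&quot;&gt;&amp;&lt;/a&gt;
```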

       rot13  Guvf vf n qrzbafgengvba bs ebg13.
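              rot13 shifts each ASCII letter 13 places, wrapping around the alphabet, so applying it twice restores the text.  A minimal Python
              sketch using a translation table:

```python
import string

# Map each letter to the one 13 places later; other characters are unchanged.
ROT13 = str.maketrans(
    string.ascii_lowercase + string.ascii_uppercase,
    string.ascii_lowercase[13:] + string.ascii_lowercase[:13]
    + string.ascii_uppercase[13:] + string.ascii_uppercase[:13],
)

def rot13(text: str) -> str:
    return text.translate(ROT13)

print(rot13("Guvf vf n qrzbafgengvba bs ebg13."))  # This is a demonstration of rot13.
```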

       toupper
       tolower
	      Convert letters to upper or lower case.  Currently only ASCII letters are affected.

       prn7pl Converts Polish characters to control sequences for EPSON-compatible printers.  Using only 7-bit characters, backspacing the print
	      head, and vertically positioning the characters , . ' `, it builds pseudo-Polish glyphs.  You can specify options: /nlq (default),
	      which optimizes output for better-quality printers, and /draft, useful for example with 9-pin printers.

FILES

       /usr/share/konwert/filters/*
       ~/.konwert/filters/*

SEE ALSO

       trs(1), filterm(1)

BUGS

       The APPLE logo character in the mac* charsets and the CH and ch characters in koi8cs are not preserved in conversion even when the target
       charset contains them.  They also ignore the /1 option.  The reason is that these characters are not in Unicode.

COPYRIGHT

       Konwert is a package for conversion between various character encodings.

       Copyright (c) 1998 Marcin 'Qrczak' Kowalczyk

       This  program  is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by
       the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

       This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for more details.

       You  should  have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation,
       Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

AUTHOR

	__("<	Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.home.ml.org/
	\__/	   GCS/M d- s+:-- a21 C+++>+++$ UL++>++++$ P+++ L++>++++$ E->++
	 ^^		   W++ N+++ o? K? w(---) O? M- V? PS-- PE++ Y? PGP->+ t
       QRCZAK		       5? X- R tv-- b+>++ DI D- G+ e>++++ h! r--%>++ y-

Konwert 							    30 Jul 1998 							KONWERT(1)