omindex(1) [debian man page]
OMINDEX(1) User Commands OMINDEX(1)

NAME
omindex - Index static website data via the filesystem

SYNOPSIS
omindex [OPTIONS] --db DATABASE [BASEDIR] DIRECTORY

DESCRIPTION
omindex - Index static website data via the filesystem

DIRECTORY is the directory to start indexing from.

BASEDIR is the directory corresponding to URL (default: DIRECTORY).

OPTIONS
-d, --duplicates
    set duplicate handling ('ignore' or 'replace')

-p, --no-delete
    skip the deletion of documents corresponding to deleted files (--preserve-nonduplicates is a deprecated alias for --no-delete)

-e, --empty-docs=ARG
    how to handle documents we extract no text from: ARG can be index, warn (issue a diagnostic and index), or skip. (default: warn)

-D, --db=DATABASE
    path to database to use

-U, --url=URL
    base url BASEDIR corresponds to (default: /)

-M, --mime-type=EXT:TYPE
    map file extension EXT to MIME Content-Type TYPE (empty TYPE removes any MIME mapping for EXT)

-F, --filter=TYPE:CMD
    process files with MIME Content-Type TYPE using command CMD, which should produce UTF-8 text on stdout, e.g. -Fapplication/octet-stream:'strings -n8'

-l, --depth-limit=LIMIT
    set recursion limit (0 = unlimited)

-f, --follow
    follow symbolic links

-i, --ignore-exclusions
    ignore meta robots tags and similar exclusions

-S, --spelling
    index data for spelling correction

-m, --max-size
    maximum size of file to index (in bytes or with a suffix of 'K'/'k', 'M'/'m', 'G'/'g')

-E, --sample-size=SIZE
    sets the maximum number of bytes for the document text sample. (default SIZE = 512)

-v, --verbose
    show more information about what is happening

--overwrite
    create the database anew (the default is to update if the database already exists)

-s, --stemmer=LANG
    set the stemming language, the default is 'english'. Possible values: danish dutch english finnish french german german2 hungarian italian kraaij_pohlmann lovins norwegian porter portuguese romanian russian spanish swedish turkish (pass 'none' to disable stemming)

-h, --help
    display this help and exit

-V, --version
    output version information and exit

Please report bugs at: http://xapian.org/bugs

xapian-omega 1.2.12 June 2012 OMINDEX(1)
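The --max-size option above accepts either a plain byte count or a number followed by a 'K', 'M' or 'G' suffix in either case. As a rough illustration of how such an argument could be interpreted, here is a minimal Python sketch. The function name parse_max_size is hypothetical, and the factor of 1024 per suffix step is an assumption; the man page does not state which multiplier omindex uses.

```python
def parse_max_size(arg: str) -> int:
    """Convert a --max-size style argument ('500', '64K', '2m', '1G') to bytes.

    Illustrative only: this is not omindex's own parser.
    """
    # Assumption: each suffix step multiplies by 1024 (not stated in the man page).
    multipliers = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    text = arg.strip()
    if not text:
        raise ValueError("empty size")
    suffix = text[-1].lower()
    if suffix in multipliers:
        number, factor = text[:-1], multipliers[suffix]
    else:
        number, factor = text, 1
    # Accept plain ASCII digits only; this also rejects a bare suffix such as 'K'.
    if not (number.isascii() and number.isdigit()):
        raise ValueError(f"invalid size: {arg!r}")
    return int(number) * factor
```

Under these assumptions, parse_max_size("64K") and parse_max_size("65536") describe the same limit.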
xindy(1) xindy xindy(1)

NAME
xindy - create sorted and tagged index from raw index

SYNOPSIS
xindy [-V?h] [-qv] [-d magic] [-o outfile.ind] [-t log] [-L lang] [-C codepage] [-M module] [-I input] [--interactive] [--mem-file=xindy.mem] [idx0 idx1 ...]

GNU-Style Long Options for Short Options:

    -V / --version
    -? / -h / --help
    -q / --quiet
    -v / --verbose
    -d / --debug          (multiple times)
    -o / --out-file
    -t / --log-file
    -L / --language
    -C / --codepage
    -M / --module         (multiple times)
    -I / --input-markup   (supported: latex, omega, xindy)

DESCRIPTION
xindy is the formatter-independent command of xindy, the flexible indexing system. It takes a raw index as input, and produces a merged, sorted and tagged index. Merging, sorting, and tagging are controlled by xindy style files.

Files with the raw index are passed as arguments. If no arguments are passed, the raw index will be read from standard input.

xindy is completely described in its manual, which you will find on its web site, http://www.xindy.org/. A good introductory description appears in the indexing chapter of the LaTeX Companion (2nd ed.).

If you want to produce an index for LaTeX documents, the command texindy(1) is probably of more interest to you. It is a wrapper for xindy that turns on many LaTeX conventions by default.

OPTIONS

"--version" / -V
    Output version numbers of all relevant components and exit.

"--help" / -h / -?
    Output usage message with options explanation.

"--quiet" / -q
    Don't output progress messages. Output only error messages.

"--verbose" / -v
    Output verbose progress messages.

"--debug" magic / -d magic
    Output debug messages; this option may be specified multiple times. magic determines what is output:

    magic           remark
    ------------------------------------------------------------
    script          internal progress messages of driver scripts
    keep_tmpfiles   don't discard temporary files
    markup          output markup trace, as explained in xindy manual
    level=n         log level, n is 0 (default), 1, 2, or 3

"--out-file" outfile.ind / -o outfile.ind
    Output index to file outfile.ind. If this option is not passed, the name of the output file is the base name of the first argument and the file extension ind. If the raw index is read from standard input, this option is mandatory.

"--log-file" log.ilg / -t log.ilg
    Output log messages to file log.ilg. These log messages are independent from the progress messages that you can influence with "--debug" or "--verbose".

"--language" lang / -L lang
    The index is sorted according to the rules of language lang. These rules are encoded in a xindy module created by make-rules.

    If no input encoding is specified via "--codepage", a xindy module for that language is searched with a latin, a cp, an iso, or ascii encoding, in that order.

"--codepage" enc / -C enc
    The raw input is in input encoding enc. This information is used to select the correct xindy sort module and also the inputenc target encoding for "latex" input markup.

    When "omega" input markup is used, "utf8" is always used as the codepage and this option is ignored.

"--module" module / -M module
    Load the xindy module module.xdy. This option may be specified multiple times. The modules are searched in the xindy search path, which can be changed with the environment variable "XINDY_SEARCHPATH".

"--input-markup" input / -I input
    Specifies the input markup of the raw index. Supported values for input are "latex", "omega", and "xindy".

    "latex" input markup is the one that is emitted by default from the LaTeX kernel, or by the "index" macro package of David Jones. ^^-notation of single byte characters is supported. Usage of LaTeX's inputenc package is assumed as well.

    "omega" input markup is like "latex" input markup, but with Omega's ^^-notation as encoding for non-ASCII characters. LaTeX inputenc encoding is not used then, and "utf8" is enforced to be the codepage.

    "xindy" input markup is specified in the xindy manual.

"--interactive"
    Start xindy in interactive mode. You will be in a xindy read-eval-loop where xindy language expressions are read and evaluated interactively.

"--mem-file" xindy.mem
    This option is only usable for developers or in very rare situations. The compiled xindy kernel is stored in a so-called memory file, canonically named xindy.mem, and located in the xindy library directory. This option allows another xindy kernel to be used.

SUPPORTED LANGUAGES / CODEPAGES

The following languages are supported:

Latin scripts

    albanian       gypsy              portuguese
    croatian       hausa              romanian
    czech          hungarian          russian-iso
    danish         icelandic          slovak-small
    english        italian            slovak-large
    esperanto      kurdish-bedirxan   slovenian
    estonian       kurdish-turkish    spanish-modern
    finnish        latin              spanish-traditional
    french         latvian            swedish
    general        lithuanian         turkish
    german-din     lower-sorbian      upper-sorbian
    german-duden   norwegian          vietnamese
    greek-iso      polish

German recognizes two different sorting schemes to handle umlauts: normally, "ä" is sorted like "ae", but in phone books or dictionaries, it is sorted like "a". The first scheme is known as DIN order, the second as Duden order.

"*-iso" language names assume that the raw index entries are in ISO 8859-9 encoding.

"gypsy" is a northern Russian dialect.

Cyrillic scripts

    belarusian     mongolian          serbian
    bulgarian      russian            ukrainian
    macedonian

Other scripts

    greek          klingon

Available Codepages

This is not yet written. You can look them up in your xindy distribution, in the modules/lang/language/ directory (where language is your language). They are named variant-codepage-lang.xdy, where variant- is most often empty (for german, it's "din5007" and "duden"; for spanish, it's "modern" and "traditional", etc.)

< Describe available codepages for each language >

< Describe relevance of codepages (as internal representation) for LaTeX inputenc >

ENVIRONMENT

"XINDY_SEARCHPATH"
    A list of directories where the xindy modules are searched for. Subtree searching, as done in TDS-conformant TeX, is not performed.

    If this environment variable is not set, the default is used: ".:modules_dir:modules_dir/base". modules_dir is determined at run time, relative to the xindy command location: either it is ../modules, which is the case for opt-installations, or it is ../lib/xindy/modules, which is the case for usr-installations.

"XINDY_LIBDIR"
    Library directory where xindy.mem is located. The modules directory may be a subdirectory, too.

COMPATIBILITY TO MAKEINDEX
xindy does not claim to be completely compatible with MakeIndex; that would prevent some of its enhancements. That said, we strive to deliver as much compatibility as possible. The most important incompatibilities are:

o   For raw index entries in LaTeX syntax, "\index{aaa|bbb}" is interpreted differently. For MakeIndex, "bbb" is markup that is output as a LaTeX tag for this page number. For xindy, this is a location attribute, an abstract identifier that will later be associated with the markup that should be output for that attribute.

    For straightforward usage, when "bbb" is "textbf" or similar, we supply location attribute definitions that mimic MakeIndex's behaviour.

    For more complex usage, when "bbb" is not an identifier, no such compatibility definitions exist, nor can they be created with current xindy. In particular, this means that by default the LaTeX package "hyperref" will create raw index files that cannot be processed with xindy. This is not a bug; it is the unfortunate result of an intended incompatibility. It is currently not possible to get both hyperref's index links and use xindy.

    A similar situation is reported to exist for the "memoir" LaTeX class.

    Programmers who know Common Lisp and Lex and want to work on a remedy should please contact the author.

o   The MakeIndex compatibility definitions support only the default raw index syntax and markup definition. It is not possible to configure raw index parsing or use a MakeIndex style file to describe output markup.

KNOWN ISSUES
Option -q also prevents output of error messages. Error messages should be output on stderr, progress messages on stdout.

There should be a way to output the final index to stdout. This would imply -q, of course.

LaTeX raw index parsing should be configurable.

Codepage "utf8" should be supported for all languages, and should be used as internal codepage for LaTeX inputenc re-encoding.

SEE ALSO
texindy(1), tex2xindy(1)

AUTHOR
Joachim Schrod

LEGALESE
Copyright (c) 2004-2010 by Joachim Schrod.

xindy is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

Version 1.16 2010-05-10 xindy(1)
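As a supplementary illustration of the ENVIRONMENT section, the following Python sketch shows how the documented default value of XINDY_SEARCHPATH could be assembled and how a module named with --module might be looked up in it. This is not xindy's own code. The function names default_search_path and find_module, the usr_install flag, and the example directory are hypothetical; the path layout and the ".xdy" suffix come from the text above.

```python
import os
import posixpath


def default_search_path(cmd_dir: str, usr_install: bool) -> str:
    """Build '.:modules_dir:modules_dir/base' relative to the xindy command location."""
    # ../lib/xindy/modules for usr-installations, ../modules for opt-installations.
    rel = "../lib/xindy/modules" if usr_install else "../modules"
    modules_dir = posixpath.normpath(posixpath.join(cmd_dir, rel))
    return ":".join([".", modules_dir, posixpath.join(modules_dir, "base")])


def find_module(name: str, search_path: str):
    """Return the first 'name.xdy' found in the listed directories, or None."""
    for directory in search_path.split(":"):
        candidate = os.path.join(directory, name + ".xdy")
        # Only the listed directories are checked; subdirectories are not searched.
        if os.path.isfile(candidate):
            return candidate
    return None
```

For a command installed in the hypothetical directory /usr/bin, default_search_path("/usr/bin", True) yields ".:/usr/lib/xindy/modules:/usr/lib/xindy/modules/base".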