LOOKUP(1)						      General Commands Manual							 LOOKUP(1)

								 April 22nd, 1994

NAME
   lookup - interactive file search and display

SYNOPSIS
   lookup [ args ] [ file ...  ]

DESCRIPTION
   Lookup allows the quick interactive search of text files.  It supports ASCII, JIS-ROMAN, and Japanese EUC packed-format text, and has an
   integrated romaji-to-kana converter.

THIS MANUAL
   Lookup is flexible for a variety of applications. This manual will, however, focus on the application of searching Jim Breen's edict (Japanese-
   English  dictionary)  and  kanjidic	(kanji database). Being familiar with the content and format of these files would be helpful. See the INFO
   section near the end of this manual for information on how to obtain these files and their documentation.

OVERVIEW OF MAJOR FEATURES
   The following just mentions some major features to whet your appetite to actually read the whole manual (-:

   Romaji-to-Kana Converter
      Lookup can convert romaji to kana for you, even "on the fly" as you type.

   Fuzzy Searching
      Searches can be a bit "vague" or "fuzzy", so that you'll be able to find "東京" even if you try to search for "ときょ" (the proper yomikata
      being "とうきょう").

   Regular Expressions
      Lookup uses powerful and expressive regular expressions for searching. One can easily specify complex searches to the effect of "I want
      lines that look like such-and-such, but not like this-and-that, but that also have this particular characteristic...."

   Wildcard ``Glob'' Patterns
      Optionally, can use well-known filename wildcard patterns instead of full-fledged regular expressions.

   Filters
      You can have lookup not list certain lines that would otherwise match your search, yet can optionally save them for quick review. For  exam-
      ple, you could have all name-only entries from edict filtered from normal output.

   Automatic Modifications
      Similarly,  you can do a standard search-and-replace on lines just before they print, perhaps to remove information you don't care to see on
      most searches. For example, if you're generally not interested in kanjidic's info on Chinese readings, you can have them removed from  lines
      before printing.

   Smart Word-Preference Mode
      You can have lookup list only entries with whole words that match your search (as opposed to an embedded match, such as finding "the"
      inside "them"); if no whole-word matches exist, it will go ahead and list any entry that matches the search.

   Handy Features
      Other handy features include a dynamically settable and parameterized prompt, automatic highlighting of that part of the line that matches
      your search, an output pager, readline-like input with horizontal scrolling for long input lines, a ".lookup" startup file, automated
      programmability, and much more. Read on!

REGULAR EXPRESSIONS
   Lookup makes liberal use of regular expressions (or regex for short) in controlling various aspects of the searches. If you	are  not  familiar
   with the important concepts of regexes, read the tutorial appendix of this manual before continuing.

JAPANESE CHARACTER ENCODING METHODS
   Internally,	lookup	works with Japanese packed-format EUC, and all files loaded must be encoded similarly. If you have files encoded in JIS or
   Shift-JIS, you must first convert them to EUC before loading (see the INFO section for programs that can do this).

   Interactive input and output encoding, however, may be selected via the -jis, -sjis, and -euc invocation flags (default is -euc), or by
   various commands to the program (described later).

   Make sure to use the encoding appropriate for your system.  If you're using kterm under the X Window System, you can use lookup's -jis flag to
   match kterm's default JIS encoding. Or, you might use kterm's "-km euc" startup option (or menu selection) to put kterm into EUC mode. Also, I
   have found kterm's scrollbar ("-sb -sl 500") to be quite useful.

   With many "English" fonts in Japan, the character that normally prints as a backslash (the half-width version of ＼) in the United States
   appears as a yen symbol (the half-width version of ￥). How it will appear on your system is a function of what font you use and what output
   encoding method you choose, which may be different from the font and method that was used to print this manual (both of which may be different
   from what's printed on your keyboard's appropriate key).  Make sure to keep this in mind while reading.

STARTUP
   Let's assume that your copy of edict is in ~/lib/edict. You can start the program simply with

	   lookup ~/lib/edict

   You'll note that lookup spends some time building an index before the default "lookup> " prompt appears.

   Lookup gains much of its search speed by constructing an index of the file(s) to be searched. Since building the index can be time consuming
   itself, you can have lookup write the built index to a file that can be quickly loaded the next time you run the program.  Index files will be
   given a ".jin" (Jeffrey's Index) ending.

   Let's build the indices for edict and kanjidic now:

	   lookup -write ~/lib/edict ~/lib/kanjidic

   This will create the index files
	  ~/lib/edict.jin
	  ~/lib/kanjidic.jin
   and exit.

   You can now restart lookup, which will automatically use the pre-computed index files:

	  lookup ~/lib/edict ~/lib/kanjidic

   You should then be presented with the prompt without having to wait for the index to be constructed (but see the section on Operating System
   concerns for possible causes of delay).

INPUT
   There  are  basically  two  types  of  input:  searches  and commands.  Commands do such things as tell lookup to load more files or set flags.
   Searches report lines of a file that match some search specifier (where lines to search for are specified by one or more regular expressions).

   The input syntax may perhaps at first seem odd, but has been designed to be powerful and concise. A bit of time invested to learn it well  will
   pay off greatly when you need it.

BRIEF EXAMPLE
   Assuming you've started lookup with edict and kanjidic as noted above, let's try a few searches. In these examples, the
       "search [edict]> "
   is the prompt.  Note that the space after the '>' is part of the prompt.

   Given the input:

     search [edict]> tranquil

   lookup will report all lines with the string "tranquil" in them. There are currently about a dozen such lines, two of which look like:

     安らか [やすらか] /peaceful (an)/tranquil/calm/restful/
     安らぎ [やすらぎ] /peace/tranquility/

   Notice that lines with "tranquil" and "tranquility" matched?  This is because "tranquil" was embedded in the word "tranquility".  You could
   restrict the search to only the word "tranquil" by prepending the special "start of word" symbol '<' and appending the special "end of
   word" symbol '>' to the regex, as in:

     search [edict]> <tranquil>

   This is the regular expression that says "the beginning of a word, followed by a 't', 'r', ..., 'l', which is at the end of a word."  The
   current version of edict has just three matching entries.
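
   As a rough analogy only (this is not lookup's own code), the embedded versus whole-word distinction can be sketched with Python's re module,
   assuming that lookup's '<' and '>' behave much like Python's \b word-boundary assertion for ASCII words. The two sample strings are the English
   parts of the entries shown above.

```python
import re

# English portions of the two sample edict lines shown above.
lines = [
    "/peaceful (an)/tranquil/calm/restful/",
    "/peace/tranquility/",
]

# Plain search: matches "tranquil" anywhere, even inside "tranquility".
embedded = re.compile(r"tranquil", re.IGNORECASE)

# Whole-word search: \b stands in for lookup's '<' and '>' (an assumption).
whole_word = re.compile(r"\btranquil\b", re.IGNORECASE)

embedded_hits = [ln for ln in lines if embedded.search(ln)]
whole_word_hits = [ln for ln in lines if whole_word.search(ln)]

print(len(embedded_hits), "embedded,", len(whole_word_hits), "whole-word")
```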

   Let's try another:

     search [edict]> fukushima

   This is a search for the "English" fukushima -- ways to search for kana or kanji will be explored later.  Note that among the several lines
   selected and printed are:

     福島 [ふくしま] /Fukushima (pn,pl)/
     木曽福島 [きそふくしま] /Kisofukushima (pl)/

   By default, searches are done in a case-insensitive manner -- 'F' and 'f' are treated the same by lookup, at least so far as the matching
   goes.  This is called case folding.

   Let's give a command to turn this option off, so that 'f' and 'F' won't be considered the same.  Here's an odd point about lookup's input
   syntax: the default setting is that all command lines must begin with a space.  The space is the (default) command-introduction character and
   tells the input parser to expect a command rather than a search regular expression.  It is a common mistake at first to forget the leading
   space when issuing a command.  Be careful.

   Try the command " fold" to report the current status of case-folding.  Notice that as soon as you type the space, the prompt changes to
     "lookup command> "
   as a reminder that now you're typing a command rather than a search specification.

     lookup command>  fold

   The reply should be "file #0's case folding is on".

   You can actually turn it off with " fold off".  Now try the search for "fukushima" again.  Notice that this time the entries
   with "Fukushima" aren't listed? Now try the search string "Fukushima" and see that the entries with "fukushima" aren't listed.

   Case folding is usually very convenient (it also makes corresponding katakana and hiragana match the same), so don't forget to turn it back on:

     lookup command>  fold on
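
   To make the idea concrete, here is a minimal sketch of case folding that also treats katakana and hiragana alike. It is an illustration, not
   lookup's implementation: the function name fold is hypothetical, and it relies on the Unicode layout in which each katakana from U+30A1 to
   U+30F6 sits exactly 0x60 above its hiragana counterpart.

```python
def fold(text: str) -> str:
    """Lower-case ASCII letters and map katakana onto hiragana."""
    out = []
    for ch in text.lower():
        cp = ord(ch)
        # Katakana U+30A1..U+30F6 map to hiragana U+3041..U+3096.
        if 0x30A1 <= cp <= 0x30F6:
            ch = chr(cp - 0x60)
        out.append(ch)
    return "".join(out)


def folded_match(needle: str, line: str) -> bool:
    """Substring test performed on the folded forms of both strings."""
    return fold(needle) in fold(line)


print(folded_match("fukushima", "/Fukushima (pn,pl)/"))
print(folded_match("ときょ", "キリストきょう"))
```

   With folding switched off, a plain substring test such as "fukushima" in "/Fukushima (pn,pl)/" would fail, which mirrors the behavior seen
   after " fold off".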

JAPANESE INPUT
   Lookup has an automatic romaji-to-kana converter. A leading '/' indicates that romaji is to follow. Try typing "/tokyo" and you'll see it
   convert to "/ときょ" as you type. When you hit return, lookup will list all lines that have a "ときょ" somewhere in them. Well, sort of.  Look
   carefully at the lines which match. Among them (if you had case folding back on) you'll see:

     キリスト教 [キリストきょう] /Christianity/
     東京 [とうきょう] /Toukyou (pl)/Tokyo/current capital of Japan/
     凸鏡 [とっきょう] /convex lens/

   The first one has "ときょ" in it (as "トきょ", where the katakana "ト" matches in a case-insensitive manner to the hiragana "と"), but you
   might consider the others unexpected, since they don't have "ときょ" in them.  They're close ("とうきょ" and "とっきょ"), but not exact. This is
   the result of lookup's "fuzzification". Try the command " fuzz" (again, don't forget the command-introduction space).  You'll see that
   fuzzification is turned on.  Turn it off with " fuzz off" and try "/tokyo" (which will convert as you type) again.  This time you only get the
   lines which have "ときょ" exactly (well, case folding is still on, so it might match katakana as well).

   In a fuzzy search, length of vowels is ignored -- "と" is considered the same as "とう", for example.  Also, the presence or absence of
   any "っ" character is ignored, and the pairs じ ぢ, ず づ, え ゑ, and お を are considered identical in a fuzzy search.

   It might be convenient to consider a fuzzy search to be a "pronunciation search".  Special note: fuzzification will not be performed if a
   regular expression "*", "+", or "?" modifies a non-ASCII character. This is not an issue when input patterns are filename-like wildcard
   patterns (discussed below).

   In addition to kana fuzziness, there's one special case for kanji when fuzziness is on. The kanji repeater mark "々" will be recognized such
   that "時々" and "時時" will match each other.
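
   The fuzzy rules just described can be approximated by turning a kana string into a looser regular expression. The following is only a sketch
   under stated assumptions: the vowel table, the sets of kana assumed to lengthen each vowel, and the decision to allow an optional small tsu
   only before k-, s-, t-, and p-row kana are guesses made for illustration, not lookup's actual tables. The kanji repeater mark is not handled.

```python
import re

# Hiragana grouped by vowel sound (table built for this sketch).
VOWEL_ROWS = {
    "a": "あかがさざただなはばぱまやらわぁゃゎ",
    "i": "いきぎしじちぢにひびぴみりゐぃ",
    "u": "うくぐすずつづぬふぶぷむゆるぅゅ",
    "e": "えけげせぜてでねへべぺめれゑぇ",
    "o": "おこごそぞとどのほぼぽもよろをぉょ",
}
# Kana assumed to lengthen each vowel (an assumption of this sketch).
LENGTHENERS = {"a": "ぁあー", "i": "ぃいー", "u": "ぅうー",
               "e": "ぃいぇえー", "o": "ぅうおぉー"}
# Pairs treated as identical in a fuzzy search.
PAIRS = {}
for a, b in ("じぢ", "ずづ", "えゑ", "おを"):
    PAIRS[a] = PAIRS[b] = a + b
# Kana before which an optional small tsu is allowed (assumption).
SOKUON_FOLLOWERS = set("かきくけこさしすせそたちつてとぱぴぷぺぽ")


def vowel_of(ch):
    """Return the vowel row of a hiragana character, or None."""
    for vowel, row in VOWEL_ROWS.items():
        if ch in row:
            return vowel
    return None


def fuzzy_regex(kana: str) -> str:
    """Build a regex that ignores vowel length and small tsu."""
    parts = []
    prev_vowel = None
    for ch in kana:
        if ch == "っ":
            continue                      # presence of small tsu is ignored
        if prev_vowel and ch in LENGTHENERS[prev_vowel]:
            continue                      # vowel length is ignored
        piece = "っ?" if parts and ch in SOKUON_FOLLOWERS else ""
        piece += "[" + PAIRS[ch] + "]" if ch in PAIRS else re.escape(ch)
        vowel = vowel_of(ch)
        if vowel:
            piece += "[" + LENGTHENERS[vowel] + "]*"
        parts.append(piece)
        prev_vowel = vowel
    return "".join(parts)


pattern = re.compile(fuzzy_regex("ときょ"))
print(bool(pattern.search("とうきょう")), bool(pattern.search("とっきょ")))
```

   Because long vowels and small tsu in the input are dropped before the pattern is built, "とうきょう" and "ときょ" produce the same regex in
   this sketch.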

   Turn fuzzification back on ("fuzz on"), and search for all whole words which sound like "tokyo". That search would be specified as:

     search [edict]> /<tokyo>

   (again, the "tokyo" will be converted to "ときょ" as you type).  My copy of edict has the three lines

     東京 [とうきょう] /Toukyou (pl)/Tokyo/current capital of Japan/
     特許 [とっきょ] /special permission/patent/
     凸鏡 [とっきょう] /convex lens/

   This kind of whole-word romaji-to-kana search is so common, there's a special shortcut.  Instead of typing "/<tokyo>", you can
   type "[tokyo]".  The leading '[' means "start romaji" and "start of word".  Were you to type "<tokyo>" instead (without a leading '/' or '['
   to indicate romaji-to-kana conversion), you would get all lines with the English whole-word "tokyo" in them.  That would be a
   reasonable request as well, but not what we want at the moment.

   Besides the kana conversion, you can use any cut-and-paste that your windowing system might provide to get Japanese text onto the search line.
   Cut "ときょ" from somewhere and paste onto the search line. When hitting enter to run the search, you'll notice that it is done without
   fuzzification (even if the fuzzification flag was "on").  That's because there's no leading '/'. Not only does a leading '/' indicate that
   you want the romaji-to-kana conversion, but also that you want it done fuzzily.

   So, if you'd like fuzzy cut-and-paste, just type a leading '/' before pasting (or go back and prepend one after pasting).

   These examples have all been pretty simple, but you can use all the power that regexes have to offer. As a slightly more complex example, the
   search "<gr[ea]y>" would look for all lines with the words "grey" or "gray" in them.  Since the '[' isn't the first character of the line, it
   doesn't mean what was mentioned above (start-of-word romaji).  In this case, it's just the regular-expression "class" indicator.

   If you feel more comfortable using filename-like "*.txt" wildcard patterns, you can use the "wildcard on" command to have patterns be
   considered this way.

   This has been a quick introduction to the basics of lookup.

   It can be very powerful and much more complex. Below is a detailed description of its various parts and features.

READLINE INPUT
   The actual keystrokes are read by a readline-ish package that is pretty standard. In addition to just typing away, the following keystrokes are
   available:

     ^B  / ^F	  move left/right one character on the line
     ^A  / ^E	  move to the start/end of the line
     ^H  / ^G	  delete one character to the left/right of the cursor
     ^U  / ^K	  delete all characters to the left/right of the cursor
     ^P  / ^N	  previous/next lines on the history list
     ^L or ^R	  redraw the line
     ^D 	  delete char under the cursor, or EOF if line is empty
     ^space	  force romaji conversion (^@ on some systems)

   If automatic romaji-to-kana conversion is turned on (as it is by default), there are certain situations where the conversion will be  done,	as
   we saw above. Lower-case romaji will be converted to hiragana, while upper-case romaji to katakana.	This usually won't matter, though, as case
   folding will treat hiragana and katakana the same in the searches.

   In exactly what situations the automatic conversion will be done is intended to be rather intuitive once the basic idea is  learned.   However,
   at  any time, one can use control-space to convert the ASCII to the left of the cursor to kana. This can be particularly useful when needing to
   enter kana on a command line (where auto conversion is never done; see below).

ROMAJI FLAVOR
   Most flavors of romaji are recognized. Special or non-obvious items are mentioned below. Lowercase are  converted  to  hiragana,  uppercase	to
   katakana.

   Long vowels can be entered by repeating the vowel, or with '-' or '^'.

   In situations where an "n" could be vague, as in "na" being な or んあ, use a single quote to force ん.  Therefore, 「kenichi」→けにち
   while 「ken'ichi」→けんいち.

   The romaji has been richly extended with many non-standard combinations such as ふぁ or ちぇ, which are represented in intuitive
   ways: 「fa」→ふぁ, 「che」→ちぇ, etc.

   Various other mappings of interest:

     wo →を      we →ゑ      wi →ゐ
     VA →ヴァ    VI →ヴィ    VU →ヴ      VE →ヴェ    VO →ヴォ
     di →ぢ      dzi→ぢ      dya→ぢゃ    dyu→ぢゅ    dyo→ぢょ
     du →づ      tzu→づ      dzu→づ

   (the following kana are all smaller versions of the regular kana)

     xa →ぁ      xi →ぃ      xu →ぅ      xe →ぇ      xo →ぉ
     xu →ぅ      xtu→っ      xwa→ゎ      xka→ヵ      xke→ヶ
     xya→ゃ      xyu→ゅ      xyo→ょ
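
   The conversion rules above can be sketched as a longest-match table lookup. This is a toy illustration, not lookup's converter: the ROMAJI
   table holds only a handful of entries chosen for the examples, the function names are hypothetical, and case is decided for the whole string
   rather than per syllable. The katakana step assumes the Unicode layout in which katakana sit 0x60 above the matching hiragana.

```python
# Tiny illustrative table; a real converter would cover the full syllabary.
ROMAJI = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
    "na": "な", "ni": "に", "to": "と", "chi": "ち",
    "kyo": "きょ", "fa": "ふぁ", "che": "ちぇ",
    "n": "ん",
}


def to_hiragana(romaji: str) -> str:
    """Greedy longest-match conversion of lower-case romaji."""
    out, i = [], 0
    while i < len(romaji):
        if romaji[i] == "'":
            i += 1                 # the quote only separates "n" from a vowel
            continue
        for size in (3, 2, 1):
            chunk = romaji[i:i + size]
            if chunk in ROMAJI:
                out.append(ROMAJI[chunk])
                i += len(chunk)
                break
        else:
            out.append(romaji[i])  # unknown character: pass through unchanged
            i += 1
    return "".join(out)


def to_kana(romaji: str) -> str:
    """Lower-case input gives hiragana, upper-case input gives katakana."""
    hira = to_hiragana(romaji.lower())
    if romaji.isupper():
        return "".join(chr(ord(c) + 0x60) if "ぁ" <= c <= "ゖ" else c
                       for c in hira)
    return hira


print(to_kana("kenichi"), to_kana("ken'ichi"), to_kana("TOKYO"))
```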

INPUT SYNTAX
   Any input line beginning with a space (or whichever character is set as the command-introduction character) is processed as a command to lookup
   rather than a search spec.  Automatic kana conversion is never done on these lines (but forced conversion with control-space may be done at any
   time).

   Other lines are taken as search regular expressions, with the following special cases:

   ?  A line consisting of a single question mark will report the current command-introduction character (the default is a space, but can be
      changed with the "cmdchar" command).

   =  If a line begins with '=', the line (without the '=') is taken as a search regular expression, and no automatic (or internal -- see
      below) kana conversion is done anywhere on the line (although again, conversion can always be forced with control-space).  This can be used
      to initiate a search where the beginning of the regex is the command-introduction character, or in certain situations where automatic kana
      conversion is temporarily not desired.

   /  A line beginning with '/' indicates romaji input for the whole line.  If automatic kana conversion is turned on, the conversion will be
      done in real-time, as the romaji is typed. Otherwise it will be done internally once the line is entered.  Regardless, the presence of the
      leading '/' indicates that any kana (either converted or cut-and-pasted in) should be "fuzzified" if fuzzification is turned on.

      As an addition to the above, if the line doesn't begin with '=' or the command-introduction character (and automatic conversion is turned
      on), a '/' anywhere on the line initiates automatic conversion for the following word.

   [  A line beginning with '[' is taken to be romaji (just as a line beginning with '/'), and the converted romaji is subject to fuzzification
      (if turned on).  However, if '[' is used rather than '/', an implied '<' "beginning of word" is prepended to the resulting kana regex.
      Also, any ending ']' on such a line is converted to the "ending of word" specifier '>' in the resulting regex.
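
   The special cases above amount to a small dispatch on the first character of the line. The sketch below shows one way to express that; the
   function name classify and the dictionary layout are hypothetical, the romaji itself is not converted here, and the mid-line '/' rule is left
   out.

```python
def classify(line: str, cmdchar: str = " ") -> dict:
    """Classify one input line according to its leading character."""
    if line == "?":
        return {"kind": "report-cmdchar"}
    if line.startswith(cmdchar):
        # Commands never get automatic kana conversion.
        return {"kind": "command", "text": line[len(cmdchar):]}
    if line.startswith("="):
        # Plain regex, no kana conversion anywhere on the line.
        return {"kind": "search", "regex": line[1:],
                "romaji": False, "fuzzy_eligible": False}
    if line.startswith("/"):
        # Romaji for the whole line; fuzzified if the fuzz flag is on.
        return {"kind": "search", "regex": line[1:],
                "romaji": True, "fuzzy_eligible": True}
    if line.startswith("["):
        # Like '/', plus implied start-of-word and optional end-of-word.
        body = line[1:]
        if body.endswith("]"):
            body = body[:-1] + ">"
        return {"kind": "search", "regex": "<" + body,
                "romaji": True, "fuzzy_eligible": True}
    return {"kind": "search", "regex": line,
            "romaji": False, "fuzzy_eligible": False}


print(classify(" fold"))
print(classify("[tokyo]"))
```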

   In addition to the above, lines may have certain prefixes and suffixes to control aspects of the search or command:

   !  Various flags can be toggled for the duration of a particular search by prepending a "!!" sequence to the input line.

      Sequences are shown below, along with commands related to each:

       !F! ...  Filtration is toggled for this line (filter)
       !M! ...  Modification is toggled for this line (modify)
       !w! ...  Word-preference mode is toggled for this line (word)
       !c! ...  Case folding is toggled for this line (fold)
       !f! ...  Fuzzification is toggled for this line (fuzz)
       !W! ...  Wildcard-pattern mode is toggled for this line (wildcard)
       !r! ...  Raw. Force fuzzification off for this line
       !h! ...  Highlighting is toggled for this line (highlight)
       !t! ...  Tagging is toggled for this line (tag)
       !d! ...  Displaying is on for this line (display)

      The letters can be combined, as in "!cf!".

      The final '!' can be omitted if the first character after the sequence is not an ASCII letter.

      If no letters are given ("!!"), "!f!" is the default.

      These last two points can be conveniently combined in the common case of "!/romaji", which would be the same as "!f!/romaji".

      The special sequence "!?" lists the above, as well as indicates which are currently turned on.

      Note that the letters accepted in a "!!" sequence are many of the indicators shown by the "files" command.

   +  A '+' prepended to anything above will cause the final search regex to be printed. This can be useful to see when and what kind of
      fuzzification and/or internal kana conversion is happening. Consider:

        search [edict]> +/わかよ
        a match is "わ[ぁあー]*っ?か[ぁあー]*よ[ぅうおぉー]*"

      Due to the leading '/', the kana is fuzzified, which explains the somewhat complex resulting regex. For comparison, note:

        search [edict]> +わかよ
        a match is "わかよ"
        search [edict]> +!/わかよ
        a match is "わかよ"

      As the '+' shows, these are not fuzzified.  The first one has no leading '/' or '[' to induce fuzzification, while the second has
      the '!' line prefix (which is the default version of "!f!"), which toggles fuzzification mode to "off" for that line.

   ,  The default of all searches and most commands is to work with the first file loaded (edict in these examples). One can change this default
      (see the "select" command) or, by appending a comma+digit sequence at the end of an input line, force that line to work with another
      previously-loaded file. An appended ",1" works with the first extra file loaded (in these examples, kanjidic).  An appended ",2" works with
      the 2nd extra file loaded, etc.

      An appended ",0" works with the original first file (and can be useful if the default file has been changed via the "select" command).

      The following sequence shows a common usage:

        search [edict]> [ときょと]
        東京都 [とうきょうと] /Tokyo Metropolitan area/

      cutting and pasting the 都 from above, and adding a ",1" to search kanjidic:

        search [edict]> 都,1
        都 4554 N4769 S11  ..... ト ツ みやこ {metropolis} {capital}
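
   Taken together, the '+' prefix, the "!!" flag sequence, and the comma+digit suffix can be peeled off a search line before the regex itself is
   examined. The following sketch shows one possible order of operations. The function name split_line is hypothetical, a single trailing
   digit is assumed for the file selector, and the "!?" listing request is simply passed through for the caller to handle.

```python
import re

FLAG_LETTERS = set("FMwcfWrhtd")   # letters listed for the "!!" sequence


def split_line(line: str, default_slot: int = 0):
    """Return (flags, show_regex, body, slot) for one search line."""
    # A leading '+' asks for the final regex to be printed.
    show_regex = line.startswith("+")
    if show_regex:
        line = line[1:]

    flags = set()
    if line.startswith("!") and not line.startswith("!?"):
        i = 1
        while i < len(line) and line[i] in FLAG_LETTERS:
            flags.add(line[i])
            i += 1
        if i < len(line) and line[i] == "!":
            i += 1                 # explicit closing '!'
        elif i < len(line) and line[i].isascii() and line[i].isalpha():
            raise ValueError("closing '!' required before an ASCII letter")
        flags = flags or {"f"}     # no letters given: "!f!" is the default
        line = line[i:]

    # A trailing comma+digit selects another loaded file.
    slot = default_slot
    match = re.search(r",(\d)$", line)
    if match:
        slot = int(match.group(1))
        line = line[:match.start()]
    return flags, show_regex, line, slot


print(split_line("+!/romaji"))
print(split_line("!cf!tranquil,1"))
```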

FILENAME-LIKE WILDCARD MATCHING
   When wildcard-pattern mode is selected, patterns are considered as extended "*.txt"-like patterns. This is often more convenient for users
   not familiar with regular expressions. To have this mode selected by default, put

      default wildcard on

   into your ".lookup" file (see "STARTUP FILE" below).

   When wildcard mode is on, only "*", "?", "+", and "." are affected.  See the entry for the "wildcard" command below for details.

   Other features, such as the multiple-pattern searches (described below) and other regular-expression metacharacters, remain available.

MULTIPLE-PATTERN SEARCHES
   You can put multiple patterns in a single search specifier.  For example, consider

     search [edict]> china||japan

   The first part ("china") will select all lines that have "china" in them. Then, from among those lines, the second part will select lines
   that have "japan" in them.  The "||" is not part of any pattern -- it is lookup's "pipe" mechanism.

   The above example is very different from the single pattern "china|japan", which would select any line that had either "china" or "japan".
   With "china||japan", you get lines that have "china" and then also have "japan" as well.

   Note that it is also different from the regular expression "china.*japan" (or the wildcard pattern "china*japan"), which would select lines
   having "china, then maybe some stuff, then japan".  But consider the case when "japan" comes on the line before "china". Just for your
   comparison, the multiple-pattern specifier "china||japan" is pretty much the same as the single regular expression
   "china.*japan|japan.*china".

   If you use "|!|" instead of "||", it will mean "...and then lines not matching...".

   Consider a way to find all lines of kanjidic that do have a Halpern number, but don't have a Nelson number:

       search [edict]> <H\d+>|!|<N\d+>

   If you then wanted to restrict the listing to those that also had a "jinmeiyou" marking (kanjidic's "G9" field) and had a reading of あき, you
   could make it:

       search [edict]> <H\d+>|!|<N\d+>||<G9>||<あき>

   A prepended '+' would explain:

       a match is "<H\d+>"
       and not "<N\d+>"
       and "<G9>"
       and "<あき>"

   The "|!|" and "||" can be used to make up to ten separate regular expressions in any one search specification.

   Again, it is important to stress that "||" does not mean "or" (as it does in a C program, or as '|' does within a regular expression).  You
   might find it convenient to read "||" as "and also", while reading "|!|" as "but not".

   It is also important to stress that any whitespace around the "||" and "|!|" constructs is not ignored, but kept as part of the regex on
   either side.
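
   The pipe mechanism can be pictured as a chain of filters, each stage keeping or discarding lines that survived the previous one. The sketch
   below illustrates the idea with Python's re module. The function names are hypothetical, Python regex syntax is used rather than lookup's
   own (so the '<' and '>' word markers are not translated), and the sample lines are invented for the demonstration.

```python
import re


def parse_spec(spec: str):
    """Split on '||' and '|!|' into (negated, regex) stages.

    Whitespace around the separators is kept as part of each regex.
    """
    stages = []
    negated = False
    pos = 0
    for sep in re.finditer(r"\|!\||\|\|", spec):
        stages.append((negated, spec[pos:sep.start()]))
        negated = sep.group() == "|!|"
        pos = sep.end()
    stages.append((negated, spec[pos:]))
    if len(stages) > 10:
        raise ValueError("at most ten regular expressions per search")
    return stages


def search(spec: str, lines):
    """Apply each stage in turn to the lines kept by the previous stage."""
    result = list(lines)
    for negated, pattern in parse_spec(spec):
        regex = re.compile(pattern, re.IGNORECASE)
        result = [ln for ln in result if bool(regex.search(ln)) != negated]
    return result


sample = ["china and japan", "japan then china", "china only", "japan only"]
print(search("china||japan", sample))
print(search("china|!|japan", sample))
print(search("china|japan", sample))
```

   In this sketch "china||japan" selects the same lines as the single regex "china.*japan|japan.*china", while "china|japan" selects all four
   sample lines.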

COMBINATION SLOTS
   Each file, when loaded, is assigned to a "slot" via which subsequent references to the file are then made.  The slot may then be searched, have
   filters and flags set, etc.

   A special kind of slot, called a "combination slot", rather than representing a single file, can represent multiple previously-loaded slots.
   Searches against a combination slot (or "combo slot" for short) search all those previously-loaded slots associated with it (called "component
   slots").  Combo slots are set up with the combine command.

   A combo slot has no filter or modify spec, but can have a local prompt and flags just like normal file slots.  The flags, however, have special
   meanings  with  combo  slots. Most combo-slot flags act as a mask against the component-slot flags; when acted upon as a member of the combo, a
   component-slot's flag will be disabled if the corresponding combo-slot's flag is disabled.

   Exceptions to this are the autokana, fuzz, and tag flags.

   The autokana and fuzz flags govern a combo slot exactly as they do a regular file slot.  When a slot is searched as a component of a
   combination slot, the component slot's fuzz (and autokana) flags, or lack thereof, are ignored.

   The tag flag is quite different altogether; see the tag command for complete information.

   Consider the following output from the files command:

     +--+--------+----+------+----------------------------
     | 0|F wcfh d|a I | 2762k|/usr/jfriedl/lib/edict
     | 1|FM cf  d|a I |  705k|/usr/jfriedl/lib/kanjidic
     | 2|F  cfh@d|a   |    1k|/usr/jfriedl/lib/local.words
     |*3|FM cfhtd|a   | combo|kotoba (#2, #0)
     +--+--------+----+------+----------------------------

   See the discussion of the files command below for basic explanation of the output.

   As can be seen, slot #3 is a combination slot with the name "kotoba" with component slots two and zero. When a search is initiated on this
   slot, first slot #2 "local.words" will be searched, then slot #0 "edict".  Because the combo slot's filter flag is on, the component slots'
   filter flag will remain on during the search.  The combo slot's word flag is off, however, so slot #0's word flag will be forced off during the
   search.
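
   The masking rule can be written down in a few lines. This is a sketch of the behavior described above, not lookup's code: the function name
   and the dictionary representation of flags are hypothetical, and the tag flag is left out because its handling is described under the tag
   command.

```python
def effective_flags(component: dict, combo: dict) -> dict:
    """Flags in force for a component slot while searched via a combo slot."""
    effective = {}
    for name, value in component.items():
        if name == "tag":
            continue                      # handled separately; not modeled here
        if name in ("autokana", "fuzz"):
            # The combo slot's own setting governs; the component's is ignored.
            effective[name] = combo.get(name, False)
        else:
            # The combo flag acts as a mask on the component flag.
            effective[name] = value and combo.get(name, False)
    return effective


# Flags for slot #0 (edict) and combo slot #3 as shown in the listing above.
edict = {"filter": True, "modify": False, "word": True, "fold": True,
         "fuzz": True, "highlight": True, "display": True}
kotoba = {"filter": True, "modify": True, "word": False, "fold": True,
          "fuzz": True, "highlight": True, "display": True}

print(effective_flags(edict, kotoba))
```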

   See the combine command for information about creating combo slots.

PAGER
   Lookup has a built-in pager (à la more).  Upon filling a screen with text, the string
       --MORE [space,return,c,q]--
   is shown. A space will allow another screen of text; a return will allow one more line. A 'c' will allow output text to continue unpaged
   until the next command. A 'q' will flush output of the current command.

   If  supported  by  the OS, lookup's idea of the screen size is automatically set upon startup and window resize.  Lookup must know the width of
   the screen in doing both the horizontal input-line scrolling, and for knowing when a long line wraps on the screen.

   The pager parameters can be set manually with the "pager" command.

COMMANDS
   Any line intended to be a command must begin with the command-introduction character (the default is a space, but can be set via the
   "cmdchar" command).  However, that character is not part of the command itself and won't be shown in the following list of commands.

   There  are  a number of commands that work with the selected file or selected slot (both meaning the same thing).  The selected file is the one
   indicated by an appended comma+digit, as mentioned above. If no such indication is given, the default selected file is used (usually the  first
   file loaded, but can be changed with the "select" command).

   Some commands accept a boolean argument, such as to turn a flag on or off. In all such cases, a "1" or "on" means to turn the flag on, while
   a "0" or "off" is used to turn it off.  Some flags are per-file ("fuzz", "fold", etc.), and a command to set such a flag normally sets the
   flag for the selected file only. However, the default value inherited by subsequently loaded files can be set by prepending "default" to the
   command. This is particularly useful in the startup file before any files are loaded (see the section STARTUP FILE).

   Items separated by '|' are mutually exclusive possibilities (i.e. a boolean argument is "1|on|0|off").

   Items shown in brackets ('[' and ']') are optional. All commands that accept a boolean argument to set a flag or mode do so optionally --
   with no argument the command will report the current status of the mode or flag.

   Any command that allows an argument in quotes (such as load, etc.) allows the use of single or double quotes.

   The commands:

   [default] autokana [boolean]
      Automatic romaji-to-kana conversion for the selected file is turned on or off (default is on).  However, if "default" is specified, the
      value to be inherited as the default by subsequently-loaded files is set (or reported).

      Can be temporarily disabled by a prepended '=', as described in the INPUT SYNTAX section.

   clear|cls
      Attempts to clear the screen. If you're using a kterm it'll just output the appropriate tty control sequence. Otherwise it'll try to run
      the "clear" command.

   cmdchar ['one-byte-char']
      The  default  command-introduction character is a space, but it may be changed via this command. The single quotes surrounding the character
      are required. If no argument is given, the current value is printed.

      An input line consisting of a single question mark will also print the current value (useful for when you don't know the current value).

      Woe to the one that sets the command-introduction character to one of the other special input-line characters, such as '+', '/', etc.

   combine ["name"] [ num += ] slotnum ...
      Creates or adds file slots to a combination slot (see the COMBINATION SLOTS section for general information).  Note that "combo" may be used
      as the command as well.

      Assuming	for this example that slots 0-2 are loaded with the files curly, moe, and larry, we can create a combination slot that will refer-
      ence all three:

	combo "three stooges" 2, 0, 1

      The command will report

	creating combo slot #3 (three stooges): 2 0 1

      The name is optional, and will appear in the files list, and may also be used to specify the slot as an argument to the select command.

      A search via the newly created combo slot would search in the order specified on the combo  command  line:  first  larry,  then  curly,  and
      finally moe.

      If you later load another file (say, jeffrey to slot #4), you can then add it to the previously made combo:

	combo 3 += 4

      (the "+=" wording comes from the C programming language where it means "add on to").  Adding to a combination always adds slots to the end
      of the list.

      You can take the opportunity of adding the slot to also change the name, if you like:

	combo "four stooges" 3 += 4

      The reply would be
	adding to combo slot #3(four stooges): 4

      A file slot can be a component of any particular combo slot only once.  When reporting the created or added slot numbers, the number will
      appear in parentheses if it had already been a member of the list.

      Furthermore, only file slots can be component members of combo slots. Attempting to combine combo slot X to combo slot Y will result in
      having X's component file slots (rather than the combo slot itself) added to Y.
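
      The membership rules above (ordered, no duplicates, combos contribute their components) can be sketched as follows. The function name
      combine and the data layout are hypothetical; combos maps a combo slot number to its list of component file slots.

```python
def combine(members: list, slotnums: list, combos: dict) -> str:
    """Append slots to a combo's member list and return the report string.

    Slots that were already members are reported in parentheses.
    """
    report = []
    for num in slotnums:
        # A combo slot contributes its component file slots, not itself.
        for slot in combos.get(num, [num]):
            if slot in members:
                report.append("(%d)" % slot)
            else:
                members.append(slot)      # always added to the end
                report.append(str(slot))
    return " ".join(report)


stooges = []
print(combine(stooges, [2, 0, 1], {}))    # create the combo
print(combine(stooges, [4], {}))          # later: add slot 4
print(combine(stooges, [0], {}))          # slot 0 is already a member
print(stooges)
```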

   command debug [boolean]
      Sets the internal command parser debugging flag on or off (default is off).

   debug [boolean]
      Sets the internal general-debugging flag on or off (default is off).

   describe specifier
      This command will tell you how a character (or each character in a string) is encoded in the various encoding methods:

	  lookup command>  describe "気"
	  "気" as  EUC  is 0xb5a4 (181 164; 265 244)
		as  JIS  is 0x3524 ( 53  36;  65 44 "5$")
		as KUTEN is   2104 ( 0x1504;  25 04)
		as S-JIS is 0x8b1f (139  31; 213 37)

      The quotes surrounding the character or string to describe are optional.  You can also give a regular ASCII character and have the
      double-width version of the character described: indicating "A", for example, would describe "Ａ".  Specifier can also be a four-digit
      kuten value, in which case the character with that kuten will be described.

      If a four-digit specifier has a hex digit in it, or if it is preceded by "0x", the value is taken as a JIS code. You can precede the value
      by "jis", "sjis", "euc", or "kuten" to force interpretation to the requested code.

      Finally, specifier can be a string of stripped JIS (JIS w/o the kanji-in and kanji-out codes, or with the codes but without the escape
      characters in them).  For example "F|K\" would describe the two characters 日 and 本.
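
      The EUC, JIS, and KUTEN lines of the describe output are related by simple byte arithmetic, sketched below in Python. The function names
      are hypothetical, and Shift-JIS (which needs a more involved mapping) is deliberately left out.

```python
def jis_to_euc(jis):
    """EUC sets the high bit of each of the two JIS bytes."""
    return jis | 0x8080

def jis_to_kuten(jis):
    """Kuten is the row and column (each byte minus 0x20) as four decimal digits."""
    row = (jis >> 8) - 0x20
    col = (jis & 0xFF) - 0x20
    return f"{row:02d}{col:02d}"

def kuten_to_jis(kuten):
    """Inverse of jis_to_kuten: add 0x20 back to the row and the column."""
    row, col = int(kuten[:2]), int(kuten[2:])
    return ((row + 0x20) << 8) | (col + 0x20)

jis = 0x3524
print(hex(jis_to_euc(jis)))          # 0xb5a4
print(jis_to_kuten(jis))             # 2104
print(hex(kuten_to_jis("2104")))     # 0x3524
# The "5$" shown on the JIS line is simply the two JIS bytes read as ASCII.
print(bytes([jis >> 8, jis & 0xFF]).decode("ascii"))   # 5$
```

      The same byte-as-ASCII reading is what makes a "stripped JIS" specifier possible: each pair of printable characters names one JIS code.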

   encoding [euc|sjis|jis]
      The same as the -euc, -jis, and -sjis command-line options, sets the encoding method for interactive input and output (or reports the
      current status).  Finer control over the output encoding is available with the output encoding command. A separate encoding for input can
      be set with the input encoding command.

   files [ - | long ]
      Lists what files are loaded in what slots, and some status information about them, as with:

	+--+----------+----+------+------------------------------
	| 0|F wcf h d :a I | 2762k|/usr/jfriedl/lib/edict
	| 1|FM cf   d :a I |  705k|/usr/jfriedl/lib/kanjidic
	| 2|F  cfWh@d :a   |    1k|/usr/jfriedl/lib/local.words
	|*3|FM cf htd :a   | combo|kotoba (#2, #0)
	| 4|   cf   d :a   |  205k|/usr/dict/words
	+--+----------+----+------+------------------------------

      The first section is the slot number, with a "*" beside the default slot (as set by the select command).

      The second section shows per-slot flags and status. Letters are shown if the flag is on, omitted if off. In the list below, related commands
      are given for each item:

	F ... if there is a filter {but '#' if disabled}. (filter)
	M ... if there is a modify spec {but '%' if disabled}. (modify)
	w ... if word-preference mode is turned on. (word)
	c ... if case folding is turned on. (fold)
	f ... if fuzzification is turned on. (fuzz)
	W ... if wildcard-pattern mode is turned on (wildcard)
	h ... if highlighting is turned on. (highlight)
	t ... if there is a tag {but @ if disabled} (tag)
	d ... if found lines should be displayed (display)
	- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
	a ... if autokana is turned on (autokana)
	P ... if there is a file-specific local prompt (prompt)
	I ... if the file is loaded with a precomputed index (load)
	d ... if the display flag is on (display)

      Note that the letters in the upper section directly correspond to the "!!" sequence characters described in the INPUT SYNTAX section.

      If  there  is  a digit at the end of the flag section, it indicates that only #/10 of the file is actually loaded into memory (as opposed to
      the file having been completely loaded). Unloaded files will be loaded while lookup is idle, or when first used.

      If the slot is a combination slot (as slot #3 is in the example above), that is noted in the third section, and  the  combination  name  and
      component slot numbers are noted in the fourth. Also, for combination slots (which have no filter or modify specifications, only the flags),
      F and/or M are shown if the corresponding mode is allowed during searches via the combo slot. See the tag command  for  info  about  t  with
      respect to combination slots.

      If an argument (either "-" or "long" will work) is given to the command, a short message about what the flags mean is also printed.

   filter ["label"] [!] /regex/[i]
      Sets the filter for the selected slot (which must contain a file and not a combination).  If a filter is set and active for a file, any
      line matching the given regex is filtered from the output (if the '!' is put before the regex, any line not matching the regex is
      filtered).  The label, which isn't required, merely acts as documentation in various diagnostics.

      As an example, consider that edict lines often have "(pn)" on them to indicate that the given English is a place name. Often these place
      names can be a bother, so it would be nice to elide them from the output unless specifically requested.  Consider the example:

	lookup command>  filter "name" /(pn)/
	search [edict]> [きの]
	機能 [きのう] /function/faculty/
	帰納 [きのう] /inductive/
	昨日 [きのう] /yesterday/
	≪3 "name" lines filtered≫

      In the example, '/' characters are used to delimit the start and stop of the regex (as is common with many programs). However, any
      character can be used. A final 'i', if present, indicates that the regex should be applied in a case-insensitive manner.

      The filter, once set, can be enabled or disabled with the other form of the "filter" command (described below). It can also be temporarily
      turned off (or, if disabled, temporarily turned on) by the "!F!" line prefix.

      Filtered lines can optionally be saved and then displayed if you so desire.  See the "saved list size" and "show" commands.

      Note that if you have saving enabled and only one line would be filtered, it is simply printed at the end (rather than print a one line mes-
      sage about how one line was filtered).

      By the way, a better "name" filter for edict would be:

	filter "name" #^[^/]+/[^/]*<p[ln]>[^/]*/$#

      as it would filter all entries that had only one English section, that section being a name.  It is also an example of using something other
      than '/' to delimit a regex, as it makes things a bit easier to read.
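
      To see why the second pattern is stricter, it can be tried outside of lookup. The Python sketch below assumes that lookup's "<" and ">"
      word anchors behave roughly like Python's \b; the helper apply_filter and the sample lines are made up for illustration and are not real
      edict data.

```python
import re

# Assumption: lookup's < and > word anchors behave roughly like \b in Python.
NAME_ONLY = re.compile(r"^[^/]+/[^/]*\bp[ln]\b[^/]*/$")

def apply_filter(lines, regex, invert=False):
    """Split lines into (shown, filtered) the way a filter spec would.

    A line is filtered when it matches `regex`; with invert=True (the
    "!" form) a line is filtered when it does NOT match.
    """
    shown, filtered = [], []
    for line in lines:
        hit = regex.search(line) is not None
        (filtered if hit != invert else shown).append(line)
    return shown, filtered

# Made-up, romanized stand-ins for edict lines.
lines = [
    "Kyoto [kyouto] /Kyoto (pn)/",
    "Kyoto [kyouto] /Kyoto (pn)/the old capital/",
    "kinou [kinou] /yesterday/",
]
shown, filtered = apply_filter(lines, NAME_ONLY)
print(filtered)   # only the first line: one English section, and it holds (pn)
```

      The second sample line survives because it has a further English section after the name, which the simpler /(pn)/ filter would also have
      hidden.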

   filter [boolean]
      Enables or disables the filter for the selected slot.  If no argument is given, displays the current filter and status.

   [default] fold [boolean]
      The selected slot's case folding is turned on or off (default is on), or reported if no argument given.  However, if "default" is specified,
      the value to be inherited as the default by subsequently-loaded files is set (or reported).

      Can be temporarily toggled by the "!c!" line prefix.

   [default] fuzz [boolean]
      The selected slot's fuzzification is turned on or off (default is on), or reported if no argument given.  However, if "default" is
      specified, the value to be inherited as the default by subsequently-loaded files is set (or reported).

      Can be temporarily toggled by the "!f!" line prefix.

   help [regex]
      Without an argument gives a short help list. With an argument, lists only commands whose help string is picked up by the given regex.

   [default] highlight [boolean]
      Sets matched-string highlighting on or off for the selected slot (default off), or reports the current status if no argument is given.
      However, if "default" is specified, the value to be inherited as the default by subsequently-loaded files is set (or reported).

      If on, shows in bold or reverse video (see below) that part of the line which was matched by the search regex.  If multiple regexes were
      given, that part matched by the first regex is shown.

      Note that a regex might match a portion of a line which is later removed by a modify parameter. In this case, no highlighting is done.

      Can be temporarily toggled by the "!h!" line prefix.

   highlight style [bold | inverse | standout | <___>]
      Sets the style of highlighting for when highlighting is done.  Inverse (inverse video) and standout are the same. The default is bold.  You
      can also give an HTML tag, such as "<BOLD>", and items will be wrapped by <BOLD>...</BOLD>. This would be particularly useful when the
      output is going to a CGI, as when lookup has been built in a server configuration.

      Note that the highlighting is effected by using raw VT100/xterm control sequences. This isn't very nice if your terminal doesn't
      understand them. Sorry.

   if {expression} command...

      If the evaluated expression is non-zero, the command will be executed.

      Note that {} rather than () surround the expression.

      Expression may be composed of numbers, operators, parentheses, etc.  In addition to the normal +, -, *, and /, there are:

	 !x     ... 'not': yields 1 if x is zero, 0 if non-zero.
	 x && y ...
	 x & y  ... 'and': yields 1 if both x and y are non-zero, 0 otherwise.
	 x | y  ... 'or': yields 1 if x or y (or both) is non-zero, 0 otherwise.

      There may also be the special tokens true and false which are 1 and 0 respectively.

      There are also checked, matched, printed, nonword, and filtered which correspond to the values printed by the stats command.

      An example use might be the following kind of thing in a computer-generated script:

	!d!expect this line
	if {!printed} msg Oops! couldn't find "expect this line"

   input encoding [ euc | sjis ]
      Used to set (or report) what encoding to use when 8-bit bytes are found in the interactive input (all flavors of JIS are always recognized).
      Also see the encoding and output encoding commands.

   limit [value]
      Sets the number of lines to print during any search before aborting (or reports the current number if no value given). Default is 100.

      Output limiting is disabled if set to zero.

   log [ to [+] file ]
      Begins logging the program output to file (the Japanese encoding method being the same as for screen output).  If "+" is given, the log is
      appended to any text that might have previously been in file, in which case a leading dashed line is inserted into the file.

      If no arguments are given, reports the current logging status.

   log	- | off
      If only "-" or off is given, any currently-opened log file is closed.

   load [-now|-whenneeded] "filename"
      Loads the named file to the next available slot.  If a precomputed index is found (as "filename.jin") it is loaded as well.  Otherwise, an
      index is generated internally.

      The file to be loaded (and the index, if loaded) will be loaded during idle times. This allows a startup file to list many files to be
      loaded, but not have to wait for each of them to load in turn. Using the "-now" flag causes the load to happen immediately, while using the
      "-whenneeded" option (can be shortened to "-wn") causes the load to happen only when the slot is first accessed.

      Invoke lookup as
	 % lookup -writeindex filename
      to generate and write an index file, which will then be automatically used in the future.

      If the file has already been loaded, the file is not re-read, but the previously-read file is shared. The new slot will, however,  have  its
      own separate flags, prompt, filter, etc.

   modify /regex/replace/[ig]
      Sets  the  modify  parameter for the selected file.  If a file has a modify parameter associated with it, each line selected during a search
      will have that part of the line which matches regex (if any) replaced by the replacement string before being printed.

      Like the filter command, the delimiter need not be '/'; any non-space character is fine.  If a final 'i' is given, the regex is applied
      in a case-insensitive manner.  If a final 'g' is given, the replacement is done to all matches in the line, not just the first part that
      might match regex.

      The replacement may have embedded "\1", etc. in it to refer to parts of the matched text (see the tutorial on regular expressions).

      The modify parameter, once set, may be enabled or disabled with the other form of the modify command (described below).  It may also be
      temporarily toggled via the "!m!" line prefix.

      A silly example for the ultra-nationalist might be:
	modify /<Japan>/Dainippon Teikoku/g
      So that a line such as
	日銀 [にちぎん] /Bank of Japan/
      would come out as
	日銀 [にちぎん] /Bank of Dainippon Teikoku/

      As  a real example of the modify command with kanjidic, consider that it is likely that one is not interested in all the various fields each
      entry has.  The following can be used to remove the info on the U, N, Q, M, E, B, C, and Y fields from the output:

	modify /( [UNQMECBY]\S+)+//g,1

      It's sort of complex, but works.  Note that here the replacement part is empty, meaning to just remove those parts which matched.  The
      result of such a search of 日 would normally print

	  日 467c U65e5 N2097 B72 B73 S4 G1 H3027 F1 Q6010.0 MP5.0714 ...
	  MN13733 E62 Yri4 P3-3-1 ニチ ジツ ひ -び -か {day}

      but with the above modify spec, appears more simply as

	  日 467c S4 G1 H3027 F1 P3-3-1 ニチ ジツ ひ -び -か {day}
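
      The effect of this modify spec can be reproduced with an ordinary regex substitution. The Python sketch below assumes the pattern is a
      space, one of the listed field letters, and then the rest of that non-blank field (written \S+ here). The helper name modify and its
      `every` parameter are hypothetical, and the kanji and kana readings are left out of the sample entry to keep it plain ASCII.

```python
import re

# Assumption: each field to drop is " " + one of the letters + non-blanks.
FIELDS = re.compile(r"( [UNQMECBY]\S+)+")

def modify(line, regex, replacement, every=True):
    """Replace the first match, or every match when `every` is true
    (modelling the trailing "g" of the modify command)."""
    return regex.sub(replacement, line, count=0 if every else 1)

# The kanjidic fields from the example above, minus kanji and readings.
entry = ("467c U65e5 N2097 B72 B73 S4 G1 H3027 F1 Q6010.0 MP5.0714 "
         "MN13733 E62 Yri4 P3-3-1 {day}")

print(modify(entry, FIELDS, ""))
# 467c S4 G1 H3027 F1 P3-3-1 {day}
```

      Without the "g" behaviour only the first run of unwanted fields (U65e5 through B73) would be removed, and the Q, M, E and Y fields further
      along the line would remain.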

   modify [boolean]
      Enables or disables the modify parameter for the selected file, or reports the current status if no argument is given.

   msg string
      The given string is printed.

      Most likely used in a script as the target command of an if command.

   output encoding [ euc | sjis | jis...]
      Used  to	set  exactly what kind of encoding should be used for program output (also see the input encoding command). Used when the encoding
      command is not detailed enough for one's needs.

      If no argument is given, reports the current output encoding.  Otherwise, arguments can usually be any reasonable dash-separated combination
      of:

	euc
	   Selects EUC for the output encoding.

	sjis
	   Selects Shift-JIS for the output encoding.

	jis[78|83|90][-ascii|-roman]
	   Selects JIS for the output encoding.  If no year (78, 83, or 90) given, 78 is used. Can optionally specify that "English" should be
	   encoded as regular ASCII (the default when JIS selected) or as JIS-ROMAN.

	212
	   Indicates that JIS X0212-1990 should be supported (ignored for Shift-JIS output).

	no212
	   Indicates that JIS X0212-1990 should not be supported (default setting).  This places JIS X0212-1990 characters under the domain of
	   disp, nodisp, code, or mark (described below).

	hwk
	   Indicates that half width kana should be left as-is (default setting).

	nohwk
	   Indicates that half width kana should be stripped from the output.  (not yet implemented).

	foldhwk
	   Indicates that half width kana should be folded to their full-width counterparts.  (not yet implemented).

	disp
	   Indicates that non-displayable characters (such as JIS X0212-1990 while the output encoding method is Shift-JIS) should be passed along
	   anyway (most likely resulting in screen garbage).

	nodisp
	   Indicates that non-displayable characters should be quietly stripped from the output.

	code
	   Indicates that non-displayable characters should be printed as their octal codes (default setting).

	mark
	   Indicates that non-displayable characters should be printed asiEiuiE.

	Of course, not all options make sense in all combinations, or at all times.  When the current (or new) output encoding is reported, a
	complete and exact specifier representing the selected output encoding is shown.  An example might be "jis78-ascii-no212-hwk-code".

   pager [ boolean | size ]
      Turns on or off an output pager, sets its idea of the screen size, or reports the current status.

      Size can be a single number indicating the number of lines to be printed between "MORE?" prompts (usually a few lines less than the total
      screen height, the default being 20 lines). It can also be two numbers in the form "#x#" where the first number is the width (in half-width
      characters; default 80) and the second is the lines-per-page as above.

      If the pager is on, every page of output will result in a "MORE?" prompt, at which there are four possible responses. A space will allow one
      more full page to print. A return will allow one more line.  A 'c' (for "continue") will allow the rest of the output (for the current
      command) to proceed without pause, while a 'q' (for "quit") will flush the output for the current command.

      If supported by the OS, the pager size parameters are set appropriately from the window size upon startup or window resize.

      The default pager status is "off".

   [local] prompt "string"
      Sets the prompt string.  If "local" is indicated, sets the prompt string for the selected slot only. Otherwise, sets the global default
      prompt string.

      Prompt strings may have the special %-sequences shown below, with related commands given in parentheses:

	 %N ... the default slot's file or combo name.
	 %n ... like %N, but any leading path is not shown if a filename.
	 %# ... the default slot's number.
	 %S ... the "command-introduction" character (cmdchar)
	 %0 ... the running program's name
	 %F='string' ... string shown if filtering enabled (filter)
	 %M='string' ... string shown if modification enabled (modify)
	 %w='string' ... string shown if word mode on (word)
	 %c='string' ... string shown if case folding on (fold)
	 %f='string' ... string shown if fuzzification on (fuzz).
	 %W='string' ... string shown if wildcard-pat. mode on (wildcard).
	 %d='string' ... string shown if displaying on (display).
	 %C='string' ... string shown if currently entering a command.
	 %l='string' ... string shown if logging is on (log).
	 %L ... the name of the current output log, if any (log)

      For the tests (%f, etc), you can put '!' just after the '%' to reverse the sense of the test (e.g. %!f="no fuzz").  The reverse of %F is
      true only if a filter is installed but disabled (i.e. string will never be shown if there is no filter for the default file).  The modify
      test %M works comparably.

      Also, you can use an alternative form for the items that take an argument string. Replacing the quotes with parentheses will treat string as
      a recursive prompt specifier. For example, the specifier

	   %C='command'%!C(%f='fuzzy 'search:)

      would result in a "command" prompt if entering a command, while it would result in either a "fuzzy search:" or a "search:" prompt if not
      entering a command.  The parenthesized constructs may be nested.

      Note that the letters of the test constructs are the same as the letters for the "!!" sequences described in INPUT SYNTAX.

      An example of a nice prompt command might be:

	      prompt "%C(%0 command)%!C(%w'*'%!f'raw '%n)> "

      With this prompt specification, the prompt would normally appear as "filename> " but when fuzzification is turned off as "raw filename> ".
      And if word-preference mode is on, the whole thing has a "*" prepended.  However if a command is being entered, the prompt would then
      become "name command", where name was the program's name (system dependent, but most likely "lookup").

      The default prompt format string is "%C(%0 command)%!C(search [%n])> ".
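
      The way a prompt specification is expanded can be modelled in a few lines of Python. This is a simplified sketch, not lookup's parser:
      the function expand_prompt and its `values` and `flags` arguments are hypothetical, the special meaning of %!F and %!M is ignored, and
      parentheses inside quoted strings are not handled.

```python
def expand_prompt(spec, values, flags):
    """Expand a prompt spec.

    values: single-character substitutions, e.g. {"n": "edict", "0": "lookup"}
    flags:  set of test letters that are currently true, e.g. {"f", "C"}
    """
    out = []
    i = 0
    while i < len(spec):
        ch = spec[i]
        i += 1
        if ch != "%" or i >= len(spec):
            out.append(ch)                   # ordinary text is copied as-is
            continue
        negate = spec[i] == "!"
        if negate:
            i += 1
        key = spec[i]
        i += 1
        if key in values and not negate:
            out.append(values[key])          # plain substitution such as %n
            continue
        if i < len(spec) and spec[i] == "=":
            i += 1                           # the "=" before the string is skipped
        if i < len(spec) and spec[i] == "'":
            end = spec.index("'", i + 1)
            body, recursive = spec[i + 1:end], False
            i = end + 1
        elif i < len(spec) and spec[i] == "(":
            depth, end = 1, i + 1
            while depth:                     # find the matching ")"
                depth += {"(": 1, ")": -1}.get(spec[end], 0)
                end += 1
            body, recursive = spec[i + 1:end - 1], True
            i = end
        else:
            continue                         # unknown sequence: emit nothing
        if (key in flags) != negate:
            out.append(expand_prompt(body, values, flags) if recursive else body)
    return "".join(out)


values = {"n": "edict", "0": "lookup"}
default = "%C(%0 command)%!C(search [%n])> "
print(repr(expand_prompt(default, values, set())))    # 'search [edict]> '
print(repr(expand_prompt(default, values, {"C"})))    # 'lookup command> '
```

      Quoted strings are emitted literally, while parenthesized bodies are expanded again, which is what allows %n and %0 to appear inside the
      %C(...) and %!C(...) constructs.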

   regex debug [boolean]
      Sets the internal regex debugging flag (turn on if you want billions of lines of stuff spewed to your screen).

   saved list size [value]
      During a search, lines that match might be elided from the output due to filters or word-preference mode.  This command sets the	number	of
      such lines to remember during any one search, such that they may be later displayed (before the next search) by the show command.

      The default is 100.

   select [ num | name | . ]
      If num is given, sets the default slot to that slot number.  If name is given, sets the default slot to the first slot found with a file (or
      combination) loaded with that name.  The incantation "select ." merely sets the default slot to itself, which can be useful in script files
      where you want to indicate that any subsequent flag changes should work with whatever file was the default at the time the script was
      sourced.

      If no argument is given, simply reports the current default slot (also see the files command).

      In command files loaded via the source command, or as the startup file, commands dealing with per-slot items (flags, local prompt,  filters,
      etc.)  work with the file or slot last selected.	The last such selected slot remains selected once the load is complete.

      Interactively, the default slot will become the selected slot for subsequent searches and commands that aren't augmented with an
      appended ",#" (as described in the INPUT SYNTAX section).

   show
      Shows any lines elided from the previous search (either due to a filter or word-preference mode).

      Will apply any modifications (see the "modify" command) if modifications are enabled for the file. You can use the "!m!" line prefix as well
      with this command (in this case, put the "!m!" before the command-indicator character).

      The length of the list is controlled by the "saved list size" command.

   source "filename"
      Commands are read from filename and executed.

      In the file, all lines beginning with "#" are ignored as comments (note that comments must appear on a line by themselves, as "#" is a
      reasonable character to have within commands).

      Lines whose first non-blank character is "=", "!", or "+" are considered searches, while all other non-blank lines are considered lookup
      commands.  Therefore, there is no need for lines to begin with the command-introduction character. However, leading whitespace is always OK.

      For  search  lines, take care that any trailing whitespace is deleted if undesired, as trailing whitespace (like all non-leading whitespace)
      is kept as part of the regular expression.

      Within a command file, commands that modify per-file flags and such always work with the most-recently loaded (or selected) file. Therefore,
      something along the lines of

	load "my.word.list"
	set word on

	load "my.kanji.list"
	set word off
	set local prompt "enter kanji> "

      would work as might make intuitive sense.

      Since a script file must have a load or select before any per-slot flag is set, one can use "select ." to facilitate command scripts that
      are to work with "the current slot".

   spinner [value]
      Set the value of the spinner (A silly little feature).  If set to a non-zero value, will cause a spinner to  spin  while	a  file  is  being
      checked, one increment per value lines in the file actually checked against the search specifier.  Default is off (i.e. zero).

   stats
      Shows  information about how many lines of the text file were checked against the last search specifier, and how many lines matched and were
      printed.

   tag [boolean] ["string"]
      Enable, disable, or set the tag for the selected slot.

      If the slot is not a combination slot, a tag string may be set (the quotes are required).

      If a tag string is set and enabled for a file, the string is prepended to each matching output line printed.

      Unlike the filter and modify commands which automatically enable the function when a parameter is set, a tag is not automatically enabled
      when set.  It can be enabled while being set via "tag on" followed by the string, or could be enabled subsequently via just "tag on".

      If the selected slot is a combination slot, only the enable/disable status may be changed (on by default). No tag string may be set.

      The reason for the special treatment lies in the special nature of how tags work in conjunction with combination files.

      During a search when the selected slot is a combination slot, each file which is a member of the combination has its per-file flags disabled
      if their corresponding flag is disabled in the original combination slot. This allows the combination slot's flags to act as a "mask" to
      blot out each component file's per-file flags.

      The tag flag, however, is special in that the component file's tag flag is turned on if the combination slot's tag flag is turned  on  (and,
      of course, the component file has a tag string registered).

      The intended use of this is that one might set a (disabled) tag to a file, yet direct searches against that file will have no prepended tag.
      However, if the file is searched as part of a combination slot (and the combination slot's tag flag is  on),  the  tag  will  be	prepended,
      allowing one to easily understand from which file an output line comes.

   verbose [boolean]
      Sets  verbose mode on or off, or reports the current status (default on).  Many commands reply with a confirmation if verbose mode is turned
      on.

   version
      Reports the current version of the program.

   [default] wildcard [boolean]
      The selected slot's patterns are considered wildcard patterns if turned on, regular expressions if turned off. The current status is reported
      if no argument given.  However, if "default" is specified, the pattern-type to be inherited as the default by subsequently-loaded files is
      set (or reported).

      Can be temporarily toggled by the "!W!" line prefix.

      When wildcard patterns are selected, the changed metacharacters are: "*" means "any stuff", "?" means "any one character", while "+"
      and "." become unspecial. Other regex items such as "|", "(", "[", etc. are unchanged.

      What "*" and "?" will actually match depends upon the status of word-mode, as well as on the pattern itself.  If word-mode is on, or if the
      pattern begins with the start-of-word "<" or "[", only non-spaces will be matched. Otherwise, any character will be matched.

      In summary, when wildcard mode is on, the input pattern is affected in the following ways:

	 * is changed to the regular expression .* (or \S* when only non-spaces are to be matched)
	 ? is changed to the regular expression .  (or \S  when only non-spaces are to be matched)
	 + is changed to the regular expression \+
	 . is changed to the regular expression \.

      Because filename patterns are often called "filename globs", the command "glob" can be used in place of "wildcard".
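
      The wildcard translation can be sketched in Python as follows. The sketch assumes the reading of the rules in which "*" becomes .* (or
      \S*), "?" becomes . (or \S), and "+" and "." are escaped. The function name wildcard_to_regex and its nonspace_only parameter are
      hypothetical, and backslash escapes in the input pattern are not handled.

```python
import re

def wildcard_to_regex(pattern, nonspace_only=False):
    """Translate a wildcard pattern character by character.

    nonspace_only=True models the case where * and ? match only non-spaces
    (word mode on, or a pattern that starts with a start-of-word anchor).
    """
    any_char = r"\S" if nonspace_only else "."
    table = {"*": any_char + "*", "?": any_char, "+": r"\+", ".": r"\."}
    # Every other character (including |, (, [ and so on) passes through.
    return "".join(table.get(ch, ch) for ch in pattern)

print(wildcard_to_regex("a*b?c"))       # a.*b.c
print(wildcard_to_regex("1.5+"))        # 1\.5\+
print(wildcard_to_regex("a*", True))    # a\S*
```

      With nonspace_only set, a pattern such as ja*n can no longer match across a space, which mirrors the word-mode behaviour described above.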

   [default] word|wordpreference [boolean]
      The selected file's word-preference mode is turned on or off (default is off), or reports the current setting if no argument is specified.
      However, if "default" is specified, the value to be inherited as the default by subsequently-loaded files is set (or reported).

      In word-preference mode, entries are searched for as if the search regex had a leading '<' and a trailing '>', resulting in a list of
      entries with a whole-word match of the regex.  However, if there are none, but there are non-word entries, the non-word entries are shown
      (the "saved list" is used for this -- see that command). This makes it an "if there are whole words like this, show me, otherwise show me
      whatever you've got" mode.

      If there are both word and non-word entries, the non-word entries are remembered in the  saved  list  (rather  than  any	possible  filtered
      entries being remembered there).

      One caveat: if a search matches a line in more than one place, and the first is not a whole word while one of the others is, the line will
      be considered a non-whole-word match.  For example, the search "japan" with word-preference mode on will not list an entry such as
      "/Japanese/language in Japan/", as the first "Japan" is part of "Japanese" and not a whole word.  If you really need just whole-word
      entries, use the '<' and '>' yourself.

      The mode may be temporarily toggled via the "!w!" line prefix.

      The rules defining what lines are filtered, remembered, discarded, and shown for each permutation of search are rather complex, but the  end
      result is rather intuitive.
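
      A rough Python model of word-preference mode, including the first-match caveat, might look like the following. It is a sketch under
      simplifying assumptions: filters are ignored, a plain alphanumeric check stands in for lookup's '<' and '>' anchors, and the function
      name word_preference_search and the romanized sample lines are made up for illustration.

```python
import re

def word_preference_search(lines, pattern):
    """Return (shown, saved) for a case-insensitive search of `lines`.

    Only the FIRST match in each line is classified, which reproduces the
    caveat described above.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    whole, partial = [], []
    for line in lines:
        m = regex.search(line)
        if m is None:
            continue
        # Look at the characters on either side of the first match.
        before = line[m.start() - 1] if m.start() > 0 else " "
        after = line[m.end()] if m.end() < len(line) else " "
        is_word = not (before.isalnum() or after.isalnum())
        (whole if is_word else partial).append(line)
    if whole:
        return whole, partial    # non-word entries go to the saved list
    return partial, []           # nothing whole-word: show what there is


lines = [
    "nihon [nihon] /Japan/",
    "nihongo [nihongo] /Japanese/language in Japan/",
]
shown, saved = word_preference_search(lines, "japan")
print(shown)   # the /Japan/ entry only
print(saved)   # the /Japanese/... entry, kept for the show command
```

      Searching the second line on its own returns it in the shown list, since there are then no whole-word entries to prefer.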

   quit | leave | bye  | exit
      Exits the program.

STARTUP FILE
   If the file "~/.lookup" is present, commands are read from it during lookup startup.

   The file is read in the same way as the source command reads files (see that entry for more information on file format, etc.)

   However,  if  there had been files loaded via command-line arguments, commands within the startup file to load files (and their associated com-
   mands such as to set per-file flags) are ignored.

   Similarly, any use of the command-line flags -euc, -jis, or -sjis will disable in the startup file the commands dealing with setting the  input
   and/or output encodings.

   The special treatment mentioned in the above two paragraphs only applies to commands within the startup file itself, and does not apply to com-
   mands in command-files that might be sourced from within the startup file.

   The following is a reasonable example of a startup file:
     ## turn verbose mode off during startup file processing
     verbose off

     prompt "%C([%#]%0)%!C(%w'*'%!f'raw '%n)> "
     spinner 200
     pager on

     ## The filter for edict will hit for entries that
     ## have only one English part, and that English part
     ## having a pl or pn designation.
     load ~/lib/edict
     filter "name" #^[^/]+/[^/]*<p[ln]>[^/]*/$#
     highlight on
     word on

     ## The filter for kanjidic will hit for entries without a
     ## frequency-of-use number.  The modify spec will remove
     ## fields with the named initial code (U,N,Q,M,E, and Y)
     load ~/lib/kanjidic
     filter "uncommon" !/<F\d+>/
     modify /( [UNQMEY]\S+)+//g

     ## Use the same filter for my local word file,
     ## but turn off by default.
     load ~/lib/local.words
     filter "name" #^[^/]+/[^/]*<p[ln]>[^/]*/$#
     filter off
     highlight on
     word on
     ## Want a tag for my local words, but only when
     ## accessed via the combo below
     tag off "iO"

     combine "words" 2 0
     select words

     ## turn verbosity back on for interactive use.
     verbose on

COMMAND-LINE ARGUMENTS
   With the use of a startup file, command-line arguments are rarely needed.  In practical use, they are only needed to create an index  file,	as
   in:

       lookup -write textfile

   Any command line arguments that aren't flags are taken to be files which are loaded in turn during startup.  In this case, any "load",
   "filter", etc. commands in the startup file are ignored.

   The following flags are supported:

   -help
      Reports a short help message and exits.

   -write
      Creates index files for the named files and exits. No startup file is read.

   -euc
      Sets the input and output encoding method to EUC (currently the default).  Exactly the same as the "encoding euc" command.

   -jis
      Sets the input and output encoding method to JIS.  Exactly the same as the "encoding jis" command.

   -sjis
      Sets the input and output encoding method to Shift-JIS.  Exactly the same as the "encoding sjis" command.

   -v -version
      Prints the version string and exits.

   -norc
      Indicates that the startup file should not be read.

   -rc file
      The named file is used as the startup file, rather than the default "~/.lookup".  It is an error for the file not to exist.

   -percent num
      When an index is built, letters that appear on more than num percent (default 50) of the lines are elided from the index.   The  thought	is
      that  if	a  search  will have to check most of the lines in a file anyway, one may as well save the large amount of space in the index file
      needed to represent that information, and the time/space tradeoff shifts, as the indexing of oft-occurring letters  provides  a  diminishing
      return.

      Smaller indexes can be made by using a smaller number.

   -noindex
      Indicates that any files loaded via the command line should not be loaded with any precomputed index, but recalculated on the fly.

   -verbose
      Has metric tons of stats spewed whenever an index is created.

   -port ###
      For the (undocumented) server configuration only, tells which port to listen on.

OPERATING SYSTEM CONSIDERATIONS
   I/O primitives and behaviors vary with the operating system. On my operating system, I can "read" a file by mapping it into memory, which is a
   pretty much instant procedure regardless of the size of the file.  When I later access that memory, the appropriate sections of  the  file  are
   automatically read into memory by the operating system as needed.

   This results in lookup starting up and presenting a prompt very quickly, but causes the first few searches that need to check a lot of lines in
   the file to go more slowly (as lots of the file will need to be read in). However, once the bulk of the file is in, searches will go very fast.
   The	win  here  is  that the rather long file-load times are amortized over the first few (or few dozen, depending upon the situation) searches
   rather than always faced right at command startup time.
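
   The memory-mapping behavior described above can be tried out with Python's standard mmap module. This is only a sketch of the general
   technique, not lookup's implementation; the function name count_matching_lines and the temporary sample file are assumptions made
   for the example.

```python
import mmap
import os
import tempfile

def count_matching_lines(path, needle):
    """Count the lines containing `needle` (bytes) by scanning a memory-mapped file."""
    with open(path, "rb") as f:
        # Mapping is nearly instant; pages are read in by the OS as they are touched.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            count = 0
            for line in iter(mm.readline, b""):
                if needle in line:
                    count += 1
            return count

# Build a small throwaway file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"I am feeling flabby\nno match here\nabsolutely\n")
    path = tmp.name
try:
    print(count_matching_lines(path, b"ab"))
finally:
    os.remove(path)
```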

   On the other hand, on an operating system without the mapping ability, lookup would start up very slowly as all the files and indexes are  read
   into memory, but would then search quickly from the beginning, all the file already having been read.

   To  get  around the slow startup, particularly when many files are loaded, lookup uses lazy loading if it can: a file is not actually read into
   memory at the time the load command is given. Rather, it will be read when first actually accessed.	Furthermore, files are loaded while lookup
   is idle, such as when waiting for user input. See the files command for more information.
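
   The "read when first accessed" half of lazy loading can be sketched in Python as below. The class LazyFile is hypothetical and is not
   how lookup is written; it also leaves out the loading that lookup does while idle.

```python
class LazyFile:
    """Hypothetical wrapper that reads a file only when its lines are first needed."""

    def __init__(self, path):
        self.path = path
        self._lines = None  # nothing has been read yet

    @property
    def loaded(self):
        return self._lines is not None

    def lines(self):
        # The first call pays the cost of reading; later calls reuse the result.
        if self._lines is None:
            with open(self.path, encoding="utf-8") as f:
                self._lines = f.read().splitlines()
        return self._lines
```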

REGULAR EXPRESSIONS, A BRIEF TUTORIAL
   Regular expressions ("regex" for short) are a "code" used to indicate what kind of text you're looking for.  They're how one searches for
   things in the editors "vi", "stevie", "mifes", etc., or with the grep commands.  There are differences among the various regex flavors in use
   --  I'll  describe  the  flavor  used  by lookup here. Also, in order to be clear for the common case, I might tell a few lies, but nothing too
   heinous.

   The regex 「a」 means "any line with an 'a' in it." Simple enough.

   The regex 「ab」 means "any line with an 'a' immediately followed by a 'b'".  So the line

       I am feeling flabby

   would "match" the regex 「ab」 because, indeed, there's an "ab" on that line. But it wouldn't match the line

       this line has no a followed _immediately_ by a b

   because, well, what the line says is true.
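
   The two lines above can be checked with any regex engine. The sketch below uses Python's re module purely as a stand-in; it is not
   lookup's engine, though a plain pattern such as "ab" behaves the same way in both.

```python
import re

lines = [
    "I am feeling flabby",
    "this line has no a followed _immediately_ by a b",
]

# re.search looks for the pattern anywhere in the line, as lookup does.
for line in lines:
    print(bool(re.search(r"ab", line)), line)
```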

   In most cases, letters and numbers in a regex just mean that you're looking for those letters and numbers in the order  given.  However,  there
   are some special characters used within a regex.

   A simple example would be a period. Rather than indicate that you're looking for a period, it means "any character".  So the silly
   regex 「.」 would mean "any line that has any character on it." Well, maybe not so silly... you can use it to find non-blank lines.

   But more commonly it's used as part of a larger regex. Consider the regex 「gray」. It wouldn't match the line

       The sky was grey and cloudy.

   because of the different spelling (grey vs. gray).  But the regex 「gr.y」 asks for "any line with a 'g', 'r', some character, and then
   a 'y'".  So this would get "grey" and "gray".

   A special construct somewhat similar to '.' would be the character class.  A character class starts with a '[' and ends with a ']', and
   will match any character given in between. An example might be

       gr[ea]y

   which would match lines with a 'g', 'r', an 'e' or an 'a', and then a 'y'.  Inside a character class you can list as many characters
   as you want to.
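
   The difference between the period and the character class can be seen side by side. Python's re module is used here only as a stand-in
   for lookup's engine, and the words tried are made up for the example.

```python
import re

# "groy" satisfies gr.y (any character in the middle) but not gr[ea]y.
for word in ["grey", "gray", "groy", "gry"]:
    dot = bool(re.search(r"gr.y", word))
    cls = bool(re.search(r"gr[ea]y", word))
    print(word, dot, cls)
```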

   For example the simple regex 「x[0123456789]y」 would match any line with a digit sandwiched between an 'x' and a 'y'.

   The order of the characters within the character class doesn't really matter... 「[5134672890]」 would be the same as 「[0123456789]」.

   But as a short cut, you could put 「[0-9]」 instead of 「[0123456789]」.  So the character class 「[a-z]」 would match any lower-case letter, while
   the character class 「[a-zA-Z0-9]」 would match any letter or digit.

   The character '-' is special within a character class, but only if it's not the first thing. Another character that's special in a character
   class is '^', if it is the first thing. It "inverts" the class so that it will match any character not listed. The class 「[^a-zA-Z0-9]」 would
   match any line with spaces or punctuation on it.
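
   Ranges and inverted classes can be tried as follows, again with Python's re module standing in for lookup's engine. The variable names
   and test strings are made up for the example.

```python
import re

digit_between = re.compile(r"x[0-9]y")      # a digit sandwiched between x and y
non_alnum = re.compile(r"[^a-zA-Z0-9]")     # any character that is not a letter or digit

print(bool(digit_between.search("x5y")))
print(bool(digit_between.search("xay")))
print(bool(non_alnum.search("hello world")))  # the space is not a letter or digit
print(bool(non_alnum.search("helloworld")))
```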

   There are some special short-hand sequences for some common character classes. The sequence 「\d」 means "digit", and is the same as 「[0-9]」.
   「\w」 means "word element" and is the same as 「[0-9a-zA-Z_]」. 「\s」 means "space-type thing" and is the same as 「[ \t]」 (「\t」 means tab).

   You can also use 「\D」, 「\W」, and 「\S」 to mean things not a digit, word element, or space-type thing.
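
   The short-hand sequences also exist in Python's re module, used here as a stand-in. One assumption to note: in Python 3 these sequences
   cover Unicode by default, so the re.ASCII flag is passed to get the ASCII-only meaning given above.

```python
import re

print(re.findall(r"\d+", "room 42, floor 7", re.ASCII))   # runs of digits
print(re.findall(r"\w+", "bike_1 for-rent", re.ASCII))    # runs of word elements
print(re.findall(r"\S+", "a b\tc", re.ASCII))             # runs of non-space things
print(re.findall(r"\D+", "42abc7", re.ASCII))             # runs of non-digits
```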

   Another special character would be '?'. This means "maybe one of whatever was just before it, but none is fine too".  In the regex 「bikes? for
   rent」, the "whatever" would be the 's', so this would match lines with either "bikes for rent" or "bike for rent".

   Parentheses are also special, and can group things together.  In the regex

       big (fat harry)? deal

   the "whatever" for the '?' would be "fat harry".  But be careful to pay attention to details... this regex would match
       I don't see what the big fat harry deal is!
   but not
       I don't see what the big deal is!

   That's because if you take away the "whatever" of the '?', you end up with
       big  deal
   Notice that there are two spaces between the words, and the regex didn't allow for that.  The regex to get either line above would be
       big (fat harry )?deal
   or
       big( fat harry)? deal
   Do you see how they're essentially the same?
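
   The space pitfall is easy to confirm by running all three patterns against both lines. Python's re module is used as a stand-in for
   lookup's engine; the variable names are made up for the example.

```python
import re

with_harry = "I don't see what the big fat harry deal is!"
without_harry = "I don't see what the big deal is!"

patterns = [
    r"big (fat harry)? deal",   # needs two spaces when the group is absent
    r"big (fat harry )?deal",   # space moved inside the group
    r"big( fat harry)? deal",   # space moved inside the group, other side
]
for pattern in patterns:
    print(pattern,
          bool(re.search(pattern, with_harry)),
          bool(re.search(pattern, without_harry)))
```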

   Similar to '?' is '*', which means "any number, including none, of whatever's right in front".  It more or less means that whatever is
   tagged with '*' is allowed, but not required, so something like
       I (really )*hate peas
   would match "I hate peas", "I really hate peas!", "I really really hate peas", etc.

   Similar to both '?' and '*' is '+', which means "at least one of whatever is just in front, but more is fine too".  The
   regex 「mis+pelling」 would match "mispelling", "misspelling", "missspelling", etc. Actually, it's just the same as 「miss*pelling」 but
   simpler to type. The regex 「ss*」 means "an 's', followed by zero or more 's'", while 「s+」 means "one or more 's'".  Both are really the
   same.
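
   The equivalence of 「mis+pelling」 and 「miss*pelling」 can be checked mechanically. Python's re module is used as a stand-in for lookup's
   engine, and re.fullmatch is used so that each whole word must fit the pattern.

```python
import re

# '*' allows the group any number of times, including none.
for text in ["I hate peas", "I really hate peas!", "I really really hate peas"]:
    print(bool(re.search(r"I (really )*hate peas", text)), text)

# 's+' and 'ss*' accept exactly the same words.
for word in ["mipelling", "mispelling", "misspelling", "missspelling"]:
    plus = bool(re.fullmatch(r"mis+pelling", word))
    star = bool(re.fullmatch(r"miss*pelling", word))
    print(word, plus, star)
```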

   The special character '|' means "or".  Unlike '+', '*', and '?', which act on the thing immediately before, the '|' is more "global".
       give me (this|that) one
   would match lines that had "give me this one" or "give me that one" in them.

   You can even combine more than two:
       give me (this|that|the other) one

   How about:
       [Ii]t is a (nice |sunny |bright |clear )*day

   Here, the "whatever" immediately before the '*' is
       (nice |sunny |bright |clear )
   So this regex would match all the following lines:
      It is a day.
      I think it is a nice day.
      It is a clear sunny day today.
      If it is a clear sunny nice sunny sunny sunny bright day then....
   Notice how the 「[Ii]t」 matches either "It" or "it"?

   Note that the above regex would also match
      fruit is a day
   because it indeed fulfills all requirements of the regex, even though the "it" is really part of the word "fruit".  To answer concerns like
   this, which are common, there are '<' and '>', which mean "word break".  The regex 「<it」 would match any line with "it" beginning a word,
   while 「it>」 would match any line with "it" ending a word.  And, of course, 「<it>」 would match any line with the word "it" in it.
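
   The "fruit" problem and its cure can be demonstrated as below. Python's re module is a stand-in here and does not have '<' and '>'
   as word breaks, so this sketch assumes \b (word boundary) as the nearest equivalent. The variable names are made up for the example.

```python
import re

plain = r"[Ii]t is a (nice |sunny |bright |clear )*day"
bounded = r"\b[Ii]t is a (nice |sunny |bright |clear )*day"  # \b stands in for '<'

samples = [
    "It is a day.",
    "I think it is a nice day.",
    "It is a clear sunny day today.",
    "fruit is a day",
]
for line in samples:
    print(bool(re.search(plain, line)), bool(re.search(bounded, line)), line)
```

   Only the last line changes: it matches the plain pattern but not the one that requires a word break before "it".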

   Going back to the regex to find grey/gray, that would make more sense, then, as
       <gr[ae]y>
   which would match only the words "grey" and "gray".

   Somewhat similar are '^' and '$', which mean "beginning of line" and "end of line",
   respectively (but not in a character class, of course).  So the regex 「^fun」 would find any line that begins with the letters "fun",
   while 「^fun>」 would find any line that begins with the word "fun".  「^fun$」 would find any line that was exactly "fun".

   Finally, 「^\s*fun\s*$」 would match any line that was exactly "fun", but perhaps also had leading and/or trailing whitespace.
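
   The line anchors behave as follows in Python's re module, used as a stand-in for lookup's engine. As an assumption of this sketch, \b
   replaces the '>' word break, which Python does not have.

```python
import re

print(bool(re.search(r"^fun", "funny business")))      # begins with the letters "fun"
print(bool(re.search(r"^fun\b", "funny business")))    # but not with the word "fun"
print(bool(re.search(r"^fun\b", "fun and games")))     # begins with the word "fun"
print(bool(re.search(r"^fun$", "fun")))                # exactly "fun"
print(bool(re.search(r"^fun$", " fun")))               # leading space spoils it
print(bool(re.search(r"^\s*fun\s*$", "   fun \t")))    # unless whitespace is allowed
```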

   That's pretty much it. There are more complex things, some of which I'll mention in the list below, but even with these few	simple	constructs
   one can specify very detailed and complex patterns.

   Let's summarize some of the special things in regular expressions:

   Items that are basic units:
     char      any non-special character matches itself.
     \char     special chars, when preceded by \, become non-special.
     .         Matches any one character (except \n).
     \n        Newline.
     \t        Tab.
     \r        Carriage Return.
     \f        Formfeed.
     \d        Digit. Just a short-hand for [0-9].
     \w        Word element. Just a short-hand for [0-9a-zA-Z_].
     \s        Whitespace. Just a short-hand for [\t \n\r\f].
     \## \###  Two or three digit octal number indicating a single byte.
     [chars]   Matches a character if it's one of the characters listed.
     [^chars]  Matches a character if it's not one of the ones listed.

     The \char items above can be used within a character class,
     but not the items below.

     \D        Anything not \d.
     \W        Anything not \w.
     \S        Anything not \s.
     \a        Any ASCII character.
     \A        Any multibyte character.
     \k        Any (not half-width) katakana character (including ー).
     \K        Any character not \k (except \n).
     \h        Any hiragana character.
     \H        Any character not \h (except \n).
     (regex)   Parens make the regex one unit.
     (?:regex) [from perl5] Grouping-only parens -- can't use for \# (below)
     \c        Any JISX0208 kanji (kuten rows 16-84)
     \C        Any character not \c (except \n).
     \#        Match whatever was matched by the #th paren from the left.

   With "☆" to indicate one "unit" as above, the following may be used:

     ☆?        A ☆ allowed, but not required.
     ☆+        At least one ☆ required, but more ok.
     ☆*        Any number of ☆ ok, but none required.

   There are also ways to match "situations":

     \b        A word boundary.
     <         Same as \b.
     >         Same as \b.
     ^         Matches the beginning of the line.
     $         Matches the end of the line.

   Finally, the "or" is

     reg1|reg2 Match if either reg1 or reg2 match.

   Note that "\k" and the like aren't allowed in character classes, so
   something such as 「[\k\h]」 to try to get all kana won't work.
   Use 「(\k|\h)」 instead.
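
   For comparison only, here is how "any kana" might be written in Python's re module, which has no \k or \h. The Unicode ranges below
   are an assumption of this sketch and only approximate lookup's definitions; lookup itself works on EUC text, not Unicode.

```python
import re

# Assumed approximations of lookup's \h and \k, expressed as Unicode ranges.
HIRAGANA = "\u3041-\u3096"          # small a .. small ke
KATAKANA = "\u30a1-\u30f6\u30fc"    # small a .. small ke, plus the long-vowel mark

# The alternation form mirrors (\k|\h).
kana_alt = re.compile(f"([{KATAKANA}]|[{HIRAGANA}])")
# Unlike lookup, Python also accepts both ranges inside a single class.
kana_cls = re.compile(f"[{HIRAGANA}{KATAKANA}]")

for text in ["\u304b\u306a", "\u30ab\u30ca", "kana \u6f22\u5b57"]:
    print(bool(kana_alt.search(text)), bool(kana_cls.search(text)))
```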

BUGS
   Needs full support for half-width katakana and JIS X 0212-1990.
   Non-EUC (JIS & SJIS) items not tested well.
   Probably won't work on non-UNIX systems.
   Screen control codes (for clear and highlight commands) are hard-coded for ANSI/VT100/kterm.

AUTHOR
   Jeffrey Friedl (jfriedl@nff.ncl.omron.co.jp)

INFO
   Jim Breen's text files edict and kanjidic and their documentation can be found in "pub/nihongo" on ftp.cc.monash.edu.au (130.194.1.106).

   Information on input and output encoding and codes can be found in Ken Lunde's Understanding Japanese Information Processing
   (日本語情報処理) published by O'Reilly and Associates.  ISBN 1-56592-043-0.  There is also a Japanese edition published by SoftBank.

   A program to convert files among the various encoding methods is Dr. Ken Lunde's jconv, which can also be found on ftp.cc.monash.edu.au.   Jconv
   is also useful for converting halfwidth katakana (which lookup doesn't yet support well) to full-width.

																	 LOOKUP(1)