crm(1)                                                                CRM114                                                                crm(1)

      crm - The Controllable Regex Mutilator

      crm [OPTION]... CRMFILE

      This  man  page  is  taken from an older CRM114 version.  It is provided as a convenience to Debian users and may not be up-to-date.  If you
      would like to update it, please send appropriate patches to the Debian bug tracking system.

         -d N (enter debugger after running N cycles. Omitting N means N equals 0.)

         -e (do not import any environment variables)

         -h (print help text)

         -p (generate an execution-time-spent profile on exit)

         -P N (max program lines)

         -q m (mathmode (0,1 = alg/RPN only in EVAL, 2,3 = alg/RPN everywhere))

         -s N (new feature file (.css) size is N (default 1 meg+1 featureslots))

         -S N (new feature file (.css) size is N rounded to 2^I+1 featureslots)

         -t (user trace output)

         -T (implementors trace output (only for the masochistic!))

         -u dir (chdir to directory dir before starting execution)

         -v (print CRM114 version identification and exit)

         -w N (max data window (bytes, default 16 megs))

         -- (signals the end of CRM114 flags; prior flags are not seen by the user program; subsequent args are not processed by CRM114)

         --foo (creates the user variable :foo: with the value SET)

         --x=y (creates the user variable :x: with the value y)

         -{ stmts} (execute the statements inside the {} brackets)

         crmfile (.crm file name)

      CRM114 is a language designed to write filters in. It caters to filtering email, system log streams, html, and other marginally  human-read-
      able ASCII that may occasion to grace your computer.

      CRM114's unique strengths are the data structure (everything is a string and a string can overlap another string), its ability to work on
      truly infinitely long input streams, its ability to use extremely advanced classifiers to sort text, and the ability to do approximate
      regular expressions (that is, regexes that don't quite match) via the TRE regex library.

      CRM114 also sports a very powerful subprocess control facility, and a unique syntax and program structure that puts the fun back in program-
      ming (OK, you can run away screaming now). The syntax is declensional rather than positional; the type of quote  marks  around  an  argument
      determine what that argument will be used for.

      The  typical  CRM114 program uses regex operations more often than addition (in fact, math was only added to TRE in the waning days of 2003,
      well after CRM114 had been in daily use for over a year and a half).

      In other words, crm114 is a very very powerful mutagenic filter that happens to be a programming language as well.

      The filtering style of the CRM-114 discriminator is based on the fact that most spam, normal log file messages, or other uninteresting  data
      is  easily  categorized  by a few characteristic patterns (such as "Mortgage leads", "advertise on the internet", and "mail-order toner car-
      tridges".) CRM114 may also be useful to folks who are on multiple interlocking mailing lists.

      In a bow to Unix-style flexibility, by default CRM114 reads its input from standard input, and by default sends its output to standard
      output.  Note  that  the default action has a zero-length output. Redirection and use of other input or output files is possible, as well as
      the use of windowing, either delimiter-based or time-based, for real-time continuous applications.

      CRM114 can be used for other than mail filtering; consider it to be a version of grep with super powers. If perl is a  seventy-bladed  swiss
      army knife, CRM114 is a razor-sharp katana that can talk.

      Absent the -{ program } flag, the first argument is taken to be the name of a file containing a crm114 program; subsequent arguments are
      merely supplied as :_argN: values. Use single quotes around command-line programs '-{ like this }' to prevent the shell from doing odd
      things to them.

      CRM114 can be directly invoked by the shell if the first line of your program file uses the shell standard, as in:

      #! /usr/bin/crm

      You  can use CRM114 flags on the shell-standard invocation line, and hide them with '--' from the program itself; '--' incidentally prevents
      the invoking user from changing any CRM114 invocation flags.

      Flags should be located after any positional variables on the command line. Flags are visible as :_argN: variables, so you can  create  your
      own flags for your own programs (separate CRM114 and user flags with '--').  Two examples on how to do this:

      ./foo.crm bar mugga < baz  -t -w 150000

      ./foo.crm -t -w 1500000 -- bar < baz mugga

      One example on how not to do this:

      ./foo.crm -t -w 150000 bar < baz mugga

      (That's WRONG!)

      You can put a list of user-settable vars on the #!/usr/bin/crm invocation line. CRM114 will print these out when a program is invoked
      directly (e.g. "./myprog.crm -h", not "crm myprog.crm -h") with the -h (for help) flag. (Note that this works ONLY on bash on Linux; the
      *BSDs have a different bash interpretation and this doesn't work there.)


      #!/usr/bin/crm  -( var1 var2=A var2=B var2=C )

      This allows only var1 and var2 to be set on the command line. If a variable is not assigned a value, the user can set any value desired. If the
      variable is equated to a set of values, those are the only values allowed.
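
      The allow-list rule above can be modelled in a few lines. This is a minimal Python sketch, not CRM114 code; the helper names
      parse_allowed and is_permitted are hypothetical and exist only for illustration.

```python
# Hypothetical model of the "-( var1 var2=A var2=B )" allow-list rule.
# Not CRM114 source code; the function names are invented for this sketch.

def parse_allowed(spec):
    """Map each listed variable to None (any value) or a set of permitted values."""
    allowed = {}
    for item in spec.split():
        name, sep, value = item.partition("=")
        if not sep:
            allowed[name] = None                      # bare name: any value is fine
        elif allowed.get(name, set()) is not None:
            allowed.setdefault(name, set()).add(value)  # name=value: collect choices
    return allowed


def is_permitted(allowed, name, value):
    """True if the user may set :name: to value under the allow-list."""
    if name not in allowed:
        return False
    choices = allowed[name]
    return choices is None or value in choices


allowed = parse_allowed("var1 var2=A var2=B var2=C")
```

      Under this model, var1 accepts anything, var2 accepts only A, B, or C, and any other variable name is rejected.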

      Another example:

      #!/usr/bin/crm  -( var1 var2=foo )  --

      This allows var1 to be set to any value; var2 may only be set to foo or not set at all; and no other variables may be set, nor may
      invocation flags be changed (because of the trailing "--"). Since "--" also blocks '-h' for help, such programs should provide their own
      help.

      Variable names and locations start with a : , end with a : , and may contain only characters that have ink (i.e. the [:graph:]  class)  with
      few exceptions.

      Examples: :here:, :ThErE:, :every-where_0123+45%6789:, :this_is_a_very_very_long_var_name_that_does_not_tell_us_much:.

      Builtin variables:

      :_nl:                newline
      :_ht:                horizontal tab
      :_bs:                backspace
      :_sl:                a slash
      :_sc:                a semicolon
      :_arg0: thru :_argN: command-line args, including all flags
      :_argc:              how many command line arguments there were
      :_pos0: thru :_posN: positional args ('-' or '--' args deleted)
      :_posc:              how many positional arguments there were
      :_pos_str:           all positional arguments concatenated
      :_env_whatever:      environment value 'whatever'
      :_env_string:        all environmental arguments concatenated
      :_crm_version:       the version of the CRM system
      :_dw:                the current data window contents

      Variables  are expanded by the :*: var-expansion operator, e.g. :*:_nl: expands to a newline character. Uninitialized vars evaluate to their
      text name (and the colons stay).

      You can also use the standard constant C '\' characters, such as "\n" for newline, as well as escaped hexadecimal and octal characters
      like \xHH and \oOOO, but these are constants, not variables, and cannot be redefined.

      Depending on the value of "math mode" (flag -q), you can also use :#:string_or_var: to get the length of a string, and :@:string_or_var: to
      do basic mathematics and inequality testing, either only in EVALs or for all var-expanded expressions. See "Sequence  of  Evaluation"  below
      for more details.

      Default  behavior  is  to  read all of standard input till EOF into the default data window (named :_dw:), then execute the program (this is
      overridden if first executable statement is a WINDOW statement).

      Variables don't get their own storage unless you ISOLATE them (see below); instead, variables are start/length pairs indexing into the
      default  data  window.  Thus, ALTERing an unISOLATEd variable changes the value of the default data buffer itself. This is a great power, so
      use it only for good, and never for evil.
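
      To make the start/length idea concrete, here is a simplified Python model. It is an illustration only: the names window, variables,
      value, and alter are hypothetical, and the offset bookkeeping is far cruder than whatever CRM114 does internally.

```python
# Simplified, hypothetical model: one shared buffer, variables are (start, length).
window = bytearray(b"From: alice\nSubject: hello\n")
variables = {
    "line": (0, 11),   # covers "From: alice"
    "name": (6, 5),    # covers "alice", overlapping :line:
}


def value(name):
    """Read a variable by slicing the shared buffer."""
    start, length = variables[name]
    return bytes(window[start:start + length])


def alter(name, new):
    """Overwrite a variable's span in place and adjust the other spans."""
    start, length = variables[name]
    window[start:start + length] = new
    delta = len(new) - length
    for other, (s, l) in variables.items():
        if other == name:
            variables[other] = (s, len(new))
        elif s <= start and s + l >= start + length:
            variables[other] = (s, l + delta)    # enclosing span grows or shrinks
        elif s >= start + length:
            variables[other] = (s + delta, l)    # later span shifts


alter("name", b"bob")   # changes the buffer itself, so :line: changes too
```

      After the call, both the buffer and the enclosing variable reflect the new text, which is the behavior the paragraph above warns about.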

      Statements are separated with a ';' or with a newline.

              '\' is the string-text escape character. You only need to escape the literal representation of closing delimiters inside
              var-expanded arguments.

              You can use the classic C/C++ \-escapes, such as \n, \t, \a, \b, \v, \f, \r, and also \xHH and \oOOO for hex and octal
              characters, respectively.

              A '\' as the last character of a line means the next line is just a continuation of this one.

              A \-escape that isn't recognized as something special isn't an error; you may optionally escape any of the delimiters > ) ] } ; / # \
              and get just that character.

              A '\' anywhere else is just a literal backslash, so the regex ([abc])\1 is written just that way; there is no need to
              double-backslash the \1 (although it will work if you do).
      # this is a comment
      # and this too \#
              A comment is not a piece of preprocessor sugar -- it is a statement and ends at the newline or at "\#".
      insert filename
              inserts the file verbatim at this line at compile time.
      ;
              statement separator - must ALWAYS be escaped as \; unless it's inside delimiters or else it will mark the end of the statement.
      { and }
              start and end blocks of statements. Must always be '\' escaped or inside delimiters or these will mark the start/end of a block.
      noop
              no-op statement
      :label:
              define a GOTOable label
      accept
              writes the current data window to standard output; execution continues.
      alius
              if the last bracket-group succeeded, ALIUS skips to end of {} block (a skip, not a FAIL); if the prior group FAILed, ALIUS does
              nothing. Thus, ALIUS is both an ELSE clause and a CASE statement.
      alter (:var:) /new-val/
              destructively change value of var to new-val; (:var:) is var to change (var-expanded); /new-val/ is value to change to
              (var-expanded).
      classify <flags> (:c1:...|...:cN:) (:stats:) [:in:] /word-pat/
              compare the statistics of the current data window buffer with classfiles c1...cN.

              <flags>          If <flags> is set to <nocase>, ignore case in word-pat, does not change case in hash (use tr() to do that  on  :in:
                               if you want it).
              (:c1:  ...        file or files to consider "success" files. The CLASSIFY succeeds if these files as a group match best. If not, the
                               CLASSIFY does a FAIL.
              |                optional separator. Spaces on each side of the " | " are required.
              .... :cN:)       optional files to the right of " | " are considered as a group to "fail". If statement fails,  execution  skips  to
                               end of enclosing {..} block, which exits with a FAIL status (see ALIUS for why this is useful).
              (:stats:)        optional var that will get a text formatted matching summary
              [:in:]           restrict statistical measure to the string inside :in:
              /word-pat/       regex to describe what a parseable word is.
      eval (:result:) /instring/
              repeatedly  evaluates  /instring/ until it ceases to change, then places that result as the value of :result: . EVAL uses smart (but
              foolable) heuristics to avoid infinite loops, like evaluating a string that evaluates to a request to  evaluate  itself  again.  The
              error  rate is about 1 / 2^62 and will detect chain groups of length 255 or less.  If the instring uses math evaluation (see section
              below on math operations) and the evaluation has an inequality test, (>, < or =) then if the inequality fails, the EVAL will FAIL to
              the end of block. If the evaluation has a numeric fault (e.g. divide-by-zero) the EVAL will do a TRAPpable FAULT.
      exit /:retval:/
              ends  program  execution.  If supplied, the return value is converted to an integer and returned as the exit code of the crm114 pro-
              gram. If no retval is supplied, the return value is 0.
      fail
              skips down to end of the current { } block and causes that block to exit with a FAIL status (see ALIUS for why this is useful)
      fault /faultstr/
              forces a FAULT with the given string as the reason. The fault string is var-expanded.
      goto /:label:/
              unconditional branch (you can use a variable as the goal, e.g. /:*:there:/ )
      hash (:result:) /input/
              compute a fast 32-bit hash of the /input/, and ALTER :result: to the hexadecimal hash value. HASH is not warranted  to  be  constant
              across major releases of CRM114, nor is it cryptographically secure.

              (:result:)       value that gets result.
              /input/          string to be hashed (can contain expanded :*:vars:, defaults to the data window :_dw:)
      intersect (:out:) [:var1: :var2: ...]
              makes :out: contain the part of the data window that is the intersection of :var1: :var2: ... ISOLATEd vars are ignored. This only
              resets the value of the captured variable, and does NOT alter any text in the data window.
      isolate (:var:) /initial-value/
              puts :var: into a data area outside of the data buffer; subsequent changes to this var don't change the data buffer (though they may
              change the value of any var subsequently set inside of this var).  If the var already was ISOLATED, this is a noop.

              (:var:)          name of ISOLATEd var (var-expanded)
              /initial-value/  optional initial value for :var: (var-expanded). If no value is supplied, the previous value is retained/copied.
      input <flags> (:result:) [:filename:]
              read in the content of filename.  If no filename, then read stdin

              <byline>         read one line only
              (:result:)       var that gets the input value
              [:filename:]     the file to read
      learn <flags> (:class:) [:in:] /word-pat/
              learn the statistics of the :in: var (or the input window if no var) as an example of class :class:

              <flags>          can  be  any  of  <nocase>, <refute> and <microgroom>.  <nocase>: ignore case in matching word-pat (does not ignore
                               case in hash- use tr() to do that on :in: if you want it). <refute>: this is an anti-example of this class- unlearn
                               it! <microgroom>: enable the microgroomer to purge less-important information automatically whenever the statistics
                               file gets too crowded.
              (:class:)        name of file holding hashed results; nominal file extension is .css
              [:in:]           captured var containing the text to be learned (if omitted, the full contents of the data window is used)
              /word-pat/       regex that defines a "word". Things that aren't "words" are ignored.
      liaf
              skips UP to START of the current {} block (LIAF is FAIL spelled backwards)
      match <flags> (:var1: ...) [:in:] /regex/
              Attempt to match the given regex; if match succeeds, variables are bound; if match fails, program skips to the closing '}' of this
              block.

              <flags>          flags can be any of

                               <absent>         statement succeeds if match not present
                               <nocase>         ignore case when matching
                               <fromstart>      start match at start of the [:in:] var
                               <fromcurrent>    start match at start of previous successful match on the [:in:] var
                               <fromnext>       start match at one character past the start of the previous successful match on the [:in:] var
                               <fromend>        start match at one character past the end of prev. match on this [:in:] var
                               <newend>         require match to end after end of prev. match on this [:in:] var
                               <backwards>      search backward in the [:in:] variable from the last successful match.
                               <nomultiline>    don't allow this match to span lines
              (:var1: ...)     optional variables to bind to regex result and '(' ')' subregexes
              [:in:]           search only in the variable specified; if omitted, :_dw: (the full input data window) is used
              /regex/          POSIX regex (with \ escapes as needed)
              If you build CRM114 to use the GNU regex library for MATCHing, be warned that GNU REGEX has numerous issues. See the KNOWN_BUGS file
              for a detailed listing.
      output <flags> [filename] /output-text/
              output an arbitrary string with captured values expanded.

              <flags>          <append>: append to the file (otherwise, overwrites)
              [filename]       filename to send output (var-expanded), default output is to stdout
              /output-text/    string to output (var-expanded)
      syscall <flags> (:in:) (:out:) (:status:) /command/
              execute a shell command

              <flags>          can be any of <keep> and <async>. <keep>: keep this process around; if kept, then a syscall with  the  same  :keep:
                               var  will  continue feeding to and reading from the kept proc. <async>: don't wait for process to send an EOF; just
                               grab what's available in the process's output pipe and proceed (limit per syscall is 256 Kbytes)
              (:in:)           var-expanded string to feed to command as input (can be null if you don't want to send the process something.)  You
                               must specify this if you want to specify an :out: variable.
              (:out:)          var-expanded varname to place results into (must pre-exist; can be null if you don't want to read the process's
                               output yet, or at all). Limit per syscall is 256 Kbytes. You must specify this if you want to use the :status:
                               variable.
              (:status:)       if  you  want to keep a minion proc around, or catch the exit status of the process, specify a var here. The minion
                               process's PID and pipes will be stored here. The program can access the proc again with another  syscall  by  using
                               this var again. When the process exits, its exit code will be stored here.
      trap (:reason:) /trap_regex/
              traps  faults  from  both FAULT statements and program errors occurring anywhere in the preceding bracket-block. If no fault exists,
              TRAP does a SKIP to end of block. If there is a fault and the fault reason string matches the trap_regex, the fault is trapped,  and
              execution continues with the line after the TRAP, otherwise the fault is passed up to the next surrounding trapped bracket block.

              (:reason:)       the fault message that caused this FAULT. If it was a user fault, this is the text the user supplied in the FAULT
                               statement.
              /trap_regex/     the regex that determines what kind of faults this TRAP will accept. Putting a wildcard here (e.g. /.*/) means that
                               ALL faults will be trapped here.
      union (:out:) [:var1: :var2: ...]
              makes  :out:  contain  the union of the data window segments that contains var1, var2... plus any intervening text as well. Any ISO-
              LATEd var is ignored. This is non-surgical, and does not alter the data window
      window <flags> (:w-var:) (:s-var:) /cut-regex/ /add-regex/
              window slider. This deletes up to and including the cut-regex from :w-var: (default: the data window), then reads and adds input
              from standard input up to and including the add-regex.

              <flags>          flags can be any of

                               <nocase>         ignore case when matching cut- and add- regexes
                               <bychar>         check input for add-regex every character
                               <byline>         check input for add-regex every line
                               <byeof>          wait for EOF to check for add-regex (extra characters are kept around for later)
                               <eofends>        read lots of input; the input is up to the regex match OR the contents till EOF
              (:w-var:)        what var to window
              (:s-var:)        what var to use for source (defaults to stdin; if you use a source var you must specify the windowed var).
              /cut-regex/      var-expanded cut pattern
              /add-regex/      var-expanded add pattern, if absent reads till EOF
              If both cut-regex and add-regex are omitted, and this window statement is the first executable statement in the program, then CRM114
              does not wait to read anything from standard input before starting program execution.

      A regex is a pattern match. Do a "man 7 regex" for details.

      Matches are, by default "first starting point that matches, then longest match possible that can fit".

      a through z
      A through Z
      0 through 9
              all match themselves.
      most punctuation
              matches itself, but check below!
      *
              repeat preceding 0 or more times
      +
              repeat preceding 1 or more times
      ?
              repeat preceding 0 or 1 time
      *?, +?, ??
              repeat preceding, but shortest match that fits, given the already-selected start point of the regex (only supported by TRE regex,
              not GNU regex).
      [abcde]
              any one of the letters a, b, c, d, or e
      [a-q]
              the letters a through q (just one of them)
      {n,m}
              repetition count: match the preceding at least n and no more than m times (POSIX restricts this to a maximum of 255 repeats)
      [[:<:]]
              matches at the start of a word (GNU regex only)
      [[:>:]]
              matches the end of a word (GNU regex only)
      ^
              as first char of a match, matches the start of a line (ONLY in <nomultiline> matches)
      $
              as last char of a match, matches at the end of a line (ONLY in <nomultiline> matches)
      .
              (a period) matches any single character (except start-of-line or end-of-line "virtual characters", but it does match a newline).
      a|b
              match a or b
      (match)
              the () go away, and the string that matched inside is available for capturing. Use \\( and \\) to match actual parentheses (the
              first '\' tells "show the second '\' to the regex engine"; the second '\' forces a literalization onto the parenthesis character).
      \1
              matches the N'th parenthesized subexpression. Remember to backslash-escape the backslash (e.g. write this as \\1). This is only if
              you're using TRE, not GNU regex.
      The following are other POSIX expressions, which mostly do what you'd guess they'd do from their names.


      [[:graph:]]  matches  any character that puts ink on paper or lights a pixel.  [[:print:]] matches any character that moves the "print head"
      or cursor.

      By default, CRM114 supports string length and mathematical evaluation only in an EVAL statement, although it can be set to  allow  these  in
      any  place  where  a var-expanded variable is allowed (see the -q flag).  The default value ( zero ) allows stringlength and math evaluation
      only in EVAL statements, and uses non-precedence (that is, strict left-to-right unless parenthesis are used) algebraic notation. -q  1  uses
      RPN  instead  of  algebraic,  again allowing stringlength and math evaluation only in EVAL expressions. Modes 2 and 3 allow stringlength and
      math evaluation in any var-expanded expression, with non-precedence algebraic notation and RPN notation respectively.  Evaluation is  always
      left-to-right; there is no precedence of operators beyond the sequential passes noted below. The evaluation is done in four sequential
      passes:

      1  \-constants like \n, \o377 and \x3F are substituted
      2  :*:var: variables are substituted (note the difference between a constant like '\n' and a variable like ":*:_nl:" here - constants are
         substituted first, then variables are substituted.)
      3  :#:var: string-length operations are performed
      4  :@:expression:  mathematical  expressions  are  performed;  syntax is either RPN or non-precedenced (parens required) algebraic notation.
         Embedded non-evaluated strings in a mathematical expression is currently a no-no.

         Allowed operators are: + - * / % > < = only.

         Only >, <, and = set logical results; they also evaluate to 1 and 0 for continued chain operations - e.g.

         ((:*:a: > 3) + (:*:b: > 5) + (:*:c: > 9) > 1)

         is true IFF any of the following is true

         o  a > 3 and b > 5
         o  a > 3 and c > 9
         o  b > 5 and c > 9
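
      The non-precedence algebraic mode can be mimicked with a small Python evaluator. This is a sketch under stated assumptions, not the
      CRM114 implementation: tokenize, evaluate, and two_of_three are hypothetical names, only non-negative numeric literals are handled, and
      comparisons yield 1.0 or 0.0 so they can be chained. Written with a threshold of "> 1", the sum-of-comparisons idiom is true exactly
      when at least two of the three tests hold.

```python
import re

# Hypothetical sketch of strict left-to-right (no precedence) evaluation.
TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[-+*/%<>=()])")

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
    "%": lambda a, b: a % b,
    ">": lambda a, b: float(a > b),   # comparisons produce 1.0 or 0.0
    "<": lambda a, b: float(a < b),
    "=": lambda a, b: float(a == b),
}


def tokenize(text):
    """Split an expression into number, operator, and parenthesis tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def evaluate(tokens):
    """Fold operators left to right; parentheses are the only grouping."""
    def operand():
        tok = tokens.pop(0)
        if tok == "(":
            val = expr()
            if not tokens or tokens.pop(0) != ")":
                raise ValueError("missing ')'")
            return val
        return float(tok)

    def expr():
        acc = operand()
        while tokens and tokens[0] != ")":
            op = tokens.pop(0)
            acc = OPS[op](acc, operand())
        return acc

    result = expr()
    if tokens:
        raise ValueError("trailing tokens")
    return result


def two_of_three(a, b, c):
    text = f"(({a} > 3) + ({b} > 5) + ({c} > 9) > 1)"
    return evaluate(tokenize(text)) == 1.0
```

      Because there is no precedence, "2 + 3 * 4" evaluates as (2 + 3) * 4 in this model; add parentheses to get the conventional result.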

      Only the TRE engine supports approximate matching. The GNU engine does not support approximate matching.

      Approximate matching is specified similarly to a "repetition count" in a regular regex, using brackets. This approximation  applies  to  the
      previous parenthesized expression (again, just like repetition counts).  You can specify maximum total changes, and how many inserts, deletes,
      and substitutions you wish to allow. The minimum-error match is found and reported, if it exists within the bounds you state.

      The basic syntax is:

      (text-to-match){~[maxerrs] [#maxsubsts] [+maxinserts] [-maxdeletes]}

      Note that the '~' (with an optional maxerr count) is required (that's how we know it's an approximate regex rather than just  a  rep-count);
      if you don't specify a max error count, you will get the best match, if you do, the match will have at most that many errors.

      Remember that you specify the changes to the text in the pattern necessary to make it match the text in the string being searched.

      You cannot use approximate regexes and backrefs (like \1) in the same regex. This is a limitation in TRE at this point.

      You can also use an inequality in addition to the basic syntax above:

      (text-to-match){~[maxerrs] [basic-syntax] [nI + mD + oS < K] }

      where n, m, and o are the costs per insertion, deletion, and substitution respectively, 'I', 'D', and 'S' are indicators to tell which cost
      goes with which kind of error, and K is the cost limit; the total cost of the errors is always strictly less than K. Here are some
      examples:

      (foobar)
              exactly matches "foobar"
      (foobar){~}
              finds the closest match to "foobar", with the minimum number of inserts, deletes, and substitutions. Always succeeds.
      (foobar){~3}
              finds the closest match to "foobar", with no more than 3 inserts, deletes, or substitutions
      (foobar){~2 +2 -1 #1}
              find the closest match to "foobar", with at most two errors total, and at most two inserts, one delete, and one substitution.
      (foobar){~4 #1 1i + 2d < 5 }
              find the closest match to "foobar", with at most four errors total, at most one substitution, and with the number of insertions plus
              2x the number of deletions less than 5.
      (foo){~1}(bar){~1}
              find the closest match to "foobar", with at most one error in the "foo" and one error in the "bar".
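
      The way the count limits and the cost inequality combine can be checked by hand with a short Python sketch. It does not call TRE or
      perform any matching; within_bounds is a hypothetical helper that only tests whether a given number of inserts, deletes, and
      substitutions would satisfy a specification such as {~4 #1 1i + 2d < 5}.

```python
# Hypothetical checker for approximate-match bounds; it performs no regex matching.
def within_bounds(ins, dels, subs, *, max_errs=None, max_ins=None,
                  max_dels=None, max_subs=None, costs=(0, 0, 0), limit=None):
    """Return True if the error counts satisfy every stated bound."""
    if max_errs is not None and ins + dels + subs > max_errs:
        return False
    if max_ins is not None and ins > max_ins:
        return False
    if max_dels is not None and dels > max_dels:
        return False
    if max_subs is not None and subs > max_subs:
        return False
    if limit is not None:
        cost_i, cost_d, cost_s = costs
        # The weighted cost must be strictly less than the limit K.
        if cost_i * ins + cost_d * dels + cost_s * subs >= limit:
            return False
    return True


# Model of {~4 #1 1i + 2d < 5}: at most 4 errors, at most 1 substitution,
# and 1*inserts + 2*deletes strictly below 5.
spec = dict(max_errs=4, max_subs=1, costs=(1, 2, 0), limit=5)
```

      For example, one insert, one delete, and one substitution pass under spec, while one insert and two deletes fail because the weighted
      cost reaches 5.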

      Here's how to remember what goes where in the CRM114 language.

      Unlike most computer languages, CRM114 uses inflection (or declension) rather than position to describe what role each part of  a  statement
      plays. The declensions are marked by the delimiters- the /, ( and ), < and >, and [ and ].

      By  and  large,  you  can  mix up the arguments to each kind of statement without changing their meaning. Only the ACTION needs to be first.
      Other parts of the statement can occur in any order, save that multiple (paren_args) and /pattern_args/ must stay in their nominal order but
      can go anywhere in the statement. They do not need to be consecutive.

      The parts of a CRM114 statement are:

      ACTION           the verb. This is at the start of the statement.
      /pattern/        the overall pattern the verb should use, analogous to the "subject" of the statement.
      <flags>          modifies how the ACTION does the work. You'd call these "adverbs" in human languages.
      (vars)           what variables to use as adjuncts in the action (what would be called the "direct objects"). These can get changed when the
                       action happens.
      [limited-to]     where the action is allowed to take place (think of it as the "indirect object"). These are not directly changed by the
                       action.

      cssmerge(1), cssdiff(1), cssutil(1)

      The CRM114 homepage is at .

      This manpage: $Id: crm114.azm,v 1.12 2004/08/19 11:10:49 vanbaal Exp $

      This manpage describes the crm114 utility as it has been described by QUICKREF.txt, shipped with crm114-20040212-BlameJetlag.src.tar.gz. The
      DESCRIPTION section is copy-and-pasted from INTRO.txt as distributed with the same source tarball.

      Converted from plain ascii to zoem by Joost van Baal.

      Copyright (C) 2001, 2002, 2003, 2004 William S. Yerazunis

      This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License  as  published  by
      the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

      This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABIL-
      ITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

      You should have received a  copy  of  the  GNU  General  Public  License  along  with  this  program  (see  COPYING);  if  not,  check  with or write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111, USA.

      William S. Yerazunis. Manpage typesetting by Joost van Baal and Shalendra Chhabra

  crm114 20040816.BlameClockworkOrange-auto.3                         19 Aug 2004                                                             crm(1)