detox(1) [debian man page]

DETOX(1)						    BSD General Commands Manual 						  DETOX(1)

NAME
detox -- clean up filenames

SYNOPSIS
detox [-hnLrv] [-s sequence] [-f configfile] [--dry-run] [--special] file ...

DESCRIPTION
The detox utility renames files to make them easier to work with. It removes
spaces and other such annoyances. It also translates or cleans up Latin-1
(ISO 8859-1) characters encoded in 8-bit ASCII, Unicode characters encoded in
UTF-8, and CGI escaped characters.

Sequences
    detox is driven by a configurable series of filters, called a sequence.
    Sequences are covered in more detail in detoxrc(5) and are discoverable
    with the -L option. Some examples of default sequences are iso8859_1 and
    utf_8.

Options
    The main options:

    -f configfile
        Use configfile instead of the default configuration files for
        loading translation sequences. No other config file will be parsed.

    -h --help
        Display helpful information.

    -L
        List the currently available sequences. When paired with -v this
        option shows what filters are used in each sequence and any
        properties applied to the filters.

    -n --dry-run
        Doesn't actually change anything. This implies the -v option.

    -r
        Recurse into subdirectories.

    -s sequence
        Use sequence instead of default.

    --special
        Works on special files (including links). Normally detox ignores
        these files.

    -v
        Be verbose about which files are being renamed.

    -V
        Show the current version of detox.

Deprecated Options
    Deprecated options are options that were available in earlier versions
    of detox but have lost their meaning and are being phased out.

    --remove-trailing
        Removes _ and - after periods in filenames. This was first provided
        in the 0.9 series of detox. After the introduction of sequences, it
        lost its meaning, as you could now determine the properties of
        wipeup through a particular sequence's configuration. It presently
        forces all instances of the wipeup filter to use remove trailing,
        regardless of what's actually in the config files.

FILES
detoxrc
    The system-wide detoxrc file.

~/.detoxrc
    A user's personal detoxrc. Normally it extends the system-wide detoxrc,
    unless -f has been specified, in which case it is ignored.

iso8859_1.tbl
    The default ISO 8859-1 translation table.

unicode.tbl
    The default Unicode (UTF-8) translation table.

EXAMPLES
detox -s iso8859_1 -r -v -n /tmp/new_files
    Will run the sequence iso8859_1 recursively, listing any changes,
    without changing anything, on the files of /tmp/new_files.

detox -f my_detoxrc -L -v
    Will list the sequences within my_detoxrc, showing their filters and
    options.

SEE ALSO
detoxrc(5), detox.tbl(5).

HISTORY
detox was originally designed to clean up files that I had received from
friends which had been created using other operating systems. It's trivial to
create a filename with spaces, parentheses, brackets, and ampersands under
some operating systems. These have special meaning within FreeBSD and Linux,
and cause problems when you go to access them. I created detox to clean up
these files.

AUTHORS
detox was written by Doug Harple.

BUGS
If, after the translation of a filename is finished, a file already exists
with that same name, detox will not rename the file. This could cause a
problem with the max_length filter, if it was imperative that the files be
cut down to a certain length.

Long options don't work under Solaris or Darwin.

An error in the config file will cause a segfault when detox goes to print
the offending word from the config file.

BSD                              August 3, 2004                              BSD

DETOXRC(5)						      BSD File Formats Manual							DETOXRC(5)

NAME
detoxrc -- configuration file for detox(1)

OVERVIEW
detox allows for configuration of its sequences through config files. This
document describes how these files work.

IMPORTANT
When setting up a new set of rules, the safe and wipeup filters must always
be run after a translating filter (or series thereof), such as the utf_8 or
the uncgi filters. Otherwise, illegal characters may be introduced into the
filename.

SYNTAX
The format of this configuration file is C-like. It is loosely based on
named's configuration files. Each statement is semicolon terminated, and
modifiers on a particular statement are generally contained within braces.

sequence "name" {...};
    Defines a sequence of filters to run a filename through. "name"
    specifies how the user will refer to the particular sequence during
    runtime. Quotes around the sequence name are generally optional, but
    should be used if the sequence name does not start with a letter.

    There is a special sequence, named "default", which is the default
    sequence used by detox. This can be overridden through the command line
    option -s or the environment variable DETOX_SEQUENCE.

    Sequence names are case sensitive and unique throughout all sequences;
    that is, if a system-wide file defines normal_seq and a user has a
    sequence with the same name in their .detoxrc, the user's normal_seq
    will take precedence.

iso8859_1 {filename "/path/to/filename";};
    This translates ISO 8859-1 (aka Latin-1) characters into lower ASCII
    equivalents. The output is not necessarily safe, and should also be run
    through the safe filter.

    Under normal circumstances, the filename syntax is not needed. Detox
    looks in several locations for a file called iso8859_1.tbl, which is a
    set of rules defining how an ISO 8859-1 character should be translated.

    In the event this table doesn't exist, you have two options. You can
    download or create your own, and tell detox the location of it using the
    filename syntax shown above, or you can let detox fall back on its
    internal tables. The internal tables translate the same as the stock
    translation tables.

    You can chain together multiple iso8859_1 translations, as long as the
    default value of all but the last one is set to nothing. This is
    explained in detox.tbl(5).

    This filter is mutually exclusive with the utf_8 filter.

utf_8 {filename "/path/to/filename";};
    This translates Unicode characters, encoded by the UTF-8 translation
    method, into safe equivalents. This operates in a manner similar to
    iso8859_1, except it looks for a translation table called unicode.tbl.

    The default internal translation for Unicode characters only contains
    the lower 256 characters of Unicode, which is equivalent to the set of
    Basic Latin and Latin-1 characters.

uncgi;
    This translates CGI escaped strings into their ASCII equivalents. The
    output of this is not necessarily safe, and could contain ISO 8859-1
    characters or potentially UTF-8 characters.

safe {filename "/path/to/filename";};
    This could also be called "safe for UNIX-like operating systems". It
    translates characters that are difficult to work with in UNIX
    environments into characters that are not.

    In earlier versions this filter was entirely internal. Starting with
    1.2.0, this filter is controlled by a translation table. In the absence
    of the translation table, the previous code will be employed for the
    translation.

    Also, prior to 1.2.0, the safe filter removed leading dashes to prevent
    the hassle of dealing with a filename in the format -filename. This
    functionality is exclusively handled by the wipeup filter now.

    See the SAFE section for more details on what this filter translates by
    default.

wipeup {remove_trailing;};
    This wipes up any excessive characters. For instance, multiple
    underscores or dashes will be converted into a single underscore or
    dash. Any series of dash and underscore (e.g. "_-_") will be converted
    into a single dash.

    The remove trailing option removes a dash or underscore followed
    immediately by a period.

    See the WIPEUP section for more details on what this filter translates.

max_length {length value;};
    This trims a file down to the length specified (or less). It is
    conscious of extensions and attempts to preserve anything following the
    last period in a filename.
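The wipeup and max_length behaviour described above can be approximated in a few lines of Python. This is only a sketch based on the wording of this page, not detox's own code: the function names, the order in which the wipeup rules are applied, and the fallback used when a name has no extension (or the extension alone would exceed the limit) are all assumptions made for illustration.

```python
import re


def wipeup(name: str, remove_trailing: bool = False) -> str:
    """Approximate the wipeup rules quoted in this page (not detox's code)."""
    # Collapse each run of dashes/underscores: "-" if the run contains a
    # dash, otherwise "_" (so "__" -> "_", "--" -> "-", "_-_" -> "-").
    name = re.sub(r"[-_]{2,}", lambda m: "-" if "-" in m.group() else "_", name)
    # Strip leading dashes so the name is not mistaken for an option.
    name = name.lstrip("-")
    if remove_trailing:
        # Drop a dash or underscore sitting directly next to a period.
        name = re.sub(r"[-_]?\.[-_]?", ".", name)
    return name


def max_length(name: str, length: int) -> str:
    """Trim name to at most `length` characters, keeping the extension."""
    if len(name) <= length:
        return name
    stem, dot, ext = name.rpartition(".")
    # Assumption: with no extension, or an extension too long to keep,
    # simply cut the whole name at the limit.
    if not dot or len(dot + ext) >= length:
        return name[:length]
    keep = length - len(dot + ext)
    return stem[:keep] + dot + ext
```

The sketch covers only the rules stated in prose here; the handling of other leading characters listed in the WIPEUP section is left out.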
    For instance, given a max length of 12, and a filename of
    "this_is_my_file.txt", the filter would output "this_is_.txt".

lower;
    This translates uppercase characters into lowercase characters.

# Comments
    Anything after a # on any line is ignored.

EXAMPLE
sequence default {
        uncgi;
        iso8859_1 {
                filename "iso8859_1.tbl";
        };
#       utf_8 {
#               filename "unicode.tbl";
#       };
        safe {
                filename "safe.tbl";
        };
        wipeup {
                remove_trailing;
        };
#       max_length {
#               length 128;
#       };
};

SAFE
The following characters are translated by the stock safe filter. They can be
tuned by updating safe.tbl or creating a copy of safe.tbl and updating your
rc file.

Rules that apply anywhere in the filename:

    Safe      Original
    _and_     &
    _         space ` ! @ $ * | : ; " ' < > ? /
    -         ( ) [ ] { }

WIPEUP
The following characters are translated by the wipeup filter.

Rules that apply anywhere in the filename:

    Wipeup    Original
    -         -_
    -         _-
    -         --
    _         __

Rules that apply only at the beginning of a filename:

Any leading dashes are stripped to prevent programs from interpreting these
files as command line options.

    Wipeup    Original
    removed   -
    _         #

Rules that apply when remove trailing is enabled:

    Wipeup    Original
    .         .-
    .         -.
    .         ._
    .         _.

SEE ALSO
detox(1), detox.tbl(5).

AUTHORS
detox was written by Doug Harple.

BSD                              August 3, 2004                              BSD