detox(1) [debian man page]

DETOX(1)						    BSD General Commands Manual 						  DETOX(1)

NAME
     detox -- clean up filenames

SYNOPSIS
     detox [-hnLrv] [-s sequence] [-f configfile] [--dry-run] [--special]
           file ...

DESCRIPTION
     The detox utility renames files to make them easier to work with.  It
     removes spaces and other such annoyances.  It will also translate or
     clean up Latin-1 (ISO 8859-1) characters encoded in 8-bit ASCII, Unicode
     characters encoded in UTF-8, and CGI-escaped characters.

   Sequences
     detox is driven by a configurable series of filters, called a sequence.
     Sequences are covered in more detail in detoxrc(5) and are discoverable
     with the -L option.  Some examples of default sequences are iso8859_1
     and utf_8.

   Options
     The main options:

     -f configfile
             Use configfile instead of the default configuration files for
             loading translation sequences.  No other config file will be
             parsed.

     -h --help
             Display helpful information.

     -L      List the currently available sequences.  When paired with -v,
             this option shows what filters are used in each sequence and
             any properties applied to the filters.

     -n --dry-run
             Do not actually change anything.  This implies the -v option.

     -r      Recurse into subdirectories.

     -s sequence
             Use sequence instead of default.

     --special
             Work on special files (including links).  Normally detox
             ignores these files.

     -v      Be verbose about which files are being renamed.

     -V      Show the current version of detox.

   Deprecated Options
     Deprecated options were available in earlier versions of detox but
     have lost their meaning and are being phased out.

     --remove-trailing
             Removes _ and - after periods in filenames.  This was first
             provided in the 0.9 series of detox.  After the introduction of
             sequences, it lost its meaning, as you could now determine the
             properties of wipeup through a particular sequence's
             configuration.  It presently forces all instances of the wipeup
             filter to use remove trailing, regardless of what is actually
             in the config files.

FILES
     detoxrc
             The system-wide detoxrc file.

     ~/.detoxrc
             A user's personal detoxrc.  Normally it extends the system-wide
             detoxrc, unless -f has been specified, in which case it is
             ignored.

     iso8859_1.tbl
             The default ISO 8859-1 translation table.

     unicode.tbl
             The default Unicode (UTF-8) translation table.

EXAMPLES
     detox -s iso8859_1 -r -v -n /tmp/new_files
             Will run the sequence iso8859_1 recursively, listing any
             changes, without changing anything, on the files of
             /tmp/new_files.

     detox -f my_detoxrc -L -v
             Will list the sequences within my_detoxrc, showing their
             filters and options.

SEE ALSO
     detoxrc(5), detox.tbl(5).

HISTORY
     detox was originally designed to clean up files that I had received
     from friends which had been created using other operating systems.  It's
     trivial to create a filename with spaces, parentheses, brackets, and
     ampersands under some operating systems.  These have special meaning
     within FreeBSD and Linux, and cause problems when you go to access them.
     I created detox to clean up these files.

AUTHORS
     detox was written by Doug Harple.

BUGS
     If, after the translation of a filename is finished, a file already
     exists with that same name, detox will not rename the file.  This could
     cause a problem with the max_length filter, if it was imperative that
     the files be cut down to a certain length.

     Long options don't work under Solaris or Darwin.

     An error in the config file will cause a segfault when detox goes to
     print the offending word within the config file.

BSD                             August 3, 2004                             BSD
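
The idea of a sequence as an ordered series of filters, together with the rule in BUGS that an existing target name blocks the rename, can be pictured with a small Python sketch. This is an illustration only and not detox's implementation: the two filters, `run_sequence`, and `plan_rename` are hypothetical names invented here, and the real filters and their properties are whatever detoxrc(5) defines. The sketch only plans a rename, in the spirit of -n, and never touches the filesystem beyond an existence check.

```python
import os
import re
from typing import Callable, List, Optional

# A filter maps one filename to another. These two are hypothetical
# stand-ins, not filters shipped with detox.
Filter = Callable[[str], str]


def spaces_to_underscores(name: str) -> str:
    """Replace each run of whitespace with a single underscore."""
    return re.sub(r"\s+", "_", name)


def drop_shell_specials(name: str) -> str:
    """Remove parentheses, brackets, braces, and ampersands."""
    return re.sub(r"[()\[\]{}&]", "", name)


def run_sequence(name: str, sequence: List[Filter]) -> str:
    """Apply every filter in order, feeding each result to the next."""
    for apply_filter in sequence:
        name = apply_filter(name)
    return name


def plan_rename(
    path: str,
    sequence: List[Filter],
    exists: Callable[[str], bool] = os.path.exists,
) -> Optional[str]:
    """Return the new path, or None if nothing would be renamed.

    None covers two cases: the name is already clean, or a file with
    the translated name already exists (the behavior noted under BUGS).
    """
    directory, old_name = os.path.split(path)
    new_name = run_sequence(old_name, sequence)
    if new_name == old_name:
        return None
    target = os.path.join(directory, new_name)
    if exists(target):
        return None
    return target
```

With the sequence `[spaces_to_underscores, drop_shell_specials]`, the name `my file (1) & more.txt` becomes `my_file_1__more.txt`. The `exists` parameter is injected only so the collision rule can be exercised without creating files.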

UTF(6)								   Games Manual 							    UTF(6)

NAME
     UTF, Unicode, ASCII, rune - character set and format

DESCRIPTION
     The Plan 9 character set and representation are based on the Unicode
     Standard and on the ISO multibyte UTF-8 encoding (Universal Character
     Set Transformation Format, 8 bits wide).  The Unicode Standard
     represents its characters in 16 bits; UTF-8 represents such values in
     an 8-bit byte stream.  Throughout this manual, UTF-8 is shortened to
     UTF.

     In Plan 9, a rune is a 16-bit quantity representing a Unicode
     character.  Internally, programs may store characters as runes.
     However, any external manifestation of textual information, in files or
     at the interface between programs, uses a machine-independent,
     byte-stream encoding called UTF.

     UTF is designed so that the 7-bit ASCII set (values hexadecimal 00 to
     7F) appears only as itself in the encoding.  Runes with values above 7F
     appear as sequences of two or more bytes with values only from 80 to
     FF.

     The UTF encoding of the Unicode Standard is backward compatible with
     ASCII: programs presented only with ASCII work on Plan 9 even if not
     written to deal with UTF, as do programs that deal with uninterpreted
     byte streams.  However, programs that perform semantic processing on
     ASCII graphic characters must convert from UTF to runes in order to
     work properly with non-ASCII input.  See rune(2).

     Letting numbers be binary, a rune x is converted to a multibyte UTF
     sequence as follows:

          01.   x in [00000000.0bbbbbbb] -> 0bbbbbbb
          10.   x in [00000bbb.bbbbbbbb] -> 110bbbbb, 10bbbbbb
          11.   x in [bbbbbbbb.bbbbbbbb] -> 1110bbbb, 10bbbbbb, 10bbbbbb

     Conversion 01 provides a one-byte sequence that spans the ASCII
     character set in a compatible way.  Conversions 10 and 11 represent
     higher-valued characters as sequences of two or three bytes with the
     high bit set.  Plan 9 does not support the 4, 5, and 6 byte sequences
     proposed by X-Open.  When there are multiple ways to encode a value,
     for example rune 0, the shortest encoding is used.

     In the inverse mapping, any sequence except those described above is
     incorrect and is converted to rune hexadecimal 0080.

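
The three conversions and the inverse mapping can be sketched in Python as follows. This is a minimal illustration of the rules as stated in this page, not the Plan 9 library code (see rune(2) for that). The names `rune_to_utf`, `utf_to_rune`, and `BAD_RUNE` are invented for the sketch, and consuming exactly one byte after an incorrect sequence is an assumption, since the text above does not say how many bytes to skip.

```python
from typing import Tuple

BAD_RUNE = 0x0080  # rune substituted for any incorrect sequence


def rune_to_utf(x: int) -> bytes:
    """Encode one 16-bit rune using the shortest of conversions 01, 10, 11."""
    if not 0 <= x <= 0xFFFF:
        raise ValueError("a rune is a 16-bit quantity")
    if x <= 0x7F:    # conversion 01: 0bbbbbbb
        return bytes([x])
    if x <= 0x7FF:   # conversion 10: 110bbbbb, 10bbbbbb
        return bytes([0xC0 | (x >> 6), 0x80 | (x & 0x3F)])
    # conversion 11: 1110bbbb, 10bbbbbb, 10bbbbbb
    return bytes([0xE0 | (x >> 12),
                  0x80 | ((x >> 6) & 0x3F),
                  0x80 | (x & 0x3F)])


def _is_continuation(byte: int) -> bool:
    """True for bytes of the form 10bbbbbb."""
    return (byte & 0xC0) == 0x80


def utf_to_rune(data: bytes) -> Tuple[int, int]:
    """Decode one rune from the start of data.

    Returns (rune, bytes consumed). Anything that is not a shortest-form
    one, two, or three byte sequence yields BAD_RUNE and consumes one
    byte (the one-byte skip is an assumption of this sketch).
    """
    if not data:
        raise ValueError("empty input")
    b0 = data[0]
    if b0 < 0x80:
        return b0, 1
    if 0xC0 <= b0 < 0xE0 and len(data) >= 2 and _is_continuation(data[1]):
        x = ((b0 & 0x1F) << 6) | (data[1] & 0x3F)
        if x > 0x7F:  # reject a two-byte form of a one-byte value
            return x, 2
    elif (0xE0 <= b0 < 0xF0 and len(data) >= 3
          and _is_continuation(data[1]) and _is_continuation(data[2])):
        x = ((b0 & 0x0F) << 12) | ((data[1] & 0x3F) << 6) | (data[2] & 0x3F)
        if x > 0x7FF:  # reject a three-byte form of a shorter value
            return x, 3
    return BAD_RUNE, 1
```

For example, rune hexadecimal 20AC falls under conversion 11 and encodes to the bytes E2 82 AC, while the two-byte form C0 80 of rune 0 is not the shortest encoding and therefore decodes to rune 0080.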
FILES
     /lib/unicode
             table of characters and descriptions, suitable for look(1).

SEE ALSO
     ascii(1), tcs(1), rune(2), keyboard(6), The Unicode Standard.

                                                                        UTF(6)