sort(1) [osf1 man page]

sort(1) 						      General Commands Manual							   sort(1)

NAME

       sort - Sorts or merges files

SYNOPSIS

       sort [-m] [-o output_file] [-Abdfinru] [-k keydef]... [-t character] [-T directory] [-y] [kilobytes] [-z record_size]... file...

       sort -c	[-u] [-Abdfinru] [-k keydef]... [-t character] [-T directory] [-y] [kilobytes] [-z record_size]... file...

       The following older syntax is now maintained for backward compatibility, but may be withdrawn in future issues:

       sort [-Abcdfimnru] [-o output_file] [-t character] [-T directory] [-y] [kilobytes] [-z record_size] [+fskip[.cskip]] [-fskip[.cskip]] [-bdfinr]... file...

STANDARDS

       Interfaces documented on this reference page conform to industry standards as follows:

       sort:  XCU5.0

       Refer to the standards(5) reference page for more information about industry standards and associated tags.

OPTIONS

       The -d, -f, -i, -n, and -r options override the default ordering rules.  When ordering options appear independent of any key field
       specifications, the requested field ordering rules are applied globally to all sort keys.  When attached to a specific key (see -k),
       the specified ordering options override all global ordering options for that key.  In the obsolescent forms, if one or more of these
       options follows a +fskip option, it affects only the key field specified by that preceding option.

       -A     [Tru64 UNIX]  Sorts on a byte-by-byte basis using each character's encoded value.  On some systems, extended characters will be
              considered negative values, and so sort before ASCII characters.  If you are sorting ASCII characters in a non-C/POSIX locale,
              this option performs much faster.

       -b     Ignores leading spaces and tabs when determining the starting and ending positions of a restricted sort key.  If the -b option
              is specified before the first -k option, the -b option is applied to all -k options on the command line; otherwise, the -b
              option can be independently attached to each -k field_start or field_end argument.

       -c     Checks that the input is sorted according to the ordering rules specified in the options and the collating sequence of the
              current locale.  No output is produced; only the exit code is affected.

       -d     Specifies that only spaces and alphanumeric characters (according to the current setting of LC_CTYPE) are significant in
              comparisons.

       -f     Treats all lowercase characters as their uppercase equivalents (according to the current setting of LC_CTYPE) for the purposes
              of comparison.

       -i     Sorts only by printable characters (according to the current setting of LC_CTYPE).

       -k keydef
              Specifies one or more (up to 50) restricted sort key field definitions.  This option replaces the obsolescent +fskip.cskip and
              -fskip.cskip options.  A field comprises a maximal sequence of non-separating characters and, in the absence of the -t option,
              any preceding field separator.

	      The format of a key field definition is as follows:

	      field_start[type][,field_end[type]]

	      The field_start and field_end arguments define a key field that is restricted to a portion of the line, and type is a modifier spec-
	      ified  by b, d, f, i, n, r, or t.  The b modifier behaves like the -b option, but applies only to the field_start or field_end argu-
	      ment to which it is attached.  The t modifier indicates that the key field is processed as CPU time. The other modifiers behave like
	      their  corresponding options, but apply only to the key field to which they are attached; these modifiers have this effect if speci-
	      fied with field_start, field_end or both.

	      Modifiers attached to a field_start or field_end argument override any specifications made by  the  options.   A	missing  field_end
	      argument	means the last character of the line.  When multiple sort keys are specified, it is advisable to specify a field_end argu-
	      ment to avoid possible confusion.

	      The field_start portion of the keydef argument takes the following form:

	      field_number[.first_character]

	      Fields and characters within fields are numbered starting with 1.  The field_number and first_character pieces, interpreted as
	      positive decimal integers, specify the first character to be used as part of a sort key.  If first_character is not specified,
	      the default is the first character of the field.

	      The field_end portion of the keydef argument takes the following form:

	      field_number[.last_character]

	      The field_number syntax is the same as that described for field_start.  The last_character argument, interpreted	as  a  nonnegative
	      decimal integer, specifies the last character to be used as part of the sort key.  If last_character evaluates to 0 (zero) or is not
	      specified, the default is the last character of the field specified by field_number.
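
	      As a minimal sketch of a restricted key, the following sorts on characters 2 through 3 of the second field only.  The sample
	      records are invented for illustration, -t : is used so that no separator is counted as part of the field, and LC_ALL=C is
	      set so that the result does not depend on the caller's locale.

```shell
# Hypothetical sample data: an ID, a colon, then a letter followed by two digits.
# -k 2.2,2.3 restricts the key to characters 2 and 3 of field 2 (the digits).
sorted=$(printf '%s\n' 'r1:x30' 'r2:y10' 'r3:z20' | LC_ALL=C sort -t : -k 2.2,2.3)
printf '%s\n' "$sorted"
# Lines are ordered by the keys 10, 20, 30:
#   r2:y10
#   r3:z20
#   r1:x30
```

	      Without -t, the separator that precedes a field counts as part of that field, so the same offsets would select different
	      characters unless the b modifier is attached.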

	      If -b is in effect, characters within a field are counted from the first nonspace character in the field.  (This applies	separately
	      to first_character and last_character.)

	      If -k is not specified, the default sort key is the entire line.

	      When there are multiple key fields, later keys are compared only after all earlier keys compare as equal.  Except when the -u option
	      is specified, lines that otherwise compare as equal are ordered as though none of the options -d, -f, -i, -n,  or  -k  were  present
	      (but with -r still in effect, if it was specified) and with all bytes in the lines significant to the comparison.
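
	      The interaction of several keys and the final whole-line comparison can be sketched as follows.  The records are hypothetical;
	      the first key is field 1 alone, and the second key is field 2 alone with the f modifier attached.

```shell
# Hypothetical records: department, colon, name.
# Key 1 compares field 1; key 2 compares field 2 with lowercase folded to uppercase.
sorted=$(printf '%s\n' 'ops:bob' 'dev:Carol' 'dev:alice' 'ops:Bob' |
    LC_ALL=C sort -t : -k 1,1 -k 2,2f)
printf '%s\n' "$sorted"
# dev:alice
# dev:Carol
# ops:Bob
# ops:bob
```

	      The two ops lines compare as equal on both keys, so they are ordered by their bytes, where B precedes b in the C locale.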

	      The algorithm for the -k option can be summarized as follows:

	      /*
	       * -ka.b,c.d = if d==0 then +(a-1).(b-1) -c.d
	       *	      else +(a-1).(b-1) -(c-1).d
	       */

       -m     Merges only (assumes sorted input).

       -n     Sorts any initial numeric strings (consisting of optional spaces, an optional minus sign, and zero (0) or more digits with an
              optional radix character and thousands separator, as defined by the current locale) by arithmetic value.  An empty digit string
              is treated as zero; leading zeros and signs on zeros do not affect ordering.  Only one period (.) can be used in numeric
              strings.  All subsequent periods (.) and any character to the right of the period (.) will be ignored.

       -o output_file
              Directs output to output_file instead of standard output.  The output_file can be the same as one of the input files.

       -r     Reverses the order of the specified sort.

       -t character
              Sets the field separator character to character.  The character argument is not considered to be part of a field (although it
              can be included in a sort key).  Each occurrence of character is significant (for example, two consecutive occurrences of
              character delimit an empty field).  To specify the tab character as the field separator, you must enclose it in ' ' (single
              quotes).
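
	      The effect of consecutive separators can be sketched with made-up data.  In the second record the two adjacent colons delimit
	      an empty second field, so the 3 is still the third field and the line sorts last.

```shell
# Hypothetical colon-separated records; 'b::3' has an empty field 2.
sorted=$(printf '%s\n' 'a:x:1' 'b::3' 'c:y:2' | LC_ALL=C sort -t : -k 3,3)
printf '%s\n' "$sorted"
# a:x:1
# c:y:2
# b::3
```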

              The default field separator is one or more spaces.

       -T directory
              [Tru64 UNIX]  Places all the temporary files that are created in directory.

       -u     Suppresses all but one in each set of equal lines (for example, lines whose sort keys match exactly).  Ignored characters such
              as leading tabs and spaces, and characters outside of sort keys are not considered in this type of comparison.

              If used with the -c option, -u checks that there are no lines with duplicate keys, in addition to checking that the input file
              is sorted.

       -y [kilobytes]
              [Tru64 UNIX]  Starts the sort command using kilobytes of main storage and adds storage as needed.  (If kilobytes is less than
              the minimum storage size or greater than the maximum, the minimum or maximum is used instead.)  If the -y option is omitted,
              the sort command starts with the default storage size; -y 0 starts with minimum storage, and -y (with no value) starts with
              the maximum storage.  The amount of storage used by the sort command has a significant impact on performance.  Sorting a small
              file in a large amount of storage is wasteful.

       -z record_size
              Prevents abnormal termination if lines being sorted are longer than the default buffer size can handle.  When the -c or -m
              options are specified, the sorting phase is omitted and a system default size buffer is used.  If sorted lines are longer than
              this size, sort terminates abnormally.  The -z option specifies that the longest line be recorded in the sort phase so that
              adequate buffers can be allocated in the merge phase.  The record_size argument must be a value in bytes equal to or greater
              than the number of bytes in the longest line to be merged.

       +fskip.cskip
              Specifies the start position of a key field.  See the -k option for a description of the current way to perform this
              operation.  (Obsolescent)

	      The fskip variable specifies the number of fields to skip from the beginning of the input line, and the cskip variable specifies the
	      number of additional characters to skip to the right beyond that point.  For both the starting point (+fskip.cskip) and  the  ending
	      point  (-fskip.cskip)  of  a  sort  key, fskip is measured from the beginning of the input line, and cskip is measured from the last
	      field skipped.  If you omit cskip, 0 (zero) is assumed.  If you omit fskip, 0 (zero) is assumed.  If you omit the ending field
	      specifier (-fskip.cskip), the end of the line is the end of the sort key.

	      You  can	supply	more  than one sort key by repeating +fskip.cskip and -fskip.cskip.  In cases where you specify more than one sort
	      key, keys specified further to the right on the command line are compared only after all earlier keys are sorted.  For  example,	if
	      the first key is to be sorted in numerical order and the second according to the collating sequence, all strings that start with the
	      number 1 are sorted according to the collating order before the strings that start with the number 2.  Lines that are  identical	in
	      all keys are sorted with all characters significant.  When you specify multiple sort keys, you can also give each key its own
	      options.

       -fskip.cskip
              Specifies the end position of a key field.  See the -k option for a description of the current way to perform this operation.
              (Obsolescent)
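
       The summary given under the -k option can be applied mechanically when translating a key definition into the obsolescent notation.
       The helper below is a hypothetical illustration only (keydef_to_obsolescent is not part of sort); it takes the four numbers a, b, c,
       and d of -k a.b,c.d and prints the corresponding +fskip.cskip and -fskip.cskip specifiers.

```shell
# Hypothetical helper: translate "-k a.b,c.d" into the obsolescent form,
# following the rule "if d==0 then +(a-1).(b-1) -c.d else +(a-1).(b-1) -(c-1).d".
keydef_to_obsolescent() {
    a=$1 b=$2 c=$3 d=$4
    if [ "$d" -eq 0 ]; then
        # last_character of 0 means "to the end of field c"
        printf '+%d.%d -%d.%d\n' $((a - 1)) $((b - 1)) "$c" "$d"
    else
        printf '+%d.%d -%d.%d\n' $((a - 1)) $((b - 1)) $((c - 1)) "$d"
    fi
}

keydef_to_obsolescent 2 1 2 0   # -k 2.1,2.0  ->  +1.0 -2.0
keydef_to_obsolescent 2 3 4 5   # -k 2.3,4.5  ->  +1.2 -3.5
```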

DESCRIPTION

       The sort command sorts lines in its input files and writes the result to standard output.

       The sort command performs one of the following functions:

              Sorts lines of all the named files together and writes the result to the specified output.

              Merges lines of all the named (presorted) files together and writes the result to the specified output.

              Checks that a single input file is correctly presorted.
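
       As a small sketch of the merge function, the following combines two presorted files with -m.  The file names and contents are made
       up, and a scratch directory is used so that no existing files are touched.

```shell
# Create two hypothetical files that are each already in sorted order.
dir=$(mktemp -d)
printf '%s\n' apple cherry > "$dir/a"
printf '%s\n' banana date  > "$dir/b"

# -m merges the presorted inputs instead of sorting them from scratch.
merged=$(LC_ALL=C sort -m "$dir/a" "$dir/b")
printf '%s\n' "$merged"
# apple
# banana
# cherry
# date

rm -r "$dir"
```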

       Comparisons  are  based	on one or more sort keys extracted from each line of input (or the entire line if no sort keys are specified), and
       are performed using the collating sequence of the current locale.

       The sort command treats all of its input files as one file when it performs the sort.  A - (dash) in place of a file name  specifies  stan-
       dard input.  If you do not specify a file name, it sorts standard input.
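
       A brief sketch of the dash operand, using invented data: the lines arriving on standard input are sorted together with the lines of
       a named file, as if both were one file.

```shell
# Hypothetical stored file in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' pear fig > "$dir/stored"

# "-" stands for standard input, so the piped lines join the file's lines.
combined=$(printf '%s\n' kiwi apple | LC_ALL=C sort - "$dir/stored")
printf '%s\n' "$combined"
# apple
# fig
# kiwi
# pear

rm -r "$dir"
```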

       The sort command can handle a variety of collation rules typically used in Western European languages, including primary/secondary sorting,
       one-to-two character mapping, N-to-one character mapping, and ignore-character mapping.	To summarize briefly:

   Primary/Secondary Sorting
       In this system, a group of characters all sort to the same primary location.  If there is a tie, a secondary sort is applied.  For example,
       in French, the plain and accented a's all sort to the same primary location.  If two strings collate to the same primary location, the sec-
       ondary sort goes into effect.  These words are in correct French order:

       abord âpre après âpreté azur

   One-to-Two Character Mappings
       This system requires that certain single characters be treated as if they were two characters.  For example, in German, the ß
       (scharfes-S) is collated as if it were ss.

   N-to-One Character Mappings
       Some  languages	treat a string of characters as if it were one single collating element.  For example, in Spanish, the ch and ll sequences
       are treated as their own elements within the alphabet.  (ch comes between c and d in the alphabet, and ll comes between l and m.)

   Ignore-Character Mappings
       In some cases, certain characters may be ignored in collation.  For example, if - were defined as an ignore-character, the strings
       re-locate and relocate would sort to the same place.

       The results that you get from sort depend on the collating sequence as defined by the current setting of the LC_COLLATE environment
       variable.  The configuration files for collation and character classification information are /usr/lib/nls/loc/src/locale.src.

       A field is one or more characters bounded by the beginning of a line and the current field separator, or one or more characters
       bounded by a field separator on either side.  The space character is the default field separator.

       Lines longer than 1024 bytes are truncated by sort.  The maximum number of fields on a line is 50.

EXIT STATUS

       The sort command returns the following exit values:

       0      All input files were output successfully, or -c was specified and the input file was correctly sorted.

       1      Under the -c option, the file was not ordered as specified, or if the -c and -u options were both specified, two input lines
              were found with equal keys.

       >1     An error occurred.
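
       The exit values can be observed from a script roughly as follows.  The files are hypothetical, and standard error is discarded
       because some implementations write a diagnostic when a check fails.

```shell
dir=$(mktemp -d)
printf '%s\n' a b c > "$dir/sorted"
printf '%s\n' b a c > "$dir/unsorted"
printf '%s\n' a a b > "$dir/dups"

# Capture each exit value; "|| var=$?" keeps the script going on a nonzero status.
status_sorted=0;   LC_ALL=C sort -c "$dir/sorted" 2>/dev/null    || status_sorted=$?
status_unsorted=0; LC_ALL=C sort -c "$dir/unsorted" 2>/dev/null  || status_unsorted=$?
status_dups=0;     LC_ALL=C sort -c -u "$dir/dups" 2>/dev/null   || status_dups=$?
status_error=0;    LC_ALL=C sort "$dir/no-such-file" 2>/dev/null || status_error=$?

# Sorted input gives 0, disorder or duplicate keys give 1, a missing file gives more than 1.
echo "$status_sorted $status_unsorted $status_dups $status_error"
rm -r "$dir"
```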

EXAMPLES

       The following examples apply to the C locale, unless it is specifically stated otherwise.

       1.     To perform a simple sort, enter:

                     sort fruits

              This displays the contents of fruits sorted in ascending lexicographic order.  This means that the characters in each column
              are compared one by one, including spaces, digits, and special characters.

              For instance, if fruits contains the text:

                     banana
                     orange
                     Persimmon
                     apple
                     %%banana
                     apple
                     ORANGE

              Then sort fruits displays:

                     %%banana
                     ORANGE
                     Persimmon
                     apple
                     apple
                     banana
                     orange

              This order follows from the fact that in the ASCII collating sequence, symbols (such as %) precede uppercase letters, and all
              uppercase letters precede the lowercase letters.  If you are using a different collating order, your results may be different.

       2.     To group lines that contain uppercase and special characters with similar lowercase lines, and remove duplicate lines, enter:

                     sort -d -f -u fruits

              The -u option tells sort to remove duplicate lines, making each line of the file unique.  This displays:

                     apple
                     %%banana
                     orange
                     Persimmon

              Not only was the duplicate apple removed, but banana and ORANGE were removed as well.  The -d option told sort to ignore
              symbols, so %%banana and banana were considered to be duplicate lines and banana was removed.  The -f option told sort not to
              differentiate between uppercase and lowercase, so ORANGE and orange were considered to be duplicate lines and ORANGE was
              removed.

              When the -u option is used with input that contains nonidentical lines that are considered by sort (due to other options) to
              be duplicates, there is no way to predict which lines sort will keep and which it will remove.

       3.     To sort as in Example 2, but remove duplicates unless capitalized or punctuated differently, enter:

                     sort -u -k 1df -k 1 fruits

              Options appearing between sort key specifiers apply only to the specifier preceding them.  There are two sorts specified in
              this command line.  The -k 1df argument specifies the first sort, of the same type done with -d -f in Example 2.  Then -k 1
              performs another comparison to distinguish lines that are not actually identical.  This prevents -u, which applies to both
              sorts because it precedes the first sort key specifier, from removing lines that are not exactly identical to other lines.

              Given the fruits file shown in Example 1, the added -k 1 distinguishes %%banana from banana and ORANGE from orange.  However,
              the two instances of apple are exactly identical, so one of them is deleted.  This displays:

                     apple
                     %%banana
                     banana
                     ORANGE
                     orange
                     Persimmon

       4.     To specify a new field separator, enter:

                     sort -t : -k 2 vegetables

              This sorts vegetables, comparing the text that follows the first colon on each line.  The -t : option tells sort that colons
              separate fields.  The -k 2 argument tells sort to ignore the first field and to compare from the start of the second field to
              the end of the line.  If vegetables contains:

                     yams:104
                     turnips:8
                     potatoes:15
                     carrots:104
                     green beans:32
                     radishes:5
                     lettuce:15

              then sort -t : -k 2 vegetables displays:

                     carrots:104
                     yams:104
                     lettuce:15
                     potatoes:15
                     green beans:32
                     radishes:5
                     turnips:8

              The numbers are not in ascending order.  This is because a lexicographic sort compares each character from left to right.  In
              other words, 3 comes before 5 so 32 comes before 5.

       5.     To sort on more than one field, enter:

                     sort -t : -k 2n -k 1r vegetables

              This performs a numeric sort on the second field (-k 2n) and then, within that ordering, sorts the first field in reverse
              collating order (-k 1r).  The output looks like this:

                     radishes:5
                     turnips:8
                     potatoes:15
                     lettuce:15
                     green beans:32
                     yams:104
                     carrots:104

              The lines are sorted in numeric order; when two lines have the same number, they appear in reverse collating order.

       6.     To replace the original file with the sorted text, enter:

                     sort -o vegetables vegetables

              The -o vegetables option stores the sorted output into the file vegetables.

       7.     To collate using Spanish rules, set the LC_COLLATE (or LANG) environment variable to a Spanish locale, and then use sort in
              the regular way:

                     sort sp.words

              If an input file named sp.words contains the following Spanish words:

                     dama
                     loro
                     chapa
                     canto
                     mover
                     chocolate
                     curioso
                     llanura

              The sorted file looks like this:

                     canto
                     curioso
                     chapa
                     chocolate
                     dama
                     loro
                     llanura
                     mover

              If you sort the file in the default C locale, the output looks like this:

                     canto
                     chapa
                     chocolate
                     curioso
                     dama
                     llanura
                     loro
                     mover

ENVIRONMENT VARIABLES

       The following environment variables affect the execution of sort:

       LANG
              Provides a default value for the internationalization variables that are unset or null.  If LANG is unset or null, the
              corresponding value from the default locale is used.  If any of the internationalization variables contain an invalid setting,
              the utility behaves as if none of the variables had been defined.

       LC_ALL
              If set to a non-empty string value, overrides the values of all the other internationalization variables.

       LC_CTYPE
              Determines the locale for the interpretation of sequences of bytes of text data as characters (for example, single-byte as
              opposed to multibyte characters in arguments) and the behavior of character classification for the -b, -d, -f, -i, and -n
              options.

       LC_MESSAGES
              Determines the locale for the format and contents of diagnostic messages written to standard error.

       NLSPATH
              Determines the location of message catalogues for the processing of LC_MESSAGES.
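
       Because these variables change the ordering, scripts that need a reproducible result often set the locale for the sort command alone.
       A minimal sketch with made-up words, forcing the C locale through LC_ALL:

```shell
# Hypothetical word list with mixed case.
words=$(printf '%s\n' banana Cherry apple)

# LC_ALL=C applies to this one command only; uppercase letters then precede lowercase.
c_order=$(printf '%s\n' "$words" | LC_ALL=C sort)
printf '%s\n' "$c_order"
# Cherry
# apple
# banana
```

       In a locale whose collating sequence does not separate letters by case, the same input may come out in a different order.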

FILES

       Configuration files

SEE ALSO

       Commands:  comm(1), join(1), uniq(1)

       Functions:  setlocale(3), tolower(3)

       Files:  locale(4)

       Standards:  standards(5)

																	   sort(1)