Linux 2.6 - man page for file_sorter (linux section 3erl)


file_sorter(3erl)		     Erlang Module Definition			file_sorter(3erl)

NAME
       file_sorter - File Sorter

DESCRIPTION
       The  functions  of  this module sort terms on files, merge already sorted files, and check
       files for sortedness. Chunks containing binary terms are read from a  sequence  of  files,
       sorted internally in memory and written on temporary files, which are merged producing one
       sorted file as output. Merging is provided as an optimization; it is faster when the files
       are already sorted, but it always works to sort instead of merge.

       On  a  file, a term is represented by a header and a binary. Two options define the format
       of terms on files:

	 * {header, HeaderLength}. HeaderLength determines the number of bytes preceding each
	   binary and containing the length of the binary in bytes. Default is 4. The order of
	   the header bytes is defined as follows: if B is a binary containing a header only, the
	   size Size of the binary is calculated as <<Size:HeaderLength/unit:8>> = B.

	 * {format, Format}. The format determines the function that is applied to binaries in
	   order to create the terms that will be sorted. The default value is binary_term,
	   which is equivalent to fun binary_to_term/1. The value binary is equivalent to
	   fun(X) -> X end, which means that the binaries will be sorted as they are. This is
	   the fastest format. If Format is term, io:read/2 is called to read terms. In that
	   case only the default value of the header option is allowed. The format option also
	   determines what is written to the sorted output file: if Format is term then
	   io:format/3 is called to write each term, otherwise the binary prefixed by a header
	   is written. Note that the binary written is the same binary that was read; the
	   results of applying the Format function are thrown away as soon as the terms have
	   been sorted. Reading and writing terms using the io module is very much slower than
	   reading and writing binaries.
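
       As a minimal sketch of the default file format described above (a 4-byte header
       followed by a binary made with term_to_binary/1), the module below writes such a
       file, sorts it in place and decodes the result. The module name, the helper
       functions and the temporary file name are illustrative assumptions, not part of
       file_sorter.

```erlang
-module(fs_format_example).
-export([write_terms/2, read_terms/1, demo/0]).

%% Write each term as a 4-byte length header followed by the external
%% term format of the term (the default header and binary_term format).
write_terms(FileName, Terms) ->
    Data = [begin
                B = term_to_binary(T),
                [<<(byte_size(B)):4/unit:8>>, B]
            end || T <- Terms],
    file:write_file(FileName, Data).

%% Read the whole file back and decode every header/binary pair.
read_terms(FileName) ->
    {ok, Bin} = file:read_file(FileName),
    decode(Bin).

decode(<<>>) ->
    [];
decode(<<Size:4/unit:8, B:Size/binary, Rest/binary>>) ->
    [binary_to_term(B) | decode(Rest)].

%% Hypothetical file name; the file is created in the current directory
%% and deleted again before returning.
demo() ->
    File = "fs_format_example.tmp",
    ok = write_terms(File, [3, 1, 2]),
    ok = file_sorter:sort(File),
    Sorted = read_terms(File),
    ok = file:delete(File),
    Sorted.
```

       Calling fs_format_example:demo() should return the three integers in ascending
       order.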

       Other options are:

	 * {order, Order}. The default is to sort terms in ascending order, but that can be
	   changed by the value descending or by giving an ordering function Fun. An ordering
	   function is antisymmetric, transitive and total. Fun(A, B) should return true if A
	   comes before B in the ordering, false otherwise. An example of a typical ordering
	   function is less than or equal to, =</2. Using an ordering function will slow down
	   the sort considerably. The keysort, keymerge and keycheck functions do not accept
	   ordering functions.

	 * {unique, bool()}. When sorting or merging files, only the first of a sequence of
	   terms that compare equal (==) is output if this option is set to true. The default
	   value is false, which implies that all terms that compare equal are output. When
	   checking files for sortedness, a check that no pair of consecutive terms compares
	   equal is done if this option is set to true.

	 * {tmpdir, TempDirectory}. The directory where temporary files are put can be chosen
	   explicitly. The default, implied by the value "", is to put temporary files in the
	   same directory as the sorted output file. If output is a function (see below), the
	   directory returned by file:get_cwd() is used instead. The names of temporary files are
	   derived from the Erlang nodename (node()), the process identifier of the current
	   Erlang emulator (os:getpid()), and a timestamp (erlang:now()); a typical name
	   would be fs_mynode@myhost_1763_1043_337000_266005.17, where 17 is a sequence number.
	   Existing files will be overwritten. Temporary files are deleted unless some uncaught
	   EXIT signal occurs.

	 * {compressed, bool()}. Temporary files and the output file may be compressed. The
	   default value false implies that written files are not compressed. Regardless of the
	   value of the compressed option, compressed files can always be read. Note that reading
	   and writing compressed files is significantly slower than reading and writing
	   uncompressed files.

	 * {size, Size}. By default approximately 512*1024 bytes read from files are sorted
	   internally. This option should rarely be needed.

	 * {no_files, NoFiles}. By default 16 files are merged at a time. This option should
	   rarely be needed.
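
       The effect of the order and unique options can be tried without creating any files
       by giving functions as input and output together with {format, term}. The following
       is a sketch only; the module and helper names are assumptions made for illustration.

```erlang
-module(fs_options_example).
-export([sort_terms/2, demo/0]).

%% Sort a list of terms through file_sorter. Functions are used as input
%% and output, and {format, term} makes them exchange plain terms.
sort_terms(Terms, Options) ->
    file_sorter:sort(input(Terms), output([]), [{format, term} | Options]).

%% Input function: hands over all terms at once, then signals the end.
input(Terms) ->
    fun(close) -> ok;
       (read) -> {Terms, fun(close) -> ok; (read) -> end_of_input end}
    end.

%% Output function: collects the sorted lists; close returns one list.
output(Acc) ->
    fun(close) -> lists:append(lists:reverse(Acc));
       (Terms) when is_list(Terms) -> output([Terms | Acc])
    end.

%% Runs the same input through four different option settings.
demo() ->
    Terms = [3, 1, 2, 3, 1],
    [{default, sort_terms(Terms, [])},
     {descending, sort_terms(Terms, [{order, descending}])},
     {unique, sort_terms(Terms, [{unique, true}])},
     {order_fun, sort_terms(Terms, [{order, fun(A, B) -> A >= B end}])}].
```

       The ordering function A >= B is used here only to show the shape of an OrderFun; it
       should give the same result as descending, at a higher cost.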

       To summarize, here is the syntax of the options:

	 * Options = [Option] | Option

	 * Option = {header, HeaderLength} | {format, Format} | {order, Order} | {unique, bool()}
	   | {tmpdir, TempDirectory} | {compressed, bool()} | {size, Size} | {no_files, NoFiles}

	 * HeaderLength = int() > 0

	 * Format = binary_term | term | binary | FormatFun

	 * FormatFun = fun(Binary) -> Term

	 * Order = ascending | descending | OrderFun

	 * OrderFun = fun(Term, Term) -> bool()

	 * TempDirectory = "" | file_name()

	 * Size = int() >= 0

	 * NoFiles = int() > 1

       As an alternative to sorting files, a function of one argument can be given as input. When
       called with the argument read, the function is assumed to return end_of_input or
       {end_of_input, Value} when there is no more input (Value is explained below), or
       {Objects, Fun}, where Objects is a list of binaries or terms depending on the format and
       Fun is a new input function. Any other value is immediately returned as the value of the
       current call to sort or keysort. Each input function is called exactly once. Should
       an error occur, the last function is called with the argument close, the reply of which
       is ignored.

       A function of one argument can be given as output. The results of sorting or merging the
       input are collected in a non-empty sequence of variable-length lists of binaries or terms
       depending on the format. The output function is called with one list at a time, and is
       assumed to return a new output function. Any other return value is immediately returned as
       the value of the current call to the sort or merge function. Each output function is called
       exactly once. When some output function has been applied to all of the results or an error
       occurs, the last function is called with the argument close, and the reply is returned as
       the value of the current call to the sort or merge function. If a function is given as input
       and the last input function returns {end_of_input, Value}, the function given as output
       is called with the argument {value, Value}. This makes it easy to initiate the
       sequence of output functions with a value calculated by the input functions.
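
       The passing of a value from the input functions to the output functions can be
       sketched as follows. The input functions count the terms they deliver and return
       the count with {end_of_input, Count}; the output function receives it as
       {value, Count}. The module name and the shape of the final reply are assumptions
       chosen for this illustration.

```erlang
-module(fs_value_example).
-export([sort_with_count/1]).

%% Chunks is a list of lists of terms, delivered one list per read.
%% The reply is {Count, SortedTerms}, built by the last output function.
sort_with_count(Chunks) ->
    file_sorter:sort(input(Chunks, 0), output(undefined, []), {format, term}).

%% Each input function delivers one chunk and returns the next input
%% function. The last one reports the number of terms delivered.
input([], Count) ->
    fun(close) -> ok;
       (read) -> {end_of_input, Count}
    end;
input([Chunk | Rest], Count) ->
    fun(close) -> ok;
       (read) -> {Chunk, input(Rest, Count + length(Chunk))}
    end.

%% The output function stores the value from the input side and
%% accumulates the sorted lists until it is called with close.
output(Count, Acc) ->
    fun(close) -> {Count, lists:append(lists:reverse(Acc))};
       ({value, Value}) -> output(Value, Acc);
       (Terms) when is_list(Terms) -> output(Count, [Terms | Acc])
    end.
```

       For example, fs_value_example:sort_with_count([[3,1],[2]]) is expected to return
       the count 3 together with the sorted terms.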

       As an example, consider sorting the terms on a disk log file. A function that reads chunks
       from the disk log and returns a list of terms is used as input. The results are
       collected in a list of terms.

       sort(Log) ->
           {ok, _} = disk_log:open([{name,Log}, {mode,read_only}]),
           Input = input(Log, start),
           Output = output([]),
           %% With {format,term} the input and output functions exchange terms.
           Reply = file_sorter:sort(Input, Output, {format,term}),
           ok = disk_log:close(Log),
           Reply.

       %% Returns an input function that reads the next chunk of the log.
       input(Log, Cont) ->
           fun(close) ->
                   ok;
              (read) ->
                   case disk_log:chunk(Log, Cont) of
                       {error, Reason} ->
                           {error, Reason};
                       {Cont2, Terms} ->
                           {Terms, input(Log, Cont2)};
                       {Cont2, Terms, _Badbytes} ->
                           {Terms, input(Log, Cont2)};
                       eof ->
                           end_of_input
                   end
           end.

       %% Returns an output function that accumulates the sorted lists;
       %% close returns all collected terms as one list.
       output(L) ->
           fun(close) ->
                   lists:append(lists:reverse(L));
              (Terms) ->
                   output([Terms | L])
           end.

       Further examples of functions as input and output can be found at the end of the
       file_sorter module; the term format is implemented with functions.

       The possible values of Reason returned when an error occurs are:

	 * bad_object, {bad_object, FileName}. Applying the format function failed for some
	   binary, or the key(s) could not be extracted from some term.

	 * {bad_term, FileName}. io:read/2 failed to read some term.

	 * {file_error, FileName, Reason2}. See file(3erl) for an explanation of Reason2.

	 * {premature_eof, FileName}. End-of-file was encountered inside some binary term.
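
       A caller may want to turn these reasons into readable messages. The sketch below
       matches on the error tuples listed above; the module name, the message texts and
       the name of the missing file are assumptions made for illustration.

```erlang
-module(fs_error_example).
-export([describe/1, demo/0]).

%% Map a reply from file_sorter to a flat string.
describe(ok) ->
    "ok";
describe({error, bad_object}) ->
    "bad object";
describe({error, {bad_object, File}}) ->
    fmt("bad object in ~ts", [File]);
describe({error, {bad_term, File}}) ->
    fmt("bad term in ~ts", [File]);
describe({error, {file_error, File, Reason2}}) ->
    %% file:format_error/1 gives a descriptive string for file errors.
    fmt("file error on ~ts: ~ts", [File, file:format_error(Reason2)]);
describe({error, {premature_eof, File}}) ->
    fmt("premature end of file in ~ts", [File]);
describe(Other) ->
    fmt("unexpected reply: ~p", [Other]).

fmt(Format, Args) ->
    lists:flatten(io_lib:format(Format, Args)).

%% Sorting a file that is assumed not to exist should yield a file_error.
demo() ->
    describe(file_sorter:sort("fs_no_such_file.tmp")).
```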

       Types

       Binary = binary()
       FileName = file_name()
       FileNames = [FileName]
       ICommand = read | close
       IReply = end_of_input | {end_of_input, Value} | {[Object], Infun} | InputReply
       Infun = fun(ICommand) -> IReply
       Input = FileNames | Infun
       InputReply = Term
       KeyPos = int() > 0 | [int() > 0]
       OCommand = {value, Value} | [Object] | close
       OReply = Outfun | OutputReply
       Object = Term | Binary
       Outfun = fun(OCommand) -> OReply
       Output = FileName | Outfun
       OutputReply = Term
       Term = term()
       Value = Term

EXPORTS
       sort(FileName) -> Reply
       sort(Input, Output) -> Reply
       sort(Input, Output, Options) -> Reply

	      Types  Reply = ok | {error, Reason} | InputReply | OutputReply

	      Sorts terms on files.

	      sort(FileName) is equivalent to sort([FileName], FileName).

	      sort(Input, Output) is equivalent to sort(Input, Output, []).

       keysort(KeyPos, FileName) -> Reply
       keysort(KeyPos, Input, Output) -> Reply
       keysort(KeyPos, Input, Output, Options) -> Reply

	      Types  Reply = ok | {error, Reason} | InputReply | OutputReply

	      Sorts tuples on files. The sort is performed on the element(s) mentioned in KeyPos.
	      If two tuples compare equal (==) on one element, the next element according to
	      KeyPos is compared. The sort is stable.

	      keysort(N, FileName) is equivalent to keysort(N, [FileName], FileName).

	      keysort(N, Input, Output) is equivalent to keysort(N, Input, Output, []).
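
	      As a sketch of how KeyPos works, the module below sorts the same tuples first
	      on element 2 only, then on element 2 followed by element 1. Functions are used
	      as input and output with {format, term}; the module and helper names are
	      assumptions made for illustration.

```erlang
-module(fs_keysort_example).
-export([keysort_terms/2, demo/0]).

%% Sort a list of tuples on the element(s) given by KeyPos.
keysort_terms(KeyPos, Tuples) ->
    file_sorter:keysort(KeyPos, input(Tuples), output([]), {format, term}).

%% Input function: hands over all tuples at once, then signals the end.
input(Tuples) ->
    fun(close) -> ok;
       (read) -> {Tuples, fun(close) -> ok; (read) -> end_of_input end}
    end.

%% Output function: collects the sorted lists; close returns one list.
output(Acc) ->
    fun(close) -> lists:append(lists:reverse(Acc));
       (Tuples) when is_list(Tuples) -> output([Tuples | Acc])
    end.

%% {z,1} and {a,1} compare equal on element 2, so a stable sort keeps
%% their input order unless element 1 is also part of the key.
demo() ->
    Tuples = [{z, 1}, {b, 2}, {a, 1}],
    {keysort_terms(2, Tuples), keysort_terms([2, 1], Tuples)}.
```

	      Since the sort is stable, the first result should keep {z,1} before {a,1},
	      while the second should order them by their first element.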

       merge(FileNames, Output) -> Reply
       merge(FileNames, Output, Options) -> Reply

	      Types  Reply = ok | {error, Reason} | OutputReply

	      Merges terms on files. Each input file is assumed to be sorted.

	      merge(FileNames, Output) is equivalent to merge(FileNames, Output, []).
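
	      A minimal sketch of a merge, assuming two already sorted files written in the
	      term format (one term per line, each ended by a full stop). The result is
	      collected by an output function. The module name and the temporary file names
	      are illustrative assumptions.

```erlang
-module(fs_merge_example).
-export([demo/0]).

%% Output function: collects the merged lists; close returns one list.
collect(Acc) ->
    fun(close) -> lists:append(lists:reverse(Acc));
       (Terms) when is_list(Terms) -> collect([Terms | Acc])
    end.

%% Writes two sorted files in the current directory, merges them and
%% deletes the files again. The file names are hypothetical.
demo() ->
    A = "fs_merge_a.tmp",
    B = "fs_merge_b.tmp",
    ok = file:write_file(A, "1.\n3.\n5.\n"),
    ok = file:write_file(B, "2.\n4.\n"),
    Merged = file_sorter:merge([A, B], collect([]), {format, term}),
    ok = file:delete(A),
    ok = file:delete(B),
    Merged.
```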

       keymerge(KeyPos, FileNames, Output) -> Reply
       keymerge(KeyPos, FileNames, Output, Options) -> Reply

	      Types  Reply = ok | {error, Reason} | OutputReply

	      Merges tuples on files. Each input file is assumed to be sorted on key(s).

	      keymerge(KeyPos, FileNames, Output) is equivalent to keymerge(KeyPos, FileNames,
	      Output, []).

       check(FileName) -> Reply
       check(FileNames, Options) -> Reply

	      Types  Reply = {ok, [Result]} | {error, Reason}
		     Result = {FileName, TermPosition, Term}
		     TermPosition = int() > 1

	      Checks files for sortedness. If a file is not sorted, the first out-of-order
	      element is returned. The first term on a file has position 1.

	      check(FileName) is equivalent to check([FileName], []).
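
	      The following sketch checks one sorted and one unsorted file, both written in
	      the term format. The module name and the temporary file names are assumptions
	      made for illustration.

```erlang
-module(fs_check_example).
-export([demo/0]).

%% Writes the given text to File, checks the file for sortedness using
%% the term format, and deletes the file again.
check_text(File, Text) ->
    ok = file:write_file(File, Text),
    Reply = file_sorter:check([File], {format, term}),
    ok = file:delete(File),
    Reply.

%% Returns the replies for a sorted and an unsorted file.
demo() ->
    Sorted = check_text("fs_check_sorted.tmp", "1.\n2.\n3.\n"),
    Unsorted = check_text("fs_check_unsorted.tmp", "1.\n3.\n2.\n"),
    {Sorted, Unsorted}.
```

	      For the sorted file the list of results should be empty. For the unsorted file
	      a single result is expected, reporting the term 2 at position 3.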

       keycheck(KeyPos, FileName) -> Reply
       keycheck(KeyPos, FileNames, Options) -> Reply

	      Types  Reply = {ok, [Result]} | {error, Reason}
		     Result = {FileName, TermPosition, Term}
		     TermPosition = int() > 1

	      Checks files for sortedness. If a file is not sorted, the first out-of-order
	      element is returned. The first term on a file has position 1.

	      keycheck(KeyPos, FileName) is equivalent to keycheck(KeyPos, [FileName], []).

Ericsson AB				  stdlib 1.17.3 			file_sorter(3erl)