bup-split(1) debian man page

bup-split(1)						      General Commands Manual						      bup-split(1)

NAME
       bup-split - save individual files to bup backup sets

SYNOPSIS
       bup  split  [-r	host:path]  <-b|-t|-c|-n  name>  [-v] [-q] [--bench] [--max-pack-size=bytes] [-#] [--max-pack-objects=n] [--fanout=*count]
       [--git-ids] [--keep-boundaries] [filenames...]

DESCRIPTION
       bup split concatenates the contents of the given files (or if no filenames are given, reads from stdin), splits the content into chunks	of
       around 8k using a rolling checksum algorithm, and saves the chunks into a bup repository.  Chunks which have previously been stored are not
       stored again (ie.  they are 'deduplicated').

       Because of the way the rolling checksum works, chunks tend to be very stable across changes to a given file,  including	adding,  deleting,
       and changing bytes.

       For  example, if you use bup split to back up an XML dump of a database, and the XML file changes slightly from one run to the next, nearly
       all the data will still be deduplicated and the size of each backup after the first will typically be quite small.

       Another technique is to pipe the output of the tar(1) or cpio(1) programs to bup split.	 When  individual  files  in  the  tarball  change
       slightly  or  are  added  or  removed, bup still processes the remainder of the tarball efficiently.  (Note that bup save is usually a more
       efficient way to accomplish this, however.)

       To get the data back, use bup-join(1).

OPTIONS
       -r, --remote=host:path
	      save the backup set to the given remote server.  If path is omitted, uses the default path on the remote server (you still  need	to
	      include  the  ':').  The connection to the remote server is made with SSH.  If you'd like to specify which port, user or private key
	      to use for the SSH connection, we recommend you use the ~/.ssh/config file.

       -b, --blobs
	      output a series of git blob ids that correspond to the chunks in the dataset.

       -t, --tree
	      output the git tree id of the resulting dataset.

       -c, --commit
	      output the git commit id of the resulting dataset.

       -n, --name=name
	      after creating the dataset, create a git branch named name so that it can be accessed using that name.  If name already exists,  the
	      new dataset will be considered a descendant of the old name.  (Thus, you can continually create new datasets with the same name, and
	      later view the history of that dataset to see how it has changed over time.)

       -q, --quiet
	      disable progress messages.

       -v, --verbose
	      increase verbosity (can be used more than once).

       --git-ids
	      stdin is a list of git object ids instead of raw data.  bup split will read the contents of each named git object (if it	exists	in
	      the  bup	repository)  and  split it.  This might be useful for converting a git repository with large binary files to use bup-style
	      hashsplitting instead.  This option is probably most useful when combined with --keep-boundaries.

       --keep-boundaries
	      if multiple filenames are given on the command line, they are normally concatenated together as if the content all came from a  sin-
	      gle  file.   That is, the set of blobs/trees produced is identical to what it would have been if there had been a single input file.
	      However, if you use --keep-boundaries, each file is split separately.  You still only get a single  tree	or  commit  or	series	of
	      blobs, but each blob comes from only one of the files; the end of one of the input files always ends a blob.

       --noop read  the  data  and split it into blocks based on the "bupsplit" rolling checksum algorithm, but don't do anything with the blocks.
	      This is mostly useful for benchmarking.

       --copy like --noop, but also write the data to stdout.  This can be useful for benchmarking the	speed  of  read+bupsplit+write	for  large
	      amounts of data.

       --bench
	      print benchmark timings to stderr.

       --max-pack-size=bytes
	      never create git packfiles larger than the given number of bytes.  Default is 1 billion bytes.  Usually there is no reason to change
	      this.

       --max-pack-objects=numobjs
	      never create git packfiles with more than the given number of objects.  Default is 200 thousand objects.	Usually there is no reason
	      to change this.

       --fanout=numobjs
	      when  splitting  very  large files, never put more than this number of git blobs in a single git tree.  Instead, generate a new tree
	      and link to that.  Default is 4096 objects per tree.

       --bwlimit=bytes/sec
	      don't transmit more than bytes/sec bytes per second to the server.  This is good for making your backups not suck up all	your  net-
	      work bandwidth.  Use a suffix like k, M, or G to specify multiples of 1024, 10241024, 10241024*1024 respectively.

       -#, --compress=#
	      set  the	compression  level to # (a value from 0-9, where 9 is the highest and 0 is no compression).  The default is 1 (fast, loose
	      compression)

EXAMPLE
	      $ tar -cf - /etc | bup split -r myserver: -n mybackup-tar
	      tar: Removing leading /' from member names
	      Indexing objects: 100% (196/196), done.

	      $ bup join -r myserver: mybackup-tar | tar -tf - | wc -l
	      1961

SEE ALSO
       bup-join(1), bup-index(1), bup-save(1), bup-on(1), ssh_config(5)

BUP
       Part of the bup(1) suite.

AUTHORS
       Avery Pennarun <apenwarr@gmail.com>.

Bup unknown-															      bup-split(1)
bup-split(1) debian man page | unix.com