Hi everybody,
I am hoping somebody here will be either be able to solve my troubles or at least give me a push in the right direction :) .
I am developing a shell script to read in 4 different files worth of data that each contain a list of:
username firstname secondname group score
I... (2 Replies)
Hi guyz
Excuse me for posting simple question
I tried join and sort and other perl commands but failed
I have 2 files. 1st file contain single column with around 6000 values (rows).
Second file contain 2 columns 1st column is the same column (in 1st file) but randomly ordered and second... (5 Replies)
Hello,
My apologies if this has been posted elsewhere, I have had a look at several threads but I am still confused how to use these functions. I have two files, each with 5 columns:
File A: (tab-delimited)
PDB CHAIN Start End Fragment
1avq A 171 176 awyfan
1avq A 172 177 wyfany
1c7k A 2 7... (3 Replies)
Been tearing my hair out trying to work out how to make the -t option in the join command work.
Joining two files on col 1; columns in both files are separated by tabs.
file1:
2010/02/01-00:00 10.63
2010/02/01-00:06 10.63
2010/02/01-00:12 10.61
2010/02/01-00:18 10.58
(there are LOTS... (4 Replies)
1. The problem statement, all variables and given/known data:
I have two files created from extracting data off of two CSV files, one containing class enrollment on a specific quarter and the other containing grades for that specific quarter. The Enrollment file generated contains course name,... (11 Replies)
Hello,
Going through book, "Guide to UNIX Using Linux". I am doing one of the projects that has me writing scripts to join files. Here is my pnumname script and I am extracting the programmers names and numbers from the program file and redirecting the output to the file pnn. I then created a... (0 Replies)
Dear all,
I have two files (each only contains 1 column) as attached. I want to combined the two files and only show the common records in both files. But when I use join command only the last row was combined. Anyone know what is the problem? I don't know how to write the correct code to only... (2 Replies)
I have a weird issue going on with the join command...
I have two files I am trying to join...here is a line from each file with the important parts marked in red:
file1:
/groupspace/ccops/cmis/bauwkrcn/commsamp_20140315.txt,1
file2:... (3 Replies)
I have 2 files. File 1 is a daily file with only a bunch of IDs and a date column. File 2 has all the dump of IDs and their respective cost. I basically want an inner join. When I am picking a few rows from these files and joining, they work perfectly fine. But when I join the full files together,... (13 Replies)
Discussion started by: Varshha
13 Replies
LEARN ABOUT DEBIAN
bup-margin
bup-margin(1) General Commands Manual bup-margin(1)NAME
bup-margin - figure out your deduplication safety margin
SYNOPSIS
bup margin [options...]
DESCRIPTION
bup margin iterates through all objects in your bup repository, calculating the largest number of prefix bits shared between any two
entries. This number, n, identifies the longest subset of SHA-1 you could use and still encounter a collision between your object ids.
For example, one system that was tested had a collection of 11 million objects (70 GB), and bup margin returned 45. That means a 46-bit
hash would be sufficient to avoid all collisions among that set of objects; each object in that repository could be uniquely identified by
its first 46 bits.
The number of bits needed seems to increase by about 1 or 2 for every doubling of the number of objects. Since SHA-1 hashes have 160 bits,
that leaves 115 bits of margin. Of course, because SHA-1 hashes are essentially random, it's theoretically possible to use many more bits
with far fewer objects.
If you're paranoid about the possibility of SHA-1 collisions, you can monitor your repository by running bup margin occasionally to see if
you're getting dangerously close to 160 bits.
OPTIONS --predict
Guess the offset into each index file where a particular object will appear, and report the maximum deviation of the correct answer
from the guess. This is potentially useful for tuning an interpolation search algorithm.
--ignore-midx
don't use .midx files, use only .idx files. This is only really useful when used with --predict.
EXAMPLE
$ bup margin
Reading indexes: 100.00% (1612581/1612581), done.
40
40 matching prefix bits
1.94 bits per doubling
120 bits (61.86 doublings) remaining
4.19338e+18 times larger is possible
Everyone on earth could have 625878182 data sets
like yours, all in one repository, and we would
expect 1 object collision.
$ bup margin --predict
PackIdxList: using 1 index.
Reading indexes: 100.00% (1612581/1612581), done.
915 of 1612581 (0.057%)
SEE ALSO bup-midx(1), bup-save(1)BUP
Part of the bup(1) suite.
AUTHORS
Avery Pennarun <apenwarr@gmail.com>.
Bup unknown-bup-margin(1)