I have a very large file (10,000,000 lines), that contains a sample id and a property of that sample. I have another file that contains around 1,000,000 lines with sample ids that I want to remove from the original file (create a new file without these lines).
I know how to do this in Perl, but it... (9 Replies)
Dear all,
I need to search multiple patterns and then I need to print their respective next lines. For an example, in the below table, I will look for 3 different patterns :
1) # ATC_Codes:
2) # Generic_Name:
3) # Drug_Target_1_Gene_Name:
#BEGIN_DRUGCARD DB00001
# AHFS_Codes:... (3 Replies)
I have two files. The first containing a header and six columns of data.
Example file 1:
Number SNP ID dbSNP RS ID Chromosome Result_Call Physical Position
787066 SNP_A-8575395 RS6650104 1 NOCALL 564477
786872 SNP_A-8575125 RS10458597 1 AA ... (13 Replies)
I need to print out sections (varying numbers of lines) of a file between patterns. That alone is easy enough: sed -n '/START/,/STOP/' I also need the 3 lines BEFORE the start pattern. That alone is easy enough: grep -B3 START But I can't seem to combine the two so that I get everything between the... (2 Replies)
Hi,
I want to print only lines (green-italic lines) in between first and last strings in column 9.
there are different number of lines between each strings.
10 AUGUSTUS exon 4558 4669 . - . 10.g1
10 AUGUSTUS exon 8771 8889 . ... (6 Replies)
Hi Gurus,
I have a requirement where I need to display all lines between 2 patterns except the line where the first pattern in it. I tried the following command using awk but it is printing all lines except the lines where the 2 patterns exist.
awk '/TRANSF_/{ P=1; next } /Busy/ {exit} P'... (9 Replies)
Hi
I need to egrep patterns in a file and limit number of matches to print for each matched pattern.
-m10 option is not working out in my sun solaris 5.10
Please guide me the options to achieve.
if i do head -10 , i wont be getting all pattern match results as output since for a... (10 Replies)
In the awk below I am trying to output those lines that Match between file1 and file2, those Missing in file1, and those missing in file2. Using each $1,$2,$4,$5 value as a key to match on, that is if those 4 fields are found in both files the match, but if those 4 fields are not found then missing... (0 Replies)
Hi, I need to print lines which are matching with start pattern "SELECT" and END PATTERN ";" and only select the last "select" statement including the ";" .
I have attached sample input file and the desired input should be as:
INPUT FORMAT:
SELECT
ABCD,
DEFGH,
DFGHJ,
JKLMN,
AXCVB,... (5 Replies)
Discussion started by: nani2019
5 Replies
LEARN ABOUT DEBIAN
bup-margin
bup-margin(1) General Commands Manual bup-margin(1)NAME
bup-margin - figure out your deduplication safety margin
SYNOPSIS
bup margin [options...]
DESCRIPTION
bup margin iterates through all objects in your bup repository, calculating the largest number of prefix bits shared between any two
entries. This number, n, identifies the longest subset of SHA-1 you could use and still encounter a collision between your object ids.
For example, one system that was tested had a collection of 11 million objects (70 GB), and bup margin returned 45. That means a 46-bit
hash would be sufficient to avoid all collisions among that set of objects; each object in that repository could be uniquely identified by
its first 46 bits.
The number of bits needed seems to increase by about 1 or 2 for every doubling of the number of objects. Since SHA-1 hashes have 160 bits,
that leaves 115 bits of margin. Of course, because SHA-1 hashes are essentially random, it's theoretically possible to use many more bits
with far fewer objects.
If you're paranoid about the possibility of SHA-1 collisions, you can monitor your repository by running bup margin occasionally to see if
you're getting dangerously close to 160 bits.
OPTIONS --predict
Guess the offset into each index file where a particular object will appear, and report the maximum deviation of the correct answer
from the guess. This is potentially useful for tuning an interpolation search algorithm.
--ignore-midx
don't use .midx files, use only .idx files. This is only really useful when used with --predict.
EXAMPLE
$ bup margin
Reading indexes: 100.00% (1612581/1612581), done.
40
40 matching prefix bits
1.94 bits per doubling
120 bits (61.86 doublings) remaining
4.19338e+18 times larger is possible
Everyone on earth could have 625878182 data sets
like yours, all in one repository, and we would
expect 1 object collision.
$ bup margin --predict
PackIdxList: using 1 index.
Reading indexes: 100.00% (1612581/1612581), done.
915 of 1612581 (0.057%)
SEE ALSO bup-midx(1), bup-save(1)BUP
Part of the bup(1) suite.
AUTHORS
Avery Pennarun <apenwarr@gmail.com>.
Bup unknown-bup-margin(1)